Ceph Calculator

This calculator helps you determine the usable storage capacity of your Ceph cluster. You define each node and its capacity, and the calculator tells you how much storage you can actually use.

The calculator assumes that the cluster must survive the failure of even its largest node without any problems: no degraded PGs and no data loss. This means the remaining nodes must have enough free capacity to hold all the data from the largest node that could fail.


Explanation

Settings

Accept degraded PGs: When Ceph expects three replicas of an object but one OSD in the PG is offline, the PG enters a degraded state. Accepting this can be fine if you are quick to replace broken or unavailable OSDs. If not, it can result in the pool running full and losing data.

Define each node manually: If the capacity differs between nodes, you can enter the total space per node individually. Naming the nodes is optional and purely cosmetic.

Capacity per Node (TB): The sum of the capacities of all OSDs on a node. In a balanced cluster this is the same for each node.

Total Node Count: The number of nodes in your Ceph cluster.

Number of replicas: The number of replicas in your cluster (the pool's size parameter). This is not the same as min_size, which decides whether I/O is still allowed.

Inputs

Total Node Count: The number of nodes in your Ceph cluster.

Capacity per Node: The raw storage capacity of a node, i.e. the sum of the capacities of all OSDs on that node.

Number of replicas: In your Ceph pool, you define how many replicas each object should have (the pool's size). This determines on how many OSDs a block of data is stored, i.e. how many copies of a given block exist in the cluster. The higher the number, the more resilient the data is to an outage. For example, a size of 1 means Ceph writes the data to a single OSD: if the node hosting that OSD fails, the data is unavailable, and if the OSD itself breaks (hardware failure), the data is gone. A size of 2 lets Ceph store the data on two different OSDs on two different nodes, so the data stays available even if a whole node fails.

Outputs

Raw Storage: The sum of the capacities of all OSDs in the cluster.

Safe Storage Size: With your configuration, this is the usable storage size that still allows the biggest node (the one with the highest raw capacity) in the cluster to fail without data loss.

Safe Storage Efficiency: The above as a percentage of the raw storage. It gives you an idea of how much of your cluster's capacity you are giving up for resilience.

OSD Nearfull Ratio: In case of a node failure, this is the OSD nearfull ratio that still allows all your data to be re-replicated. If you go above this value, a node failure can fill the remaining OSDs completely and bring the cluster to a halt.
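To make the relationship between these outputs concrete, here is a minimal Python sketch of the capacity model described above. The formulas are an assumption derived from that model (all replicas must still fit on the nodes that survive the loss of the largest node), not necessarily the calculator's exact implementation.

    # Minimal sketch of the capacity model described above (assumed formulas,
    # not necessarily the calculator's exact implementation).
    def ceph_capacity(node_capacities_tb, replicas):
        raw = sum(node_capacities_tb)        # Raw Storage
        biggest = max(node_capacities_tb)    # largest node that could fail
        surviving = raw - biggest            # raw capacity left after that failure
        usable = surviving / replicas        # Safe Storage Size
        efficiency = usable / raw * 100      # Safe Storage Efficiency in %
        nearfull = surviving / raw           # OSD Nearfull Ratio
        return raw, usable, efficiency, nearfull

    # Example: four nodes with 10 TB of OSDs each, 3 replicas
    print(ceph_capacity([10, 10, 10, 10], 3))   # (40, 10.0, 25.0, 0.75)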

What would happen if you increase the total OSD size on each node?

What would happen if you add more nodes?
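Using the sketch above (and the same assumptions), both questions can be explored numerically: growing every node keeps the efficiency unchanged, while adding nodes makes the largest node a smaller share of the cluster and therefore improves both the efficiency and the nearfull ratio.

    # Doubling the capacity of every node: more usable space, same efficiency.
    print(ceph_capacity([20, 20, 20, 20], 3))   # (80, 20.0, 25.0, 0.75)

    # Adding more nodes of the same size: efficiency and nearfull ratio improve,
    # because the largest node is a smaller share of the total.
    print(ceph_capacity([10] * 8, 3))           # (80, 23.33..., 29.16..., 0.875)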

Understanding Ceph and Its Integration with Proxmox VE

Ceph is an open-source software-defined storage platform that provides highly scalable object, block, and file-based storage under a unified system. It is designed to handle large amounts of data by distributing it across multiple storage nodes, ensuring data redundancy and fault tolerance. Ceph's architecture allows for seamless scaling, both vertically (scale-up) by adding more storage to existing nodes, and horizontally (scale-out) by adding more nodes to the cluster.

How Ceph Works

Ceph operates on a cluster of storage nodes, each running one or more Ceph OSDs (Object Storage Daemons) that store data and handle replication. The system automatically distributes data across these nodes, ensuring that even if one node fails, the data remains accessible. Ceph uses the CRUSH (Controlled Replication Under Scalable Hashing) algorithm to determine how data is stored and retrieved, optimizing for performance and resilience.
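The key idea behind CRUSH is that data placement is computed from the cluster layout rather than looked up in a central table, so every client can independently work out where an object lives. The toy sketch below only illustrates that idea; it is not the real CRUSH algorithm, and the object and OSD names are made up.

    import hashlib

    # Toy placement (NOT the real CRUSH algorithm): rank OSDs by a hash of
    # (object name, OSD id) and pick the top N. Every client computes the
    # same result without asking a central server where the data is.
    def toy_placement(object_name, osds, replicas=3):
        ranked = sorted(
            osds,
            key=lambda osd: hashlib.sha256(f"{object_name}:{osd}".encode()).hexdigest(),
        )
        return ranked[:replicas]

    print(toy_placement("rbd_data.1234", ["osd.0", "osd.1", "osd.2", "osd.3"]))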

Integrating Ceph with Proxmox VE

Proxmox VE, a powerful open-source virtualization management platform, integrates seamlessly with Ceph to provide robust and scalable storage solutions. The integration is straightforward thanks to Proxmox's user-friendly web-based wizard, which guides users through the setup process. This wizard simplifies the installation and configuration of Ceph, making it accessible even to those with limited technical expertise.

Ease of Setup and Maintenance

Setting up Ceph within Proxmox VE is designed to be hassle-free. The web-based wizard handles the installation of necessary packages and configuration files, distributing them automatically across the cluster nodes. This integration allows administrators to manage storage and compute resources from a single interface, reducing complexity and maintenance overhead.

Scalability and Flexibility

One of the key advantages of using Ceph with Proxmox VE is its ability to scale effortlessly. Whether you need to add more storage capacity to existing nodes or expand your cluster with additional nodes, Ceph's architecture supports both scale-up and scale-out strategies. This flexibility ensures that your storage infrastructure can grow alongside your organization's needs, without significant reconfiguration or downtime.

Hardware Requirements for Ceph

When setting up a Ceph cluster, choosing the right hardware is crucial for optimal performance. Ceph is designed to run on commodity hardware, making it flexible and cost-effective for building large-scale data clusters. However, to ensure high performance, certain hardware specifications should be considered:

  • CPU: Ceph services require varying levels of CPU resources. Object Storage Daemon (OSD) services are CPU-intensive and benefit from high base frequencies and multiple cores. It's recommended to allocate at least one CPU core per Ceph service for stable performance.
  • RAM: Plan for roughly 4 GB of RAM per OSD, plus memory for monitor and manager services; more may be needed depending on the cluster size and workload.
  • Storage Drives: SSDs are preferred for their speed, especially for the journal/WAL and DB partitions, although HDDs can be used for cost-effective bulk storage. Ensure redundancy for critical components to maintain reliability.

Network Considerations

A robust network setup is essential for a performant Ceph cluster. Here are some network considerations to keep in mind:

  • Bandwidth: A minimum of 10 Gbps network bandwidth is recommended to handle Ceph's high data throughput and transaction requirements. For larger deployments, a 40 Gbps network may be necessary.
  • Network Segmentation: Separate networks for public and cluster traffic can reduce latency and increase throughput. It's advisable to physically separate the Ceph traffic from other network traffic to avoid interference.
  • Fault Tolerance: Implement a fault-tolerant network to ensure continuous communication between nodes, which is critical for maintaining cluster health and performance.

Additional Considerations for a Performant Setup

To achieve a high-performing Ceph setup, consider the following best practices:

  • Continuous Monitoring: Regularly monitor the cluster's health using tools like Ceph's built-in monitoring solutions to identify and address performance bottlenecks.
  • Performance Tuning: Adjust configurations such as OSD settings and placement group counts based on monitoring data to optimize performance (a rough placement-group sizing sketch follows this list).
  • Scalability Planning: Design your cluster with scalability in mind, allowing for easy addition of nodes or storage capacity without significant downtime.
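For the placement-group counts mentioned above, one traditional rule of thumb is roughly 100 PGs per OSD, divided by the replica count and rounded to a power of two. Recent Ceph releases can manage pg_num automatically via the pg_autoscaler, so treat the sketch below only as a starting point.

    # Classic placement-group rule of thumb (a starting point only; the
    # pg_autoscaler in recent Ceph releases can manage pg_num for you).
    def suggest_pg_num(osd_count, replicas, target_pgs_per_osd=100):
        raw = osd_count * target_pgs_per_osd / replicas
        power = 1
        while power * 2 <= raw:
            power *= 2
        # Round to the nearest power of two.
        return power if raw - power < power * 2 - raw else power * 2

    print(suggest_pg_num(12, 3))   # 12 OSDs, 3 replicas -> 512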

In summary, Ceph's integration with Proxmox VE offers a powerful, scalable, and easy-to-manage storage solution. Its ability to handle large data volumes with high reliability makes it an excellent choice for modern virtualization environments.