Posted to user@cassandra.apache.org by "Koppel, Jeremy" <Je...@cable.comcast.com> on 2014/06/05 23:58:51 UTC

Sizing a new Cassandra cluster

I have found plenty of general guidance on sizing individual nodes in a new Cassandra cluster, but no specific recommendations on the total size and configuration of the cluster (the number of nodes required per data center, the number of data centers, throughput requirements between data centers, etc.). I am currently sizing a new Cassandra cluster to support the following:

  *   Probably more write-intensive than read-intensive: at least 65% writes / 35% reads.
  *   Writes per day:  200,000,000 (~2315 per second).
  *   Data retention = 30 days.
  *   Replication Factor = 3.  (I anticipate reads and writes at CL = QUORUM or LOCAL_QUORUM.)
  *   My developers estimate a payload of ~300 bytes per record.
     *   Write throughput (MiB/s):  (Records per second * Replication Factor * Record Payload) / 1024 / 1024 = 1.99 MiB/s.
     *   Storage required (TiB):  (Records per day * Record Payload * Replication Factor * Data Retention in days * 2) / 1024 / 1024 / 1024 / 1024 = 9.82 TiB.
        *   Size doubled to provide room for compaction.
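
The arithmetic above can be reproduced with a short Python sketch. The variable names are illustrative only, and the sketch assumes a constant write rate across the day and counts just the raw 300-byte payload (no per-row or per-column storage overhead, no indexes), so the real on-disk footprint may well be larger.

```python
# Sizing sketch using the figures from this post; all names are illustrative.
SECONDS_PER_DAY = 24 * 60 * 60

writes_per_day = 200_000_000
payload_bytes = 300        # developers' estimate per record
replication_factor = 3
retention_days = 30
compaction_factor = 2      # storage doubled to leave room for compaction

# Assumes writes are spread evenly over the day (no peak-hour allowance).
writes_per_second = writes_per_day / SECONDS_PER_DAY

# Replicated write volume per second, in MiB.
throughput_mib_s = writes_per_second * replication_factor * payload_bytes / 1024**2

# Replicated data held for the retention window, doubled for compaction, in TiB.
storage_tib = (writes_per_day * payload_bytes * replication_factor
               * retention_days * compaction_factor) / 1024**4

print(f"writes per second: {writes_per_second:.0f}")
print(f"write throughput:  {throughput_mib_s:.2f} MiB/s")
print(f"storage required:  {storage_tib:.2f} TiB")
```

Changing any one input (for example a larger payload or a longer retention period) shows how quickly the storage figure moves.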

I’m wondering whether my math is on the right track, and whether the following configuration would perform well and leave a little headroom:

  *   2 Data Centers (they could co-exist with the application clusters).
  *   12 nodes (6 per data center) with:
     *   1 TiB storage capacity each.
        *   I’ve seen varying information on RAID usage and configuration.  Does RAID 1 mirrored across 2x 1 TiB SSDs perform well?  That might be a good configuration for us, and it would provide some high availability, so that we could lose a drive without having to repair a node.  Or is it better to buy an additional node for extra capacity, store the data on single SSDs, and accept drive failures?  (Or stripe 2x 500 GiB SSDs…)
        *   Do we need to store the commit log on a separate drive if we’re using SSDs?  How much space should we leave for it?  Do we really need separate controllers?
     *   8 CPU cores.
     *   32GB RAM.
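
As a quick sanity check on the proposal above, this Python sketch compares the raw capacity of 12 nodes at 1 TiB each with the 9.82 TiB estimated earlier. The names are illustrative, and it assumes that the 9.82 TiB figure covers the whole cluster (three replicas in total, not three per data center) and that data spreads evenly across nodes.

```python
# Capacity check for the proposed layout; all names are illustrative.
nodes = 12
capacity_per_node_tib = 1.0
required_tib = 9.82  # estimate from earlier in this post, already doubled for compaction

total_capacity_tib = nodes * capacity_per_node_tib
headroom_tib = total_capacity_tib - required_tib
headroom_pct = 100 * headroom_tib / total_capacity_tib

# Assumes an even distribution of data across all nodes.
required_per_node_tib = required_tib / nodes

print(f"total capacity:    {total_capacity_tib:.2f} TiB")
print(f"headroom:          {headroom_tib:.2f} TiB ({headroom_pct:.1f}%)")
print(f"required per node: {required_per_node_tib:.2f} TiB")
```

If each data center were instead to hold its own three replicas, the required figure would double and this layout would no longer fit.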

Thoughts?  Is this enough?  Overkill?

—Jeremy