Posted to commits@cassandra.apache.org by Apache Wiki <wi...@apache.org> on 2009/11/19 17:31:07 UTC

[Cassandra Wiki] Update of "CassandraHardware" by JonathanEllis

The "CassandraHardware" page has been changed by JonathanEllis.
http://wiki.apache.org/cassandra/CassandraHardware

--------------------------------------------------

New page:
=== Memory ===
The most recently written data resides in memory tables (aka [[MemtableThresholds|memtables]]), but older data that has been flushed to disk can be kept in the OS's file-system cache. In other words, ''the more memory, the better'', with 1GB being the minimum recommended.

=== CPU ===
Many workloads will actually be CPU-bound in Cassandra before being memory-bound.  Cassandra is highly concurrent and will make good use of however many cores you can give it.


=== Disk ===
The short answer is ''at least 2 disks'': one for your `CommitLogDirectory`, the other for `DataFileDirectories`. The exact answer, though, depends heavily on your usage, so it's important to understand what is going on here.

Cassandra persists data to disk for two very different purposes. The first is when a new write is made: it is recorded in the commit log so that it can be replayed after a crash or system shutdown. The second is when thresholds are exceeded and memtables are flushed to disk as SSTables.

Commit logs receive every write made to a Cassandra node and have the potential to block client operations, but they are only ever read on node start-up. SSTable writes, on the other hand, occur asynchronously, but SSTables are read to satisfy client look-ups. SSTables are also periodically merged and rewritten in a process called ''compaction''. Another important distinction is that commit logs are purged after the corresponding data has been flushed to disk as an SSTable, so `CommitLogDirectory` only holds data that has not yet been flushed, while the directories in `DataFileDirectories` store all of the data written to a node.
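
The difference between the two kinds of disk activity can be sketched with a toy model. This is only an illustration under simplifying assumptions, not Cassandra's actual code: the `ToyNode` class, its method names, and the key-count `flush_threshold` are all hypothetical, and the flush here runs synchronously, whereas the text above says real SSTable writes are asynchronous.

```python
class ToyNode:
    """Toy model of the write path described above; not Cassandra's real implementation."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical: flush after N keys
        self.commit_log = []   # stands in for CommitLogDirectory
        self.memtable = {}     # most recent writes, held in memory
        self.sstables = []     # stands in for DataFileDirectories

    def write(self, key, value):
        # Every write is appended to the commit log (the part that can block clients).
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # The memtable becomes an SSTable; the log entries it covers are purged.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []

    def read(self, key):
        # Check memory first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None

    def compact(self):
        # Merge all SSTables into one; values from newer SSTables win.
        merged = {}
        for sstable in self.sstables:
            merged.update(sstable)
        self.sstables = [dict(sorted(merged.items()))] if merged else []

    def replay(self):
        # After a crash the memtable is lost; rebuild it from the commit log.
        self.memtable = dict(self.commit_log)
```

In this sketch the commit log is written on every call to `write` but read only in `replay`, and it never grows beyond the unflushed writes, while `sstables` accumulates everything and is what `read` and `compact` work against.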

To summarize: put your `CommitLogDirectory` on its own device; it needn't be large, but it should be fast enough to receive all of your writes. Then use one or more devices for `DataFileDirectories`, and make sure they are both large enough to house all of your data and fast enough to satisfy your reads and to keep up with flushing and compaction.
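
As a rough sanity check of the layout recommended above, a small script can compare the device IDs that the operating system reports for each directory. This is a minimal sketch, not part of Cassandra: the function name `on_separate_devices` is hypothetical, and the directory paths you pass in are whatever you configured. Note that two partitions on the same physical disk report different device IDs, so a `True` result is necessary but not sufficient.

```python
import os


def on_separate_devices(commit_log_dir, data_dirs):
    """Return True if no data directory shares a device ID with the commit log directory."""
    commit_dev = os.stat(commit_log_dir).st_dev
    # st_dev identifies the device (filesystem) that holds each path.
    return all(os.stat(d).st_dev != commit_dev for d in data_dirs)
```

For example, `on_separate_devices("/path/to/commitlog", ["/path/to/data1", "/path/to/data2"])` (placeholder paths) returns `False` if either data directory lives on the same filesystem as the commit log.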