Posted to user@cassandra.apache.org by Jan Algermissen <ja...@nordsc.com> on 2013/07/24 16:36:13 UTC

Cassandra and RAIDs

Hi,

second question:

is it recommended to set up Cassandra using 'RAID-ed' disks for per-node reliability or do people usually just rely on having the multiple nodes anyway - why bother with replicated disks?

Jan

Re: Cassandra and RAIDs

Posted by Andrew Cobley <a....@dundee.ac.uk>.
From:

http://www.datastax.com/docs/1.2/cluster_architecture/cluster_planning

  *   RAID on data disks: It is generally not necessary to use RAID for the following reasons:

      *   Data is replicated across the cluster based on the replication factor you've chosen.
      *   Starting in version 1.2, Cassandra takes care of disk management with the JBOD (just a bunch of disks) support feature. Because Cassandra properly reacts to a disk failure, based on your availability/consistency requirements, either by stopping the affected node or by blacklisting the failed drive, you can deploy Cassandra nodes with large disk arrays without the overhead of RAID 10.

  *   RAID on the commit log disk: Generally RAID is not needed for the commit log disk. Replication adequately prevents data loss. If you need the extra redundancy, use RAID 1.


Andy




Re: Cassandra and RAIDs

Posted by Richard Low <ri...@wentnet.com>.
On 24 July 2013 15:36, Jan Algermissen <ja...@nordsc.com> wrote:


> is it recommended to set up Cassandra using 'RAID-ed' disks for per-node
> reliability or do people usually just rely on having the multiple nodes
> anyway - why bother with replicated disks?
>

It's not necessary, due to replication as you say.  You can give Cassandra
your JBOD disks and it will split data between them and, if one fails,
either avoid that disk or fail the node (you can choose).
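
As a rough illustration of the JBOD setup described above, a cassandra.yaml fragment for Cassandra 1.2 might look like the sketch below. The mount points are hypothetical and only stand in for one directory per physical disk; data_file_directories and disk_failure_policy are the relevant settings, and the policy value is what selects between avoiding a failed disk and stopping the node.

```yaml
# Sketch only: the paths below are hypothetical examples.
# One data directory per physical disk; Cassandra spreads SSTables across them.
data_file_directories:
    - /mnt/disk1/cassandra/data
    - /mnt/disk2/cassandra/data
    - /mnt/disk3/cassandra/data

# What to do when a data disk fails:
#   stop        - shut down gossip and Thrift, effectively failing the node
#   best_effort - stop using the failed disk and keep serving from the rest
#   ignore      - pre-1.2 behaviour: ignore fatal errors and let requests fail
disk_failure_policy: best_effort

# Commit log on its own disk (optionally RAID 1 if extra redundancy is wanted).
commitlog_directory: /mnt/commitlog/cassandra
```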

There are some reasons to consider RAID though:

* It is probably quicker and places no load on the rest of the cluster to
do a RAID rebuild rather than a nodetool rebuild/repair.  The importance of
this depends on how much data you have and the load on your cluster.  If
you don't have much data per node or if there is spare capacity then RAID
will offer no benefit here.
* Using JBOD, the largest SSTable you can have is limited to the size of
one disk.  This is unlikely to cause problems in most scenarios but an
erroneous nodetool compact could cause problems if your data size is
greater than can fit on any one disk.
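
To make the SSTable size limit concrete, here is a minimal Python sketch of the arithmetic. It assumes the worst case, where a major compaction merges every SSTable into a single file roughly as large as their sum (in practice overwrites and purged tombstones can make the result smaller). The function name and the sizes are made up for illustration and are not part of Cassandra or nodetool.

```python
def major_compaction_fits(sstable_sizes_gb, disk_free_gb):
    """Return True if a worst-case major compaction output fits on one disk.

    sstable_sizes_gb: sizes of the existing SSTables for a column family.
    disk_free_gb: free space on each JBOD data directory.

    Assumption: the merged SSTable is as large as the sum of its inputs,
    and a single SSTable cannot span more than one disk.
    """
    worst_case_output_gb = sum(sstable_sizes_gb)
    return worst_case_output_gb <= max(disk_free_gb)


# Hypothetical node: three 500 GB disks, 750 GB of data in total.
print(major_compaction_fits([300, 250, 200], [500, 500, 500]))
# Hypothetical node with far less data than one disk can hold.
print(major_compaction_fits([100, 50], [500, 500]))
```

With RAID 0 or RAID 10 the same data would sit on one logical volume, so the comparison would be against the size of the whole array instead of the largest single disk.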

Richard.