Posted to user@cassandra.apache.org by André Cruz <an...@co.sapo.pt> on 2012/03/27 19:10:04 UTC

Advice on architecture

Hello.

I'm developing a system that will require me to store large (<=4MB) columns in Cassandra. Right now I'm storing one column per row, in a single CF. The machines I have at my disposal have 32 GB of RAM and 10 SATA drives each. I would prefer a larger number of smaller nodes, but this is what I have to work with. The issues I'm weighing are RAID0 vs. separate data dirs, and SizeTiered vs. Leveled compaction. I will have approximately twice as many writes as reads.

RAID0 would help me use the total disk space available at each node more efficiently, but tests have shown that under write load it behaves much worse than using separate data dirs, one per disk. I used a 3-node cluster, and the node with RAID0 kept falling behind the other two nodes, which had separate data dirs. The problem with separate data dirs is that it seems to be difficult for Cassandra to use the space efficiently due to the compactions. I first tried the new Leveled compaction scheme, which seemed promising since it would create "small" files that could be scattered across the data dirs, but the IO necessary for this compaction scheme is enormous under write load. It was compacting constantly, and that hurt write throughput because it slowed the flushing of memtables. I then tried size-tiered compaction and it performed better, but as it tends to create large SSTables, these cannot be split across the multiple data dirs.
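
For reference, the separate-data-dirs layout is what I configure through the data_file_directories setting in cassandra.yaml, one directory per disk; the RAID0 node instead has a single entry pointing at the array's mount point. The mount points below are placeholders, not my actual paths:

    data_file_directories:
        - /mnt/disk01/cassandra/data
        - /mnt/disk02/cassandra/data
        - /mnt/disk03/cassandra/data
        # ...one entry per data disk, up to /mnt/disk10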

What I'm thinking of doing now is using multiple data dirs, with tiered compaction, and dividing the input data into several (64) different CFs. This way smaller SSTables will be created, and these can be split across the multiple data dirs. This will let me make better use of the available capacity, and I will not need as much free space for compactions as I would if the SSTables were larger.
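
As a rough sketch of what I mean (the CF names and hash choice below are only illustrative, not a final design), every writer and reader would map a row key to one of the 64 CFs like this:

    import hashlib

    NUM_CFS = 64

    def cf_for_key(row_key):
        # Stable hash of the row key -> bucket 0..63 -> column family name.
        bucket = int(hashlib.md5(row_key.encode('utf-8')).hexdigest(), 16) % NUM_CFS
        return 'Blobs_%02d' % bucket

    # Always returns the same CF for a given key, one of 'Blobs_00'..'Blobs_63'.
    print(cf_for_key('some-row-key'))

As long as the same mapping is used everywhere, each CF should receive roughly 1/64th of the data and its SSTables stay correspondingly small.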

Am I missing something here? Is this the best way to deal with this (abnormal) use case?

Thanks and best regards,
André Cruz

Re: Advice on architecture

Posted by Radim Kolar <hs...@filez.com>.
>
> I'm also trying to evaluate different strategies for RAID0 as a drive
> for Cassandra data storage. If I need 2 TB of space to keep a node's
> tables, which drive configuration is better: 2 x 1 TB drives or
> 4 x 500 GB drives?
More drives are always better.
> Which stripe size is optimal?
Smaller stripe sizes are better for reads, larger ones for writes. The
optimal stripe size depends on the index sampling interval and the
average read size (see the rough calculation at the end of this reply).
> Should I use hardware RAID, or is Linux software RAID OK?
An expensive ($500) HW RAID card is way better (about 2.5x faster) if
you can get it. Linux software RAID is better than cheap HW RAID. Also,
expensive RAID cards are much easier to manage.
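
A rough back-of-the-envelope way to look at it (the read and chunk sizes below are only illustrative): count how many RAID0 chunks, and therefore how many drives, a single read of a given size has to touch.

    # Estimate how many RAID0 chunks (and hence disks, capped by the number
    # of drives in the array) one read of a given size touches.

    def chunks_touched(read_bytes, chunk_bytes):
        # Ceiling division; an unaligned read may straddle one extra chunk.
        return -(-read_bytes // chunk_bytes)

    READ_SIZE = 4 * 1024 * 1024   # ~4 MB column, as in the original post
    DISKS = 10                    # drives per node, as in the original post

    for chunk_kb in (64, 256, 1024, 4096):
        n = chunks_touched(READ_SIZE, chunk_kb * 1024)
        print("chunk %4d KB -> %3d chunks, up to %2d disks per read"
              % (chunk_kb, n, min(n, DISKS)))

With small chunks a single 4 MB read spans all ten drives, while chunks of a few MB keep it on one or two; which end of that range wins depends on the read mix, which is why the average read size and the index sampling interval matter.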

Re: Advice on architecture

Posted by Igor <ig...@4friends.od.ua>.
On 03/28/2012 02:04 PM, Radim Kolar wrote:
>
>> RAID0 would help me use the total disk space available at each node
>> more efficiently, but tests have shown that under write load it
>> behaves much worse than using separate data dirs, one per disk.
> There are different strategies for how RAID0 splits reads; changing
> the IO scheduler and filesystem also helps. I found that ZFS/ZRAID is
> best, and its backups in particular are very good. If you don't plan
> to do backups, ext4 is not bad either, but compactions are rather slow
> on it.

I'm also trying to evaluate different strategies for RAID0 as a drive
for Cassandra data storage. If I need 2 TB of space to keep a node's
tables, which drive configuration is better: 2 x 1 TB drives or
4 x 500 GB drives? Which stripe size is optimal? Should I use hardware
RAID, or is Linux software RAID OK? I'm mostly concerned with read
performance.



Re: Advice on architecture

Posted by Radim Kolar <hs...@filez.com>.
> RAID0 would help me use the total disk space available at each node more efficiently, but tests have shown that under write load it behaves much worse than using separate data dirs, one per disk.
There are different strategies for how RAID0 splits reads; changing the
IO scheduler and filesystem also helps. I found that ZFS/ZRAID is best,
and its backups in particular are very good. If you don't plan to do
backups, ext4 is not bad either, but compactions are rather slow on it.
>   I used a 3-node cluster, and the node with RAID0 kept falling behind the other two nodes, which had separate data dirs. The problem with separate data dirs is that it seems to be difficult for Cassandra to use the space efficiently due to the compactions.
If you need to think about free disk space on nodes, then you do not
have enough storage. TB drives are cheap today; buy some. A cluster
should not be designed on the basis of "we will be lucky if all our
data fits and we will not run out of space during major compactions".
>   I first tried the new Leveled compaction scheme, which seemed promising since it would create "small" files that could be scattered across the data dirs, but the IO necessary for this compaction scheme is enormous under write load.
Yes, it is meant for mostly-read-only apps, but raising the base
SSTable size to something larger, like 50 MB, helps.
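
For illustration, the knob in question is the per-CF sstable_size_in_mb option of LeveledCompactionStrategy (on current clusters it sits in the column family's compaction_strategy_options); in CQL terms, with a placeholder table name, it looks roughly like this:

    ALTER TABLE blobs
      WITH compaction = {'class': 'LeveledCompactionStrategy',
                         'sstable_size_in_mb': 50};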
> Am I missing something here? Is this the best way to deal with this (abnormal) use case?
It takes time to learn how to tune Cassandra properly. If you do not
have time, hire somebody who will do it for you. It took me a few
months to master, and it is rather difficult to explain over mail.