Posted to user@cassandra.apache.org by Adi <ad...@gmail.com> on 2010/10/29 20:25:02 UTC

Any suggestions for tuning Cassandra?

Hello Folks,
We are evaluating Cassandra for one of our storage needs. I am running a
benchmark test to gauge Cassandra's performance using YCSB:
http://github.com/brianfrankcooper/YCSB/wiki
Setup for Cassandra is a 5-node cluster, replication factor 3, CentOS 5.5 on
Amazon EC2.
Sample test data: a single field of size 8 KB.
Running 3-4 clients with 100 threads each. The clients are also running in the
same network (availability zone) on EC2.
I have run tests ranging from 1 million to 12 million inserts. I am getting
a throughput of around 5 MB/s on the network and on the disk.
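
As a rough sanity check, the byte throughput above can be converted into an
insert rate. This is only a sketch: it assumes the 5 MB/s figure is payload
only (no protocol or replication overhead), that 1 MB = 1024 KB, and that each
insert carries exactly one 8 KB field. The function name is made up for
illustration, and whether the figure is per node or for the whole cluster is
not stated above.

```python
def inserts_per_second(throughput_mb_s, record_kb):
    """Convert a byte throughput into an approximate insert rate.

    Assumes 1 MB = 1024 KB and ignores protocol and replication overhead.
    """
    return throughput_mb_s * 1024 / record_kb


# Figures from the post: about 5 MB/s with 8 KB records.
rate = inserts_per_second(5, 8)
print(rate)  # 640.0
```

Under those assumptions, 5 MB/s corresponds to roughly 640 inserts per second.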

1) Is there any tuning I can do to improve performance? I am trying to
figure out a way to max out the network and/or disk I/O, but for some reason
the throughput always stays steady.

2) Another thing I notice is that the load is not evenly distributed across
the nodes. I tried setting the tokens using the formula suggested in the Token
Selection section of the Operations wiki page. That actually led to a more
unbalanced load distribution (which the doc warns can happen if the key
distribution is not even).
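
For reference, a minimal sketch of the token calculation, assuming the
RandomPartitioner (token range 0 to 2**127), where node i of N gets the token
i * 2**127 / N. The partitioner actually in use is not stated above, and the
function name is made up for illustration.

```python
def balanced_tokens(node_count):
    """Return evenly spaced initial tokens for a ring of node_count nodes.

    Assumes RandomPartitioner, whose token space runs from 0 to 2**127.
    """
    ring_size = 2 ** 127
    return [i * ring_size // node_count for i in range(node_count)]


# One token per node for a 5-node cluster.
for node, token in enumerate(balanced_tokens(5)):
    print(node, token)
```

Evenly spaced tokens only balance the load when keys are spread evenly over the
token space, which matches the caveat in the doc.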

Any suggestions/pointers are welcome. Thanks.

-Adi