Posted to user@cassandra.apache.org by Brayton Thompson <th...@grnoc.iu.edu> on 2010/11/16 22:56:22 UTC

Question about load balancing.

0.7 beta 2 here.
I've been reading about load balancing, and some sites seem to imply that using the random partitioner will keep your nodes fairly well balanced. I am using a 3-node cluster: 1 seed and two others with AutoBootstrap on.

Now I have read that autobootstrap can leave your nodes unbalanced, but doesn't that only affect existing data? So shouldn't all new data be distributed evenly from here on out?

This is not what I am experiencing, so I must be wrong. Of the three nodes, one is twice as big as another, and the third has less than 1% of the total data. I have inserted roughly 1.5 million rows into a single CF, totaling roughly 600 MB of data. Is this too small to be a good test?

Thank you for your time.

Re: Question about load balancing.

Posted by Benjamin Black <b...@b3k.us>.

Random partitioner distributes keys approximately evenly across the
entire range of the ring (0 to 2**127 - 1).  This means that generally a
given section of the range will contain about the same number of keys
as any other section of the same size.  If you assign tokens that divide
the ring into equal-size ranges, the nodes will hold similar numbers of
keys.  This is why RP with evenly distributed (manually assigned) tokens
results in a good balance of load between nodes.
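
As a rough illustration of dividing the ring into equal-size ranges, the token for node i of N can be computed as i * 2**127 / N. The following is a minimal Python sketch, not an official tool; the function name evenly_spaced_tokens is made up for this example, and the printed values are meant to be assigned by hand as each node's initial token.

```python
RING_SIZE = 2 ** 127  # RandomPartitioner tokens fall in 0 .. 2**127 - 1


def evenly_spaced_tokens(node_count):
    """Return one token per node, splitting the ring into equal-size ranges."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer division keeps the tokens exact; Python ints are arbitrary precision.
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    # Example: the 3-node cluster from the question.
    for node, token in enumerate(evenly_spaced_tokens(3)):
        print("node %d: %d" % (node, token))
```

Adding a node later changes N, so the existing tokens would need to be recomputed and the nodes moved to keep the ranges equal.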

Since OPP does not have this even distribution property over the
entire range (unless your keys do!), greater care must be exercised in
selecting tokens and in managing distribution of load.

http://www.slideshare.net/benjaminblack/cassandra-summit-2010-operations-troubleshooting-intro

b

Re: Question about load balancing.

Posted by Aaron Morton <aa...@thelastpickle.com>.

Take a look at the sections on Load Balance and Token Selection here: http://wiki.apache.org/cassandra/Operations

AFAIK the best approach is to set the initial token for each node in its cassandra.yaml.

With the Random Partitioner, nodes that are left to choose their own tokens will pick random ones, which will not result in an even distribution. It is better to select the tokens manually using the approach linked above.
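
To see why random tokens tend to leave a small cluster unbalanced, here is a small Python simulation that compares randomly chosen tokens with evenly spaced ones. It is only a sketch under stated assumptions: key_token approximates the Random Partitioner by reducing an MD5 digest onto the 0 to 2**127 - 1 ring, each node is assumed to own the range ending at its token, and the function names and sample keys are invented for this example.

```python
import bisect
import hashlib
import random

RING_SIZE = 2 ** 127


def key_token(key):
    # Assumption: approximate the Random Partitioner by mapping the MD5
    # digest of the key onto the ring.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % RING_SIZE


def keys_per_node(tokens, keys):
    """Count keys per node, listed in ascending token order.

    Assumes each node owns the range (previous token, its own token],
    with the range of the lowest token wrapping around the ring.
    """
    ring = sorted(tokens)
    counts = [0] * len(ring)
    for key in keys:
        # First token >= the key's token; past the last token, wrap to index 0.
        i = bisect.bisect_left(ring, key_token(key))
        counts[i % len(ring)] += 1
    return counts


if __name__ == "__main__":
    keys = ["row%d" % n for n in range(100000)]
    even_tokens = [i * RING_SIZE // 3 for i in range(3)]
    rng = random.Random()
    random_tokens = [rng.randrange(RING_SIZE) for _ in range(3)]
    print("evenly spaced tokens:", keys_per_node(even_tokens, keys))
    print("random tokens:       ", keys_per_node(random_tokens, keys))
```

Running it a few times should show the evenly spaced tokens giving each node about a third of the keys, while the random tokens give a different split on each run, often a lopsided one.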

A

