Posted to user@cassandra.apache.org by ta...@accenture.com on 2011/07/19 02:23:37 UTC

Brisk Unbalanced Ring

We're running Brisk v1 beta2 on 12 nodes in EC2 - 8 Cassandra nodes in DC1 and 4 Brisk nodes in DC2. We wrote a few TBs of data to the cluster, and unfortunately the load is very unbalanced. Every key is the same size and we are using RandomPartitioner.

There are two replicas of the data in DC1 and one replica in DC2. The load in DC2 makes sense (about 250GB per node). DC1 should also have about 250GB per node (twice the data spread across twice the number of nodes), but as can be seen below, two nodes hold an inordinate amount of data while the other six have only about 128GB:

Address         DC          Rack        Status State   Load            Owns    Token
                                                                               148873535527910577765226390751398592512
10.2.206.127    DC1         RAC1        Up     Normal  901.6 GB        12.50%  0
10.116.230.151  DC2         RAC1        Up     Normal  258.23 GB       6.25%   10633823966279326983230456482242756608
10.110.6.237    DC1         RAC1        Up     Normal  129.08 GB       6.25%   21267647932558653966460912964485513216
10.2.38.43      DC1         RAC1        Up     Normal  128.51 GB       12.50%  42535295865117307932921825928971026432
10.114.39.110   DC2         RAC1        Up     Normal  257.32 GB       6.25%   53169119831396634916152282411213783040
10.210.27.208   DC1         RAC1        Up     Normal  128.67 GB       6.25%   63802943797675961899382738893456539648
10.207.39.230   DC1         RAC2        Up     Normal  643.14 GB       12.50%  85070591730234615865843651857942052864
10.85.157.77    DC2         RAC1        Up     Normal  256.78 GB       6.25%   95704415696513942849074108340184809472
10.2.209.240    DC1         RAC2        Up     Normal  128.96 GB       6.25%   106338239662793269832304564822427566080
10.96.74.213    DC1         RAC2        Up     Normal  128.3 GB        12.50%  127605887595351923798765477786913079296
10.194.205.155  DC2         RAC1        Up     Normal  257.15 GB       6.25%   138239711561631250781995934269155835904
10.201.194.16   DC1         RAC2        Up     Normal  129.46 GB       6.25%   148873535527910577765226390751398592512
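
The ~250GB-per-node expectation comes from a quick back-of-the-envelope
calculation; a minimal sketch in Python (the ~1TB raw-data figure is an
assumption inferred from DC2's 4 nodes x ~250GB at one replica, not a
measured number):

# Rough sanity check of expected per-node load.
def expected_gb_per_node(raw_gb, replicas, nodes):
    # Each DC stores raw_gb * replicas in total, ideally spread evenly
    # across that DC's nodes.
    return raw_gb * replicas / nodes

raw_gb = 4 * 250  # ~1000 GB of raw data (assumed)
print(expected_gb_per_node(raw_gb, replicas=2, nodes=8))  # DC1: ~250 GB/node
print(expected_gb_per_node(raw_gb, replicas=1, nodes=4))  # DC2: ~250 GB/node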

I should also note that the first node used to have about 640GB of load until the instance went down and we had to run repair on a new instance in its place.

Any ideas why this may have happened?

Thanks,
Tamara


Re: Brisk Unbalanced Ring

Posted by Sameer Farooqui <ca...@gmail.com>.
FYI - this manual reordering of the DCs and RACs might make it easier to see
how the tokens are arranged. I'm pretty sure the token ranges are picked
correctly (there is a quick check of this after the ring output below).
Ignore the Owns column, because it is not multi-datacenter aware: it treats
all of the nodes as one ring rather than two (DC1 & DC2).

Here is what the nodetool ring output looked like before we replaced the
first node (643 GB) with new hardware. After running repair, for some
reason, to our dismay, it re-spawned as a 900+ GB node.

Address         DC          Rack        Status State   Load            Owns    Token
                                                                               148873535527910577765226390751398592512
10.192.143.x    DC1         RAC1        Up     Normal  643.42 GB       12.50%  0
10.192.171.x    DC1         RAC1        Up     Normal  128.96 GB       6.25%   21267647932558653966460912964485513216
10.210.95.x     DC1         RAC1        Up     Normal  128.34 GB       12.50%  42535295865117307932921825928971026432
10.211.19.x     DC1         RAC1        Up     Normal  128.55 GB       6.25%   63802943797675961899382738893456539648
10.68.58.x      DC1         RAC2        Up     Normal  643.05 GB       12.50%  85070591730234615865843651857942052864
10.110.31.x     DC1         RAC2        Up     Normal  128.84 GB       6.25%   106338239662793269832304564822427566080
10.96.58.x      DC1         RAC2        Up     Normal  128.11 GB       12.50%  127605887595351923798765477786913079296
10.210.195.x    DC1         RAC2        Up     Normal  129.33 GB       6.25%   148873535527910577765226390751398592512
10.114.138.x    DC2         RAC1        Up     Normal  258.04 GB       6.25%   10633823966279326983230456482242756608
10.203.79.x     DC2         RAC1        Up     Normal  257.14 GB       6.25%   53169119831396634916152282411213783040
10.242.209.x    DC2         RAC1        Up     Normal  256.58 GB       6.25%   95704415696513942849074108340184809472
10.38.25.x      DC2         RAC1        Up     Normal  257.08 GB       6.25%   138239711561631250781995934269155835904
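
As a quick check that the token assignment itself is balanced (a sketch,
assuming RandomPartitioner's 0..2**127 token space; the DC2 offset of
2**127/16 is read off the ring output, not taken from the configs):

# Recompute the expected evenly-spaced tokens and compare with the
# Token column above.
RING = 2 ** 127
dc1 = [i * RING // 8 for i in range(8)]               # 8 DC1 nodes, evenly spaced
dc2 = [i * RING // 4 + RING // 16 for i in range(4)]  # 4 DC2 nodes, offset/interleaved

for token in sorted(dc1 + dc2):
    print(token)  # matches the Token column in both ring outputs

# The alternating 12.50% / 6.25% in the Owns column is just the
# single-ring view of these interleaved tokens, not a real imbalance.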

