Posted to user@cassandra.apache.org by St...@wellsfargo.com on 2013/02/05 20:40:42 UTC

unbalanced ring

So I have three nodes in a ring in one data center.  My configuration has num_tokens: 256 set and initial_token commented out.  When I look at the ring, it shows me all of the token ranges of course, and basically identical data for each range on each node.  Here is the Cliff's Notes version of what I see:

[root@Config3482VM2 apache-cassandra-1.2.0]# bin/nodetool ring

Datacenter: 28
==========
Replicas: 1

Address         Rack        Status State   Load            Owns                Token
                                                                               9187343239835811839
10.28.205.125   205         Up     Normal  2.85 GB         33.69%              -3026347817059713363
10.28.205.125   205         Up     Normal  2.85 GB         33.69%              -3026276684526453414
10.28.205.125   205         Up     Normal  2.85 GB         33.69%              -3026205551993193465
  (etc)
10.28.205.126   205         Up     Normal  1.15 GB         100.00%             -9187343239835811840
10.28.205.126   205         Up     Normal  1.15 GB         100.00%             -9151314442816847872
10.28.205.126   205         Up     Normal  1.15 GB         100.00%             -9115285645797883904
  (etc)
10.28.205.127   205         Up     Normal  69.13 KB        66.30%              -9223372036854775808
10.28.205.127   205         Up     Normal  69.13 KB        66.30%              36028797018963967
10.28.205.127   205         Up     Normal  69.13 KB        66.30%              72057594037927935
  (etc)

So at this point I have a number of questions.   The biggest question is of Load.  Why does the .125 node have 2.85 GB, .126 has 1.15 GB, and .127 has only 0.000069 GB?  These boxes are all comparable and all configured identically.

partitioner: org.apache.cassandra.dht.Murmur3Partitioner
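
For reference, a quick way to double-check those settings on each node (a sketch, assuming the tarball layout from the prompt above):

grep -nE '^(num_tokens|# *initial_token)' conf/cassandra.yaml   # expect num_tokens: 256, initial_token commented out
bin/nodetool ring | wc -l    # with 256 vnodes per node, expect roughly 3 x 256 token rows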

I'm sorry to ask so many questions - I'm having a hard time finding documentation that explains this stuff.

Stephen

Re: unbalanced ring

Posted by Edward Capriolo <ed...@gmail.com>.
I take that back. vnodes are useful for any size cluster, but I do not see
them as a day one requirement. It seems like many people are stumbling over
this.

On Tuesday, February 12, 2013, Edward Capriolo <ed...@gmail.com>
wrote:
>
> Are vnodes on by default? It seems that many on the list are using this
> feature with small clusters.
>
> I know these days anything named virtual is sexy, but they are not useful
> for small clusters, are they? I do not see why people are using them.
>
> On Monday, February 11, 2013, aaron morton <aa...@thelastpickle.com>
> wrote:
>>  So when you say to do this with a “clean” setup, what are you asking me
>> to do?
>>
>> Yup
>> clear /var/lib/cassandra/data /commitlog /saved_caches
>> start the cluster
>> use nodetool ring
>> You may also want to play with https://github.com/pcmanus/ccm to create
>> a local 3 node cluster to see the difference. Note that the updateconfig
>> setting cannot remove a config setting, so you will need to edit the yaml for
>> the nodes.
>> Cheers
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> New Zealand
>> @aaronmorton
>> http://www.thelastpickle.com
>> On 12/02/2013, at 7:57 AM, Stephen.M.Thompson@wellsfargo.com wrote:
>>
>> Aaron, thanks for your feedback.
>>
>> .125
>> num_tokens: 256
>> # initial_token:
>>
>> .126
>> num_tokens: 256
>> #initial_token:
>>
>> .127
>> num_tokens: 256
>> # initial_token:
>>
>> This all looks correct.  So when you say to do this with a “clean”
>> setup, what are you asking me to do?  Is it enough to blow away
>> /var/lib/cassandra and reload the data?  Also destroy my Cassandra install
>> (which is just un-tar) and reinstall from nothing?
>>
>> Stephen Thompson
>> Wells Fargo Corporation
>> Internet Authentication & Fraud Prevention
>> 704.427.3137 (W) | 704.807.3431 (C)
>>
>> This message may contain confidential and/or privileged information, and
>> is intended for the use of the addressee only. If you are not the addressee
>> or authorized to receive thi

Re: unbalanced ring

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Maybe people think that 1.2 = vnodes, when vnodes are actually not
mandatory; furthermore, it is advised to upgrade first and then, after a while,
when all is running smoothly, eventually switch to vnodes...


2013/2/13 Brandon Williams <dr...@gmail.com>

> On Tue, Feb 12, 2013 at 6:13 PM, Edward Capriolo <ed...@gmail.com>
> wrote:
> >
> > Are vnodes on by default? It seems that many on the list are using this
> > feature with small clusters.
>
> They are not.
>
> -Brandon
>

Re: unbalanced ring

Posted by Brandon Williams <dr...@gmail.com>.
On Tue, Feb 12, 2013 at 6:13 PM, Edward Capriolo <ed...@gmail.com> wrote:
>
> Are vnodes on by default? It seems that many on the list are using this feature
> with small clusters.

They are not.

-Brandon

Re: unbalanced ring

Posted by Edward Capriolo <ed...@gmail.com>.
Are vnodes on by default? It seems that many on the list are using this feature
with small clusters.

I know these days anything named virtual is sexy, but they are not useful
for small clusters, are they? I do not see why people are using them.

On Monday, February 11, 2013, aaron morton <aa...@thelastpickle.com> wrote:
>  So when you say to do this with a “clean” setup, what are you asking me
> to do?
>
> Yup
> clear /var/lib/cassandra/data /commitlog /saved_caches
> start the cluster
> use nodetool ring
> You may also want to play with https://github.com/pcmanus/ccm to create a
> local 3 node cluster to see the difference. Note that the updateconfig
> setting cannot remove a config setting, so you will need to edit the yaml for
> the nodes.
> Cheers
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> @aaronmorton
> http://www.thelastpickle.com
> On 12/02/2013, at 7:57 AM, Stephen.M.Thompson@wellsfargo.com wrote:
>
> Aaron, thanks for your feedback.
>
> .125
> num_tokens: 256
> # initial_token:
>
> .126
> num_tokens: 256
> #initial_token:
>
> .127
> num_tokens: 256
> # initial_token:
>
> This all looks correct.  So when you say to do this with a “clean” setup,
> what are you asking me to do?  Is it enough to blow away /var/lib/cassandra
> and reload the data?  Also destroy my Cassandra install (which is just
> un-tar) and reinstall from nothing?
>
> Stephen Thompson
> Wells Fargo Corporation
> Internet Authentication & Fraud Prevention
> 704.427.3137 (W) | 704.807.3431 (C)
>
> This message may contain confidential and/or privileged information, and
> is intended for the use of the addressee only. If you are not the addressee
> or authorized to receive thi

Re: unbalanced ring

Posted by aaron morton <aa...@thelastpickle.com>.
>  So when you say to do this with a “clean” setup, what are you asking me to do?
Yup
clear /var/lib/cassandra/data /commitlog /saved_caches
start the cluster
use nodetool ring 

You may also want to play with https://github.com/pcmanus/ccm to create a local 3 node cluster to see the difference. Note that the updateconfig setting cannot remove a config setting, so you will need to edit the yaml for the nodes.
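
Spelled out as shell commands, that reset is roughly the following on each node (a sketch assuming the default data directories; adjust to your layout):

# stop cassandra first, then:
rm -rf /var/lib/cassandra/data /var/lib/cassandra/commitlog /var/lib/cassandra/saved_caches
# restart the node, then check the token assignments:
bin/nodetool ring

And a throwaway local cluster with ccm could look something like this (hypothetical invocation; check ccm's help for the exact flags):

ccm create test -v 1.2.1
ccm populate -n 3
ccm updateconf 'num_tokens: 256'
ccm start
ccm node1 ring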

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 12/02/2013, at 7:57 AM, Stephen.M.Thompson@wellsfargo.com wrote:

> Aaron, thanks for your feedback.
>  
> .125
> num_tokens: 256
> # initial_token:
>  
> .126
> num_tokens: 256
> #initial_token:
>  
> .127
> num_tokens: 256
> # initial_token:
>  
> This all looks correct.  So when you say to do this with a “clean” setup, what are you asking me to do?  Is it enough to blow away /var/lib/cassandra and reload the data?  Also destroy my Cassandra install (which is just un-tar) and reinstall from nothing? 
>  
> Stephen Thompson
> Wells Fargo Corporation
> Internet Authentication & Fraud Prevention
> 704.427.3137 (W) | 704.807.3431 (C)
>  
> This message may contain confidential and/or privileged information, and is intended for the use of the addressee only. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation.
>  
> From: aaron morton [mailto:aaron@thelastpickle.com] 
> Sent: Monday, February 11, 2013 12:51 PM
> To: user@cassandra.apache.org
> Subject: Re: unbalanced ring
>  
> The tokens are not right, not right at all. Some are too short and some are too tall. 
>  
More technically they do not appear to be randomly arranged. The tokens for the .125 node all start with -3, the .126 node only has negative tokens, and the .127 node mostly has positive tokens.
>  
> Check that on each node the initial_token yaml setting is commented out, and that num_tokens is set to 256. 
>  
> If you can reproduce this fault with a clean setup please raise a ticket at https://issues.apache.org/jira/browse/CASSANDRA
>  
> Cheers
>  
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>  
> @aaronmorton
> http://www.thelastpickle.com
>  
> On 8/02/2013, at 10:36 AM, Stephen.M.Thompson@wellsfargo.com wrote:
> 
> 
> I found when I tried to do queries after sending this that although it shows a ton of data, it would no longer return ANYTHING for any query ... always 0 rows.  So something was severely hosed.  I blew away the data and reloaded from the database ... the data set is a little smaller than before.  It shows up somewhat more balanced, although I'm still curious why the third node is so much smaller than the first two.
>  
> [root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status
> Datacenter: 28
> ==============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address           Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  10.28.205.125     994.89 MB  255     33.7%             3daab184-61f0-49a0-b076-863f10bc8c6c  205
> UN  10.28.205.126     966.17 MB  256     99.9%             55bbd4b1-8036-4e32-b975-c073a7f0f47f  205
> UN  10.28.205.127     699.79 MB  257     66.4%             d240c91f-4901-40ad-bd66-d374a0ccf0b9  205
> [root@Config3482VM1 apache-cassandra-1.2.1]#
>  
> And yes, that is the entire content of the output from the status call, unedited.   I have attached the output from nodetool ring.  To answer a couple of the questions from below from Eric:
>  
> * One data center (28)?  One rack (205)? Three nodes?
                Yes, that’s right.  We’re just doing a proof of concept at the moment so this is three VMware servers.
>  
> * How many keyspaces, and what are the replication strategies?
>                 There is one keyspace, and it has only one CF at this point.
>  
> [default@KEYSPACE_NAME] describe;
> Keyspace: KEYSPACE_NAME:
>   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>   Durable Writes: true
>     Options: [28:2]
>  
> * TL;DR  What Aaron Said(tm)  In the absence of rack/dc aware replication, your allocation is suspicious.
>  
> I’m not sure what you mean by this.
>  
> Steve
>  
> -----Original Message-----
> From: Eric Evans [mailto:eevans@acunu.com] 
> Sent: Thursday, February 07, 2013 9:56 AM
> To: user@cassandra.apache.org
> Subject: Re: unbalanced ring
>  
> On Wed, Feb 6, 2013 at 2:02 PM,  <St...@wellsfargo.com> wrote:
> > Thanks Aaron.  I ran the cassandra-shuffle job and did a rebuild and
> > compact on each of the nodes.
> > 
> > 
> > 
> > [root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status
> > 
> > Datacenter: 28
> > 
> > ==============
> > 
> > Status=Up/Down
> > 
> > |/ State=Normal/Leaving/Joining/Moving
> > 
> > --  Address           Load       Tokens  Owns (effective)  Host ID
> > Rack
> > 
> > UN  10.28.205.125     1.7 GB     255     33.7%
> > 3daab184-61f0-49a0-b076-863f10bc8c6c  205
> > 
> > UN  10.28.205.126     591.44 MB  256     99.9%
> > 55bbd4b1-8036-4e32-b975-c073a7f0f47f  205
> > 
> > UN  10.28.205.127     112.28 MB  257     66.4%
> > d240c91f-4901-40ad-bd66-d374a0ccf0b9  205
>  
> Sorry, I have to ask: is this the complete output?  Have you perhaps sanitized it in some way?
>  
> It seems like there is some piece of missing context here.  Can you tell us:
>  
> * Is this a cluster that was upgraded to virtual nodes (that would include a 1.2.x cluster initialized with one token per node, and num_tokens set after the fact)?  If so, what did the initial token map look like?
> * Was initial_token used at any point along the way (either to supply a single token, or csv list of them), on any or all of the nodes in this cluster, at any time?
> * One data center (28)?  One rack (205)? Three nodes?
> * How many keyspaces, and what are the replication strategies?
> * What does the full output of `nodetool ring' look like now?  Can you attach it?
>  
> > So this is a little better.  At last node 3 has some content, but they
> > are still far from balanced.  If I am understanding this correctly, this
> > is the distribution I would expect if the tokens were set at 15/5/1
> > rather than equal.  As configured, I would expect roughly equal
> > amounts of data on each node. Is that right?  Do you have any
> > suggestions for what I can look at to get there?
>  
> Shuffle should only be required if you started out with 1-token-per-node.  In that case, your existing ranges are evenly divided num_tokens ways, and so should be exceptionally consistent with one another (assuming of course that the existing ranges were evenly sized).  The shuffle op merely "shuffles" the ranges you have to (random) other nodes in the cluster.
>  
> If this cluster were started from scratch with num_tokens = 256, then a total of 768 tokens would have been randomly generated from within the murmur3 hash-space.  Random assignment isn't perfect, but with 768 tokens (256 per), it should work out to be reasonably close on average.
>  
> TL;DR  What Aaron Said(tm)  In the absence of rack/dc aware replication, your allocation is suspicious.
>  
> > I have about 11M rows of data in this keyspace and none of them are
> > exceptionally long … it’s data pulled from Oracle and didn’t include
> > any BLOB, etc.
>  
> [ ... ]
>  
> > From: aaron morton [mailto:aaron@thelastpickle.com]
> > Sent: Tuesday, February 05, 2013 3:41 PM
> > To: user@cassandra.apache.org
> > Subject: Re: unbalanced ring
> > 
> > 
> > 
> > Use nodetool status with vnodes
> > http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes
> > 
> > 
> > 
> > The different load can be caused by rack affinity; are all the nodes
> > in the same rack? Another simple check: have you created some very big rows?
>  
> > On 6/02/2013, at 8:40 AM, Stephen.M.Thompson@wellsfargo.com wrote:
> > 
> > 
> > 
> > So I have three nodes in a ring in one data center.  My configuration
> > has
> > num_tokens: 256 set and initial_token commented out.  When I look at
> > the ring, it shows me all of the token ranges of course, and basically
> > identical data for each range on each node.  Here is the Cliff’s Notes
> > version of what I see:
> > 
> > 
> > 
> > [root@Config3482VM2 apache-cassandra-1.2.0]# bin/nodetool ring
> > 
> > 
> > 
> > Datacenter: 28
> > 
> > ==========
> > 
> > Replicas: 1
> > 
> > 
> > 
> > Address         Rack        Status State   Load            Owns
> > Token
> > 
> > 
> > 9187343239835811839
> > 
> > 10.28.205.125   205         Up     Normal  2.85 GB         33.69%
> > -3026347817059713363
> > 
> > 10.28.205.125   205         Up     Normal  2.85 GB         33.69%
> > -3026276684526453414
> > 
> > 10.28.205.125   205         Up     Normal  2.85 GB         33.69%
> > -3026205551993193465
> > 
> >   (etc)
> > 
> > 10.28.205.126   205         Up     Normal  1.15 GB         100.00%
> > -9187343239835811840
> > 
> > 10.28.205.126   205         Up     Normal  1.15 GB         100.00%
> > -9151314442816847872
> > 
> > 10.28.205.126   205         Up     Normal  1.15 GB         100.00%
> > -9115285645797883904
> > 
> >   (etc)
> > 
> > 10.28.205.127   205         Up     Normal  69.13 KB        66.30%
> > -9223372036854775808
> > 
> > 10.28.205.127   205         Up     Normal  69.13 KB        66.30%
> > 36028797018963967
> > 
> > 10.28.205.127   205         Up     Normal  69.13 KB        66.30%
> > 72057594037927935
> > 
> >   (etc)
> > 
> > 
> > 
> > So at this point I have a number of questions.   The biggest question is of
> > Load.  Why does the .125 node have 2.85 GB, .126 has 1.15 GB, and .127
> > has only 0.000069 GB?  These boxes are all comparable and all
> > configured identically.
> > 
> > 
> > 
> > partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> > 
> > 
> > 
> > I’m sorry to ask so many questions – I’m having a hard time finding
> > documentation that explains this stuff.
>  
>  
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu
> <ring.txt>


RE: unbalanced ring

Posted by St...@wellsfargo.com.
Aaron, thanks for your feedback.

.125
num_tokens: 256
# initial_token:

.126
num_tokens: 256
#initial_token:

.127
num_tokens: 256
# initial_token:

This all looks correct.  So when you say to do this with a "clean" setup, what are you asking me to do?  Is it enough to blow away /var/lib/cassandra and reload the data?  Also destroy my Cassandra install (which is just un-tar) and reinstall from nothing?

Stephen Thompson
Wells Fargo Corporation
Internet Authentication & Fraud Prevention
704.427.3137 (W) | 704.807.3431 (C)

This message may contain confidential and/or privileged information, and is intended for the use of the addressee only. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation.

From: aaron morton [mailto:aaron@thelastpickle.com]
Sent: Monday, February 11, 2013 12:51 PM
To: user@cassandra.apache.org
Subject: Re: unbalanced ring

The tokens are not right, not right at all. Some are too short and some are too tall.

More technically they do not appear to be randomly arranged. The tokens for the .125 node all start with -3, the .126 node only has negative tokens, and the .127 node mostly has positive tokens.

Check that on each node the initial_token yaml setting is commented out, and that num_tokens is set to 256.

If you can reproduce this fault with a clean setup please raise a ticket at https://issues.apache.org/jira/browse/CASSANDRA

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/02/2013, at 10:36 AM, Stephen.M.Thompson@wellsfargo.com wrote:


I found when I tried to do queries after sending this that although it shows a ton of data, it would no longer return ANYTHING for any query ... always 0 rows.  So something was severely hosed.  I blew away the data and reloaded from the database ... the data set is a little smaller than before.  It shows up somewhat more balanced, although I'm still curious why the third node is so much smaller than the first two.

[root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status
Datacenter: 28
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.28.205.125     994.89 MB  255     33.7%             3daab184-61f0-49a0-b076-863f10bc8c6c  205
UN  10.28.205.126     966.17 MB  256     99.9%             55bbd4b1-8036-4e32-b975-c073a7f0f47f  205
UN  10.28.205.127     699.79 MB  257     66.4%             d240c91f-4901-40ad-bd66-d374a0ccf0b9  205
[root@Config3482VM1 apache-cassandra-1.2.1]#

And yes, that is the entire content of the output from the status call, unedited.   I have attached the output from nodetool ring.  To answer a couple of the questions from below from Eric:

* One data center (28)?  One rack (205)? Three nodes?
                Yes, that's right.  We're just doing a proof of concept at the moment so this is three VMware servers.

* How many keyspaces, and what are the replication strategies?
                There is one keyspace, and it has only one CF at this point.

[default@KEYSPACE_NAME] describe;
Keyspace: KEYSPACE_NAME:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
    Options: [28:2]

* TL;DR  What Aaron Said(tm)  In the absence of rack/dc aware replication, your allocation is suspicious.

I'm not sure what you mean by this.

Steve

-----Original Message-----
From: Eric Evans [mailto:eevans@acunu.com]
Sent: Thursday, February 07, 2013 9:56 AM
To: user@cassandra.apache.org
Subject: Re: unbalanced ring

On Wed, Feb 6, 2013 at 2:02 PM,  <St...@wellsfargo.com> wrote:
> Thanks Aaron.  I ran the cassandra-shuffle job and did a rebuild and
> compact on each of the nodes.
>
>
>
> [root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status
>
> Datacenter: 28
>
> ==============
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address           Load       Tokens  Owns (effective)  Host ID
> Rack
>
> UN  10.28.205.125     1.7 GB     255     33.7%
> 3daab184-61f0-49a0-b076-863f10bc8c6c  205
>
> UN  10.28.205.126     591.44 MB  256     99.9%
> 55bbd4b1-8036-4e32-b975-c073a7f0f47f  205
>
> UN  10.28.205.127     112.28 MB  257     66.4%
> d240c91f-4901-40ad-bd66-d374a0ccf0b9  205

Sorry, I have to ask: is this the complete output?  Have you perhaps sanitized it in some way?

It seems like there is some piece of missing context here.  Can you tell us:

* Is this a cluster that was upgraded to virtual nodes (that would include a 1.2.x cluster initialized with one token per node, and num_tokens set after the fact)?  If so, what did the initial token map look like?
* Was initial_token used at any point along the way (either to supply a single token, or csv list of them), on any or all of the nodes in this cluster, at any time?
* One data center (28)?  One rack (205)? Three nodes?
* How many keyspaces, and what are the replication strategies?
* What does the full output of `nodetool ring' look like now?  Can you attach it?

> So this is a little better.  At last node 3 has some content, but they
> are still far from balanced.  If I am understanding this correctly, this
> is the distribution I would expect if the tokens were set at 15/5/1
> rather than equal.  As configured, I would expect roughly equal
> amounts of data on each node. Is that right?  Do you have any
> suggestions for what I can look at to get there?

Shuffle should only be required if you started out with 1-token-per-node.  In that case, your existing ranges are evenly divided num_tokens ways, and so should be exceptionally consistent with one another (assuming of course that the existing ranges were evenly sized).  The shuffle op merely "shuffles" the ranges you have to (random) other nodes in the cluster.

If this cluster were started from scratch with num_tokens = 256, then a total of 768 tokens would have been randomly generated from within the murmur3 hash-space.  Random assignment isn't perfect, but with 768 tokens (256 per), it should work out to be reasonably close on average.

TL;DR  What Aaron Said(tm)  In the absence of rack/dc aware replication, your allocation is suspicious.

> I have about 11M rows of data in this keyspace and none of them are
> exceptionally long ... it's data pulled from Oracle and didn't include
> any BLOB, etc.

[ ... ]

> From: aaron morton [mailto:aaron@thelastpickle.com]
> Sent: Tuesday, February 05, 2013 3:41 PM
> To: user@cassandra.apache.org
> Subject: Re: unbalanced ring
>
>
>
> Use nodetool status with vnodes
> http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes
>
>
>
> The different load can be caused by rack affinity; are all the nodes
> in the same rack? Another simple check: have you created some very big rows?

> On 6/02/2013, at 8:40 AM, Stephen.M.Thompson@wellsfargo.com wrote:
>
>
>
> So I have three nodes in a ring in one data center.  My configuration
> has
> num_tokens: 256 set and initial_token commented out.  When I look at
> the ring, it shows me all of the token ranges of course, and basically
> identical data for each range on each node.  Here is the Cliff's Notes
> version of what I see:
>
>
>
> [root@Config3482VM2 apache-cassandra-1.2.0]# bin/nodetool ring
>
>
>
> Datacenter: 28
>
> ==========
>
> Replicas: 1
>
>
>
> Address         Rack        Status State   Load            Owns
> Token
>
>
> 9187343239835811839
>
> 10.28.205.125   205         Up     Normal  2.85 GB         33.69%
> -3026347817059713363
>
> 10.28.205.125   205         Up     Normal  2.85 GB         33.69%
> -3026276684526453414
>
> 10.28.205.125   205         Up     Normal  2.85 GB         33.69%
> -3026205551993193465
>
>   (etc)
>
> 10.28.205.126   205         Up     Normal  1.15 GB         100.00%
> -9187343239835811840
>
> 10.28.205.126   205         Up     Normal  1.15 GB         100.00%
> -9151314442816847872
>
> 10.28.205.126   205         Up     Normal  1.15 GB         100.00%
> -9115285645797883904
>
>   (etc)
>
> 10.28.205.127   205         Up     Normal  69.13 KB        66.30%
> -9223372036854775808
>
> 10.28.205.127   205         Up     Normal  69.13 KB        66.30%
> 36028797018963967
>
> 10.28.205.127   205         Up     Normal  69.13 KB        66.30%
> 72057594037927935
>
>   (etc)
>
>
>
> So at this point I have a number of questions.   The biggest question is of
> Load.  Why does the .125 node have 2.85 GB, .126 has 1.15 GB, and .127
> has only 0.000069 GB?  These boxes are all comparable and all
> configured identically.
>
>
>
> partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>
>
>
> I'm sorry to ask so many questions - I'm having a hard time finding
> documentation that explains this stuff.


--
Eric Evans
Acunu | http://www.acunu.com | @acunu
<ring.txt>


Re: unbalanced ring

Posted by aaron morton <aa...@thelastpickle.com>.
The tokens are not right, not right at all. Some are too short and some are too tall. 

More technically they do not appear to be randomly arranged. The tokens for the .125 node all start with -3, the .126 node only has negative tokens, and the .127 node mostly has positive tokens.

Check that on each node the initial_token yaml setting is commented out, and that num_tokens is set to 256. 

If you can reproduce this fault with a clean setup please raise a ticket at https://issues.apache.org/jira/browse/CASSANDRA

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 8/02/2013, at 10:36 AM, Stephen.M.Thompson@wellsfargo.com wrote:

> I found when I tried to do queries after sending this that although it shows a ton of data, it would no longer return ANYTHING for any query ... always 0 rows.  So something was severely hosed.  I blew away the data and reloaded from the database ... the data set is a little smaller than before.  It shows up somewhat more balanced, although I'm still curious why the third node is so much smaller than the first two.
>  
> [root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status
> Datacenter: 28
> ==============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address           Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  10.28.205.125     994.89 MB  255     33.7%             3daab184-61f0-49a0-b076-863f10bc8c6c  205
> UN  10.28.205.126     966.17 MB  256     99.9%             55bbd4b1-8036-4e32-b975-c073a7f0f47f  205
> UN  10.28.205.127     699.79 MB  257     66.4%             d240c91f-4901-40ad-bd66-d374a0ccf0b9  205
> [root@Config3482VM1 apache-cassandra-1.2.1]#
>  
> And yes, that is the entire content of the output from the status call, unedited.   I have attached the output from nodetool ring.  To answer a couple of the questions from below from Eric:
>  
> * One data center (28)?  One rack (205)? Three nodes?
>                 Yes, that’s right.  We’re just doing a proof of concept at the moment so this is three VMware servers.
>  
> * How many keyspaces, and what are the replication strategies?
>                 There is one keyspace, and it has only one CF at this point.
>  
> [default@KEYSPACE_NAME] describe;
> Keyspace: KEYSPACE_NAME:
>   Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
>   Durable Writes: true
>     Options: [28:2]
>  
> * TL;DR  What Aaron Said(tm)  In the absence of rack/dc aware replication, your allocation is suspicious.
>  
> I’m not sure what you mean by this.
>  
> Steve
>  
> -----Original Message-----
> From: Eric Evans [mailto:eevans@acunu.com] 
> Sent: Thursday, February 07, 2013 9:56 AM
> To: user@cassandra.apache.org
> Subject: Re: unbalanced ring
>  
> On Wed, Feb 6, 2013 at 2:02 PM,  <St...@wellsfargo.com> wrote:
> > Thanks Aaron.  I ran the cassandra-shuffle job and did a rebuild and
> > compact on each of the nodes.
> > 
> > 
> > 
> > [root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status
> > 
> > Datacenter: 28
> > 
> > ==============
> > 
> > Status=Up/Down
> > 
> > |/ State=Normal/Leaving/Joining/Moving
> > 
> > --  Address           Load       Tokens  Owns (effective)  Host ID
> > Rack
> > 
> > UN  10.28.205.125     1.7 GB     255     33.7%
> > 3daab184-61f0-49a0-b076-863f10bc8c6c  205
> > 
> > UN  10.28.205.126     591.44 MB  256     99.9%
> > 55bbd4b1-8036-4e32-b975-c073a7f0f47f  205
> > 
> > UN  10.28.205.127     112.28 MB  257     66.4%
> > d240c91f-4901-40ad-bd66-d374a0ccf0b9  205
>  
> Sorry, I have to ask: is this the complete output?  Have you perhaps sanitized it in some way?
>  
> It seems like there is some piece of missing context here.  Can you tell us:
>  
> * Is this a cluster that was upgraded to virtual nodes (that would include a 1.2.x cluster initialized with one token per node, and num_tokens set after the fact)?  If so, what did the initial token map look like?
> * Was initial_token used at any point along the way (either to supply a single token, or csv list of them), on any or all of the nodes in this cluster, at any time?
> * One data center (28)?  One rack (205)? Three nodes?
> * How many keyspaces, and what are the replication strategies?
> * What does the full output of `nodetool ring' look like now?  Can you attach it?
>  
> > So this is a little better.  At last node 3 has some content, but they
> > are still far from balanced.  If I am understanding this correctly, this
> > is the distribution I would expect if the tokens were set at 15/5/1
> > rather than equal.  As configured, I would expect roughly equal
> > amounts of data on each node. Is that right?  Do you have any
> > suggestions for what I can look at to get there?
>  
> Shuffle should only be required if you started out with 1-token-per-node.  In that case, your existing ranges are evenly divided num_tokens ways, and so should be exceptionally consistent with one another (assuming of course that the existing ranges were evenly sized).  The shuffle op merely "shuffles" the ranges you have to (random) other nodes in the cluster.
>  
> If this cluster were started from scratch with num_tokens = 256, then a total of 768 tokens would have been randomly generated from within the murmur3 hash-space.  Random assignment isn't perfect, but with 768 tokens (256 per), it should work out to be reasonably close on average.
>  
> TL;DR  What Aaron Said(tm)  In the absence of rack/dc aware replication, your allocation is suspicious.
>  
> > I have about 11M rows of data in this keyspace and none of them are
> > exceptionally long … it’s data pulled from Oracle and didn’t include
> > any BLOB, etc.
>  
> [ ... ]
>  
> > From: aaron morton [mailto:aaron@thelastpickle.com]
> > Sent: Tuesday, February 05, 2013 3:41 PM
> > To: user@cassandra.apache.org
> > Subject: Re: unbalanced ring
> > 
> > 
> > 
> > Use nodetool status with vnodes
> > http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes
> > 
> > 
> > 
> > The different load can be caused by rack affinity; are all the nodes
> > in the same rack? Another simple check: have you created some very big rows?
>  
> > On 6/02/2013, at 8:40 AM, Stephen.M.Thompson@wellsfargo.com wrote:
> > 
> > 
> > 
> > So I have three nodes in a ring in one data center.  My configuration
> > has
> > num_tokens: 256 set and initial_token commented out.  When I look at
> > the ring, it shows me all of the token ranges of course, and basically
> > identical data for each range on each node.  Here is the Cliff’s Notes
> > version of what I see:
> > 
> > 
> > 
> > [root@Config3482VM2 apache-cassandra-1.2.0]# bin/nodetool ring
> > 
> > 
> > 
> > Datacenter: 28
> > 
> > ==========
> > 
> > Replicas: 1
> > 
> > 
> > 
> > Address         Rack        Status State   Load            Owns
> > Token
> > 
> > 
> > 9187343239835811839
> > 
> > 10.28.205.125   205         Up     Normal  2.85 GB         33.69%
> > -3026347817059713363
> > 
> > 10.28.205.125   205         Up     Normal  2.85 GB         33.69%
> > -3026276684526453414
> > 
> > 10.28.205.125   205         Up     Normal  2.85 GB         33.69%
> > -3026205551993193465
> > 
> >   (etc)
> > 
> > 10.28.205.126   205         Up     Normal  1.15 GB         100.00%
> > -9187343239835811840
> > 
> > 10.28.205.126   205         Up     Normal  1.15 GB         100.00%
> > -9151314442816847872
> > 
> > 10.28.205.126   205         Up     Normal  1.15 GB         100.00%
> > -9115285645797883904
> > 
> >   (etc)
> > 
> > 10.28.205.127   205         Up     Normal  69.13 KB        66.30%
> > -9223372036854775808
> > 
> > 10.28.205.127   205         Up     Normal  69.13 KB        66.30%
> > 36028797018963967
> > 
> > 10.28.205.127   205         Up     Normal  69.13 KB        66.30%
> > 72057594037927935
> > 
> >   (etc)
> > 
> > 
> > 
> > So at this point I have a number of questions.   The biggest question is of
> > Load.  Why does the .125 node have 2.85 GB, .126 has 1.15 GB, and .127
> > has only 0.000069 GB?  These boxes are all comparable and all
> > configured identically.
> > 
> > 
> > 
> > partitioner: org.apache.cassandra.dht.Murmur3Partitioner
> > 
> > 
> > 
> > I’m sorry to ask so many questions – I’m having a hard time finding
> > documentation that explains this stuff.
>  
>  
> --
> Eric Evans
> Acunu | http://www.acunu.com | @acunu
> <ring.txt>


RE: unbalanced ring

Posted by St...@wellsfargo.com.
I found when I tried to do queries after sending this that although it shows a ton of data, it would no longer return ANYTHING for any query ... always 0 rows.  So something was severely hosed.  I blew away the data and reloaded from the database ... the data set is a little smaller than before.  It shows up somewhat more balanced, although I'm still curious why the third node is so much smaller than the first two.



[root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status

Datacenter: 28

==============

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address           Load       Tokens  Owns (effective)  Host ID                               Rack

UN  10.28.205.125     994.89 MB  255     33.7%             3daab184-61f0-49a0-b076-863f10bc8c6c  205

UN  10.28.205.126     966.17 MB  256     99.9%             55bbd4b1-8036-4e32-b975-c073a7f0f47f  205

UN  10.28.205.127     699.79 MB  257     66.4%             d240c91f-4901-40ad-bd66-d374a0ccf0b9  205

[root@Config3482VM1 apache-cassandra-1.2.1]#



And yes, that is the entire content of the output from the status call, unedited.   I have attached the output from nodetool ring.  To answer a couple of the questions from below from Eric:



* One data center (28)?  One rack (205)? Three nodes?

                Yes, that's right.  We're just doing a proof of concept at the moment so this is three VMware servers.



* How many keyspaces, and what are the replication strategies?

                There is one keyspace, and it has only one CF at this point.



[default@KEYSPACE_NAME] describe;

Keyspace: KEYSPACE_NAME:

  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy

  Durable Writes: true

    Options: [28:2]
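
(For reference, a keyspace with those options would have been created with something along these lines in cassandra-cli; KEYSPACE_NAME is the same placeholder as above:)

create keyspace KEYSPACE_NAME
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = {28:2}
  and durable_writes = true;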



* TL;DR  What Aaron Said(tm)  In the absence of rack/dc aware replication, your allocation is suspicious.



I'm not sure what you mean by this.



Steve



-----Original Message-----
From: Eric Evans [mailto:eevans@acunu.com]
Sent: Thursday, February 07, 2013 9:56 AM
To: user@cassandra.apache.org
Subject: Re: unbalanced ring



On Wed, Feb 6, 2013 at 2:02 PM,  <St...@wellsfargo.com> wrote:

> Thanks Aaron.  I ran the cassandra-shuffle job and did a rebuild and

> compact on each of the nodes.

>

>

>

> [root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status

>

> Datacenter: 28

>

> ==============

>

> Status=Up/Down

>

> |/ State=Normal/Leaving/Joining/Moving

>

> --  Address           Load       Tokens  Owns (effective)  Host ID

> Rack

>

> UN  10.28.205.125     1.7 GB     255     33.7%

> 3daab184-61f0-49a0-b076-863f10bc8c6c  205

>

> UN  10.28.205.126     591.44 MB  256     99.9%

> 55bbd4b1-8036-4e32-b975-c073a7f0f47f  205

>

> UN  10.28.205.127     112.28 MB  257     66.4%

> d240c91f-4901-40ad-bd66-d374a0ccf0b9  205



Sorry, I have to ask: is this the complete output?  Have you perhaps sanitized it in some way?



It seems like there is some piece of missing context here.  Can you tell us:



* Is this a cluster that was upgraded to virtual nodes (that would include a 1.2.x cluster initialized with one token per node, and num_tokens set after the fact)?  If so, what did the initial token map look like?

* Was initial_token used at any point along the way (either to supply a single token, or csv list of them), on any or all of the nodes in this cluster, at any time?

* One data center (28)?  One rack (205)? Three nodes?

* How many keyspaces, and what are the replication strategies?

* What does the full output of `nodetool ring' look like now?  Can you attach it?



> So this is a little better.  At last node 3 has some content, but they

> are still far from balanced.  If I am understanding this correctly, this

> is the distribution I would expect if the tokens were set at 15/5/1

> rather than equal.  As configured, I would expect roughly equal

> amounts of data on each node. Is that right?  Do you have any

> suggestions for what I can look at to get there?



Shuffle should only be required if you started out with 1-token-per-node.  In that case, your existing ranges are evenly divided num_tokens ways, and so should be exceptionally consistent with one another (assuming of course that the existing ranges were evenly sized).  The shuffle op merely "shuffles" the ranges you have to (random) other nodes in the cluster.



If this cluster were started from scratch with num_tokens = 256, then a total of 768 tokens would have been randomly generated from within the murmur3 hash-space.  Random assignment isn't perfect, but with 768 tokens (256 per), it should work out to be reasonably close on average.



TL;DR  What Aaron Said(tm)  In the absence of rack/dc aware replication, your allocation is suspicious.



> I have about 11M rows of data in this keyspace and none of them are

> exceptionally long ... it's data pulled from Oracle and didn't include

> any BLOB, etc.



[ ... ]



> From: aaron morton [mailto:aaron@thelastpickle.com]

> Sent: Tuesday, February 05, 2013 3:41 PM

> To: user@cassandra.apache.org

> Subject: Re: unbalanced ring

>

>

>

> Use nodetool status with vnodes

> http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes

>

>

>

> The different load can be caused by rack affinity; are all the nodes

> in the same rack? Another simple check: have you created some very big rows?



> On 6/02/2013, at 8:40 AM, Stephen.M.Thompson@wellsfargo.com wrote:

>

>

>

> So I have three nodes in a ring in one data center.  My configuration

> has

> num_tokens: 256 set and initial_token commented out.  When I look at

> the ring, it shows me all of the token ranges of course, and basically

> identical data for each range on each node.  Here is the Cliff's Notes

> version of what I see:

>

>

>

> [root@Config3482VM2 apache-cassandra-1.2.0]# bin/nodetool ring

>

>

>

> Datacenter: 28

>

> ==========

>

> Replicas: 1

>

>

>

> Address         Rack        Status State   Load            Owns

> Token

>

>

> 9187343239835811839

>

> 10.28.205.125   205         Up     Normal  2.85 GB         33.69%

> -3026347817059713363

>

> 10.28.205.125   205         Up     Normal  2.85 GB         33.69%

> -3026276684526453414

>

> 10.28.205.125   205         Up     Normal  2.85 GB         33.69%

> -3026205551993193465

>

>   (etc)

>

> 10.28.205.126   205         Up     Normal  1.15 GB         100.00%

> -9187343239835811840

>

> 10.28.205.126   205         Up     Normal  1.15 GB         100.00%

> -9151314442816847872

>

> 10.28.205.126   205         Up     Normal  1.15 GB         100.00%

> -9115285645797883904

>

>   (etc)

>

> 10.28.205.127   205         Up     Normal  69.13 KB        66.30%

> -9223372036854775808

>

> 10.28.205.127   205         Up     Normal  69.13 KB        66.30%

> 36028797018963967

>

> 10.28.205.127   205         Up     Normal  69.13 KB        66.30%

> 72057594037927935

>

>   (etc)

>

>

>

> So at this point I have a number of questions.   The biggest question is of

> Load.  Why does the .125 node have 2.85 GB, .126 has 1.15 GB, and .127

> has only 0.000069 GB?  These boxes are all comparable and all

> configured identically.

>

>

>

> partitioner: org.apache.cassandra.dht.Murmur3Partitioner

>

>

>

> I'm sorry to ask so many questions - I'm having a hard time finding

> documentation that explains this stuff.





--

Eric Evans

Acunu | http://www.acunu.com | @acunu

Re: unbalanced ring

Posted by Eric Evans <ee...@acunu.com>.
On Wed, Feb 6, 2013 at 2:02 PM,  <St...@wellsfargo.com> wrote:
> Thanks Aaron.  I ran the cassandra-shuffle job and did a rebuild and compact
> on each of the nodes.
>
>
>
> [root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status
>
> Datacenter: 28
>
> ==============
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address           Load       Tokens  Owns (effective)  Host ID
> Rack
>
> UN  10.28.205.125     1.7 GB     255     33.7%
> 3daab184-61f0-49a0-b076-863f10bc8c6c  205
>
> UN  10.28.205.126     591.44 MB  256     99.9%
> 55bbd4b1-8036-4e32-b975-c073a7f0f47f  205
>
> UN  10.28.205.127     112.28 MB  257     66.4%
> d240c91f-4901-40ad-bd66-d374a0ccf0b9  205

Sorry, I have to ask: is this the complete output?  Have you perhaps
sanitized it in some way?

It seems like there is some piece of missing context here.  Can you tell us:

* Is this a cluster that was upgraded to virtual nodes (that would
include a 1.2.x cluster initialized with one token per node, and
num_tokens set after the fact)?  If so, what did the initial token map
look like?
* Was initial_token used at any point along the way (either to supply
a single token, or csv list of them), on any or all of the nodes in
this cluster, at any time?
* One data center (28)?  One rack (205)? Three nodes?
* How many keyspaces, and what are the replication strategies?
* What does the full output of `nodetool ring' look like now?  Can you
attach it?

> So this is a little better.  At last node 3 has some content, but they are
> still far from balanced.  If I am understanding this correctly, this is the
> distribution I would expect if the tokens were set at 15/5/1 rather than
> equal.  As configured, I would expect roughly equal amounts of data on each
> node. Is that right?  Do you have any suggestions for what I can look at to
> get there?

Shuffle should only be required if you started out with
1-token-per-node.  In that case, your existing ranges are evenly
divided num_tokens ways, and so should be exceptionally consistent
with one another (assuming of course that the existing ranges were
evenly sized).  The shuffle op merely "shuffles" the ranges you have
to (random) other nodes in the cluster.
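
In shell terms, that upgrade path is roughly the following (a sketch; the shuffle utility's exact invocation varies by version, so check its usage output):

# on each existing single-token node, enable vnodes:
sed -i 's/^#\{0,1\} *num_tokens:.*/num_tokens: 256/' conf/cassandra.yaml
# restart the node: its one range is split into 256 contiguous slices.
# running the shuffle utility afterwards moves those slices to random
# nodes around the cluster.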

If this cluster were started from scratch with num_tokens = 256, then
a total of 768 tokens would have been randomly generated from within
the murmur3 hash-space.  Random assignment isn't perfect, but with 768
tokens (256 per), it should work out to be reasonably close on
average.
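
You can sanity-check that with a quick simulation (a hypothetical script, not part of Cassandra; requires python3):

python3 - <<'EOF'
# simulate 3 nodes x 256 random tokens in the murmur3 token range and
# compute each node's share of the ring
import random
MIN, MAX = -2**63, 2**63 - 1
SPAN = MAX - MIN + 1
ring = sorted((random.randrange(MIN, MAX), n) for n in 'ABC' for _ in range(256))
own = dict.fromkeys('ABC', 0)
for i, (t, n) in enumerate(ring):
    prev = ring[i - 1][0] if i else ring[-1][0] - SPAN   # wrap around the ring
    own[n] += t - prev
for n in sorted(own):
    print(n, '%.2f%%' % (100.0 * own[n] / SPAN))
EOF

Each run should put every node within a few points of 33.3%, nothing like the skew above.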

TL;DR  What Aaron Said(tm)  In the absence of rack/dc aware
replication, your allocation is suspicious.

> I have about 11M rows of data in this keyspace and none of them are
> exceptionally long … it’s data pulled from Oracle and didn’t include any
> BLOB, etc.

[ ... ]

> From: aaron morton [mailto:aaron@thelastpickle.com]
> Sent: Tuesday, February 05, 2013 3:41 PM
> To: user@cassandra.apache.org
> Subject: Re: unbalanced ring
>
>
>
> Use nodetool status with vnodes
> http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes
>
>
>
> The different load can be caused by rack affinity; are all the nodes in the
> same rack? Another simple check: have you created some very big rows?

> On 6/02/2013, at 8:40 AM, Stephen.M.Thompson@wellsfargo.com wrote:
>
>
>
> So I have three nodes in a ring in one data center.  My configuration has
> num_tokens: 256 set and initial_token commented out.  When I look at the
> ring, it shows me all of the token ranges of course, and basically identical
> data for each range on each node.  Here is the Cliff’s Notes version of what
> I see:
>
>
>
> [root@Config3482VM2 apache-cassandra-1.2.0]# bin/nodetool ring
>
>
>
> Datacenter: 28
>
> ==========
>
> Replicas: 1
>
>
>
> Address         Rack        Status State   Load            Owns
> Token
>
>
> 9187343239835811839
>
> 10.28.205.125   205         Up     Normal  2.85 GB         33.69%
> -3026347817059713363
>
> 10.28.205.125   205         Up     Normal  2.85 GB         33.69%
> -3026276684526453414
>
> 10.28.205.125   205         Up     Normal  2.85 GB         33.69%
> -3026205551993193465
>
>   (etc)
>
> 10.28.205.126   205         Up     Normal  1.15 GB         100.00%
> -9187343239835811840
>
> 10.28.205.126   205         Up     Normal  1.15 GB         100.00%
> -9151314442816847872
>
> 10.28.205.126   205         Up     Normal  1.15 GB         100.00%
> -9115285645797883904
>
>   (etc)
>
> 10.28.205.127   205         Up     Normal  69.13 KB        66.30%
> -9223372036854775808
>
> 10.28.205.127   205         Up     Normal  69.13 KB        66.30%
> 36028797018963967
>
> 10.28.205.127   205         Up     Normal  69.13 KB        66.30%
> 72057594037927935
>
>   (etc)
>
>
>
> So at this point I have a number of questions.   The biggest question is of
> Load.  Why does the .125 node have 2.85 GB, .126 has 1.15 GB, and .127 has
> only 0.000069 GB?  These boxes are all comparable and all configured
> identically.
>
>
>
> partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>
>
>
> I’m sorry to ask so many questions – I’m having a hard time finding
> documentation that explains this stuff.


--
Eric Evans
Acunu | http://www.acunu.com | @acunu

RE: unbalanced ring

Posted by St...@wellsfargo.com.
Thanks Aaron.  I ran the cassandra-shuffle job and did a rebuild and compact on each of the nodes.

[root@Config3482VM1 apache-cassandra-1.2.1]# bin/nodetool status
Datacenter: 28
==============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.28.205.125     1.7 GB     255     33.7%             3daab184-61f0-49a0-b076-863f10bc8c6c  205
UN  10.28.205.126     591.44 MB  256     99.9%             55bbd4b1-8036-4e32-b975-c073a7f0f47f  205
UN  10.28.205.127     112.28 MB  257     66.4%             d240c91f-4901-40ad-bd66-d374a0ccf0b9  205

So this is a little better.  At last node 3 has some content, but they are still far from balanced.  If I am understanding this correctly, this is the distribution I would expect if the tokens were set at 15/5/1 rather than equal.  As configured, I would expect roughly equal amounts of data on each node. Is that right?  Do you have any suggestions for what I can look at to get there?

I have about 11M rows of data in this keyspace and none of them are exceptionally long ... it's data pulled from Oracle and didn't include any BLOB, etc.

Stephen Thompson
Wells Fargo Corporation
Internet Authentication & Fraud Prevention
704.427.3137 (W) | 704.807.3431 (C)

This message may contain confidential and/or privileged information, and is intended for the use of the addressee only. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose, or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation.

From: aaron morton [mailto:aaron@thelastpickle.com]
Sent: Tuesday, February 05, 2013 3:41 PM
To: user@cassandra.apache.org
Subject: Re: unbalanced ring

Use nodetool status with vnodes http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes

The different load can be caused by rack affinity; are all the nodes in the same rack? Another simple check: have you created some very big rows?
Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/02/2013, at 8:40 AM, Stephen.M.Thompson@wellsfargo.com wrote:


So I have three nodes in a ring in one data center.  My configuration has num_tokens: 256 set and initial_token commented out.  When I look at the ring, it shows me all of the token ranges of course, and basically identical data for each range on each node.  Here is the Cliff's Notes version of what I see:

[root@Config3482VM2 apache-cassandra-1.2.0]# bin/nodetool ring

Datacenter: 28
==========
Replicas: 1

Address         Rack        Status State   Load            Owns                Token
                                                                               9187343239835811839
10.28.205.125   205         Up     Normal  2.85 GB         33.69%              -3026347817059713363
10.28.205.125   205         Up     Normal  2.85 GB         33.69%              -3026276684526453414
10.28.205.125   205         Up     Normal  2.85 GB         33.69%              -3026205551993193465
  (etc)
10.28.205.126   205         Up     Normal  1.15 GB         100.00%             -9187343239835811840
10.28.205.126   205         Up     Normal  1.15 GB         100.00%             -9151314442816847872
10.28.205.126   205         Up     Normal  1.15 GB         100.00%             -9115285645797883904
  (etc)
10.28.205.127   205         Up     Normal  69.13 KB        66.30%              -9223372036854775808
10.28.205.127   205         Up     Normal  69.13 KB        66.30%              36028797018963967
10.28.205.127   205         Up     Normal  69.13 KB        66.30%              72057594037927935
  (etc)

So at this point I have a number of questions.   The biggest question is of Load.  Why does the .125 node have 2.85 GB, .126 has 1.15 GB, and .127 has only 0.000069 GB?  These boxes are all comparable and all configured identically.

partitioner: org.apache.cassandra.dht.Murmur3Partitioner

I'm sorry to ask so many questions - I'm having a hard time finding documentation that explains this stuff.

Stephen


Re: unbalanced ring

Posted by aaron morton <aa...@thelastpickle.com>.
Use nodetool status with vnodes http://www.datastax.com/dev/blog/upgrading-an-existing-cluster-to-vnodes

The different load can be caused by rack affinity; are all the nodes in the same rack? Another simple check: have you created some very big rows?
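
Both checks are quick from the shell (a sketch; cfstats output wording varies a little between versions):

bin/nodetool status     # per-node load, token count, effective ownership
bin/nodetool cfstats | grep -i 'compacted row'   # very large maximum row sizes point to big rows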
Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 6/02/2013, at 8:40 AM, Stephen.M.Thompson@wellsfargo.com wrote:

> So I have three nodes in a ring in one data center.  My configuration has num_tokens: 256 set and initial_token commented out.  When I look at the ring, it shows me all of the token ranges of course, and basically identical data for each range on each node.  Here is the Cliff’s Notes version of what I see:
>  
> [root@Config3482VM2 apache-cassandra-1.2.0]# bin/nodetool ring
>  
> Datacenter: 28
> ==========
> Replicas: 1
>  
> Address         Rack        Status State   Load            Owns                Token
>                                                                                9187343239835811839
> 10.28.205.125   205         Up     Normal  2.85 GB         33.69%              -3026347817059713363
> 10.28.205.125   205         Up     Normal  2.85 GB         33.69%              -3026276684526453414
> 10.28.205.125   205         Up     Normal  2.85 GB         33.69%              -3026205551993193465
>   (etc)
> 10.28.205.126   205         Up     Normal  1.15 GB         100.00%             -9187343239835811840
> 10.28.205.126   205         Up     Normal  1.15 GB         100.00%             -9151314442816847872
> 10.28.205.126   205         Up     Normal  1.15 GB         100.00%             -9115285645797883904
>   (etc)
> 10.28.205.127   205         Up     Normal  69.13 KB        66.30%              -9223372036854775808
> 10.28.205.127   205         Up     Normal  69.13 KB        66.30%              36028797018963967
> 10.28.205.127   205         Up     Normal  69.13 KB        66.30%              72057594037927935
>   (etc)
>  
> So at this point I have a number of questions.   The biggest question is of Load.  Why does the .125 node have 2.85 GB, .126 has 1.15 GB, and .127 has only 0.000069 GB?  These boxes are all comparable and all configured identically.
>  
> partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>  
> I’m sorry to ask so many questions – I’m having a hard time finding documentation that explains this stuff.
>  
> Stephen