Posted to user@cassandra.apache.org by Loic Lambiel <lo...@exoscale.ch> on 2017/07/10 13:02:03 UTC

Unbalanced cluster

Hi,

One of our clusters is becoming somewhat unbalanced, at least on some
of the nodes:

(output edited to remove unnecessary information)
--  Address         Load       Tokens  Owns (effective)   Rack
UN  192.168.1.22   2.99 TB    32      10.6%               RACK1
UN  192.168.1.23   3.35 TB    32      11.7%               RACK1
UN  192.168.1.20   3.22 TB    32      11.3%               RACK1
UN  192.168.1.21   3.21 TB    32      11.2%               RACK1
UN  192.168.1.18   2.87 TB    32      10.3%               RACK1
UN  192.168.1.19   3.49 TB    32      12.0%               RACK1
UN  192.168.1.16   5.32 TB    32      12.9%               RACK1
UN  192.168.1.17   3.77 TB    32      12.0%               RACK1
UN  192.168.1.26   4.46 TB    32      11.2%               RACK1
UN  192.168.1.24   3.24 TB    32      11.4%               RACK1
UN  192.168.1.25   3.31 TB    32      11.2%               RACK1
UN  192.168.1.134  2.75 TB    18      7.2%                RACK1
UN  192.168.1.135  2.52 TB    18      6.0%                RACK1
UN  192.168.1.132  1.85 TB    18      6.8%                RACK1
UN  192.168.1.133  2.41 TB    18      5.7%                RACK1
UN  192.168.1.130  2.95 TB    18      7.1%                RACK1
UN  192.168.1.131  2.82 TB    18      6.7%                RACK1
UN  192.168.1.128  3.04 TB    18      7.1%                RACK1
UN  192.168.1.129  2.47 TB    18      7.2%                RACK1
UN  192.168.1.14   5.63 TB    32      13.4%               RACK1
UN  192.168.1.15   2.95 TB    32      10.4%               RACK1
UN  192.168.1.12   3.83 TB    32      12.4%               RACK1
UN  192.168.1.13   2.71 TB    32      9.5%                RACK1
UN  192.168.1.10   3.51 TB    32      11.9%               RACK1
UN  192.168.1.11   2.96 TB    32      10.3%               RACK1
UN  192.168.1.126  2.48 TB    18      6.7%                RACK1
UN  192.168.1.127  2.23 TB    18      5.5%                RACK1
UN  192.168.1.124  2.05 TB    18      5.5%                RACK1
UN  192.168.1.125  2.33 TB    18      5.8%                RACK1
UN  192.168.1.122  1.99 TB    18      5.1%                RACK1
UN  192.168.1.123  2.44 TB    18      5.7%                RACK1
UN  192.168.1.120  3.58 TB    28      11.4%               RACK1
UN  192.168.1.121  2.33 TB    18      6.8%                RACK1

Notice that node 192.168.1.14 owns 13.4% / 5.63 TB while node
192.168.1.13 owns only 9.5% / 2.71 TB, so the former carries more than
twice the data. They both have 32 tokens.

The cluster is running:

* Cassandra 2.1.16 (initially bootstrapped running 2.1.2, with vnodes
enabled)
* RF=3 with single DC and single rack. LCS as the compaction strategy,
JBOD storage
* Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
* Node cleanup performed on all nodes

Almost all of the cluster load comes from a single CF:

CREATE TABLE blobstore.block (
    inode uuid,
    version timeuuid,
    block bigint,
    offset bigint,
    chunksize int,
    payload blob,
    PRIMARY KEY ((inode, version, block), offset)
) WITH CLUSTERING ORDER BY (offset ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'tombstone_threshold': '0.1',
'tombstone_compaction_interval': '60', 'unchecked_tombstone_compaction':
'false', 'class':
'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'sstable_compression':
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 172000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

The payload column is almost the same size in each record.

I understand that an unbalanced cluster may be the result of a poorly
chosen primary key, which I believe isn't the case here.
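
(For what it's worth, partition sizes can be sanity-checked with
nodetool, which reports partition-size percentiles per table:

$ nodetool cfhistograms blobstore block

A narrow spread between the partition-size percentiles there would
back up that the key distributes evenly.)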

Any clue on what could be the cause? And how can I re-balance the
cluster without decommissioning any nodes?

My understanding is that nodetool move may only be used when not using
the vnodes feature.

Any help appreciated, thanks!

----
Loic Lambiel

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org


Re: Unbalanced cluster

Posted by kurt greaves <ku...@instaclustr.com>.
The reason for the default of 256 vnodes is that at that many tokens,
the random distribution of tokens is enough to balance out each node's
token allocation almost evenly. With any fewer, some nodes will end up
far more unbalanced, as Avi has shown. In 3.0 there is a new token
allocation algorithm; however, it requires configuring before a node is
added, and it only really works well if your RF = # of racks, or you
only use 1 rack. Have a look at the allocate_tokens_for_keyspace option
for more details.
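
A minimal sketch of the relevant cassandra.yaml settings on the joining
node (using the blobstore keyspace from the original post as the
target; the keyspace must already exist when the node bootstraps):

    num_tokens: 256
    allocate_tokens_for_keyspace: blobstore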

Re: Unbalanced cluster

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Awesome utility Avi! Thanks for sharing.
On Tue, Jul 11, 2017 at 10:57 AM Avi Kivity <av...@scylladb.com> wrote:

> There is now a readme with some examples and a build file.
>
> [earlier quoted messages snipped]

Re: Unbalanced cluster

Posted by Avi Kivity <av...@scylladb.com>.
There is now a readme with some examples and a build file.


On 07/11/2017 11:53 AM, Avi Kivity wrote:
> [earlier quoted messages snipped]

Re: Unbalanced cluster

Posted by Avi Kivity <av...@scylladb.com>.
Yeah, posting a github link carries an implied undertaking to write a 
README file and make it easily buildable. I'll see what I can do.




On 07/11/2017 06:25 AM, Nate McCall wrote:
> You wouldn't have a build file laying around for that, would you?
>
> [earlier quoted messages snipped]

Re: Unbalanced cluster

Posted by Nate McCall <na...@thelastpickle.com>.
You wouldn't have a build file laying around for that, would you?

On Tue, Jul 11, 2017 at 3:23 PM, Nate McCall <na...@thelastpickle.com> wrote:

> [earlier quoted messages snipped]


-- 
-----------------
Nate McCall
Wellington, NZ
@zznate

CTO
Apache Cassandra Consulting
http://www.thelastpickle.com

Re: Unbalanced cluster

Posted by Nate McCall <na...@thelastpickle.com>.
On Tue, Jul 11, 2017 at 3:20 AM, Avi Kivity <av...@scylladb.com> wrote:

>
>
>
> [1] https://github.com/avikivity/shardsim
>

Avi, that's super handy - thanks for posting.

Re: Unbalanced cluster

Posted by Avi Kivity <av...@scylladb.com>.
It is ScyllaDB-specific. Scylla divides data not only among nodes but
also, within a node, among cores (shards, in our terminology). In the
past we had problems with shards being over- and under-utilized (just
like your cluster), so this simulator was developed to validate the
solution.
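
For example, to model the same 33-node cluster with 8 cores (shards)
per node instead of 1, the invocation would be (8 is just an
illustrative core count):

$ ./shardsim --vnodes 32 --nodes 33 --shards 8

The shard overcommit line then reflects the most loaded core rather
than the most loaded node.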


On 07/11/2017 10:27 AM, Loic Lambiel wrote:
> Thanks for the hint and tool!
>
> By the way, what does the --shards parameter mean?
>
> Thanks
>
> Loic
>
> On 07/10/2017 05:20 PM, Avi Kivity wrote:
>> [earlier quoted messages snipped]


Re: Unbalanced cluster

Posted by Loic Lambiel <lo...@exoscale.ch>.
Thanks for the hint and tool!

By the way, what does the --shards parameter mean?

Thanks

Loic

On 07/10/2017 05:20 PM, Avi Kivity wrote:
> 32 tokens is too few for 33 nodes. I have a sharding simulator [1] and
> it shows:
> 
> 
> $ ./shardsim --vnodes 32 --nodes 33 --shards 1
> 33 nodes, 32 vnodes, 1 shards
> maximum node overcommit:  1.42642
> maximum shard overcommit: 1.426417
>
> [rest of the quoted message and earlier thread snipped]

-- 
Loic Lambiel
Head of Operations
Tel : +41 78 649 53 93
loic.lambiel@exoscale.ch
❬❱ https://www.exoscale.ch



Re: Unbalanced cluster

Posted by Avi Kivity <av...@scylladb.com>.
32 tokens is too few for 33 nodes. I have a sharding simulator [1] and
it shows:


$ ./shardsim --vnodes 32 --nodes 33 --shards 1
33 nodes, 32 vnodes, 1 shards
maximum node overcommit:  1.42642
maximum shard overcommit: 1.426417


So that's 40% overcommit over the average. Since some nodes can also be
undercommitted, this easily explains the 2X difference (a node at 1.4x
the average next to one at 0.7x the average is already a 2x spread).


Newer versions of Cassandra have better token selection and will suffer 
less from this.



[1] https://github.com/avikivity/shardsim
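
If you want to play with the numbers without building it, here is a
rough Python sketch of the same experiment (not the shardsim code
itself; random token assignment, with replication and Murmur3's signed
token range ignored):

import random

NODES, VNODES = 33, 32
RING = 2 ** 64                       # size of the token space

# one random token per vnode, tagged with the node that owns it
tokens = sorted((random.randrange(RING), n)
                for n in range(NODES)
                for _ in range(VNODES))

owned = [0] * NODES
prev = tokens[-1][0] - RING          # wrap-around: last token precedes the first
for tok, node in tokens:
    owned[node] += tok - prev        # each token owns the range (prev, tok]
    prev = tok

print("maximum node overcommit: %.5f" % (max(owned) / (RING / NODES)))

Typical runs land in the same ballpark as the 1.4 above.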


On 07/10/2017 04:02 PM, Loic Lambiel wrote:
> [original message quoted in full; snipped]