Posted to user@cassandra.apache.org by learner dba <ca...@yahoo.com.INVALID> on 2018/06/18 18:06:09 UTC

Cluster is unbalanced

Hi,
Data volume varies a lot in our two DC cluster:

 Load       Tokens       Owns
 20.01 GiB  256          ?
 65.32 GiB  256          ?
 60.09 GiB  256          ?
 46.95 GiB  256          ?
 50.73 GiB  256          ?

kaiprodv2
=========
/Leaving/Joining/Moving

 Load       Tokens       Owns
 25.19 GiB  256          ?
 30.26 GiB  256          ?
 9.82 GiB   256          ?
 20.54 GiB  256          ?
 9.7 GiB    256          ?

I ran clearsnapshot, garbagecollect and cleanup, but these increased the size on the heavier nodes instead of decreasing it. Based on nodetool cfstats, I can see that the number of partition keys on each node varies a lot:

Number of partitions (estimate): 3142552
Number of partitions (estimate): 15625442
Number of partitions (estimate): 15244021
Number of partitions (estimate): 9592992
Number of partitions (estimate): 15839280
How can I diagnose this imbalance further?

Re: RE: RE: [EXTERNAL] Cluster is unbalanced

Posted by Joshua Galbraith <jg...@newrelic.com.INVALID>.
> Also, our partition keys are not distributed evenly as I had pasted output earlier.

Thanks, I see that now. Can you share the full output of nodetool tablestats
and nodetool tablehistograms?

Out of curiosity, are you running repairs on this cluster? If so, what type
of repairs are you running and how often?

One way you might differentiate between a server-side/configuration issue
or a client/data model issue is to write a script that populates a test
keyspace with uniformly distributed partitions and see if that keyspace
also exhibits a similar imbalance of partitions per node. You might be able
to use a heavily-throttled cassandra-stress invocation to handle this.
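A rough offline version of that experiment can be sketched in a few lines of Python. This is an illustration only, not Cassandra's actual placement logic: MD5 stands in for the Murmur3 token function, a simple modulo stands in for real vnode token ranges, and `NUM_NODES`/`NUM_KEYS` are arbitrary. It shows that uniformly random keys balance under any uniform hash:

```python
# Illustration: uniformly random partition keys spread evenly across nodes.
# MD5 stands in for Cassandra's Murmur3 partitioner; node assignment is a
# simple modulo rather than real vnode token ranges.
import hashlib
import uuid
from collections import Counter

NUM_NODES = 5
NUM_KEYS = 100_000

counts = Counter()
for _ in range(NUM_KEYS):
    key = str(uuid.uuid4())  # a uniformly random key, like a v4 UUID id
    token = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    counts[token % NUM_NODES] += 1

spread = max(counts.values()) / min(counts.values())
print(f"max/min partitions per node: {spread:.3f}")
```

If the real cluster shows a much larger per-node spread than a simulation like this (or than the uniformly populated test keyspace), the skew is coming from the data or topology rather than from the partitioner.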

On Wed, Jun 20, 2018 at 12:32 PM, learner dba <cassandradba@yahoo.com.invalid> wrote:

>
> Hi Joshua,
>
> Okay, that string appears to be a base64-encoded version 4 UUID.
> Why not use Cassandra's UUID data type to store that directly rather than
> storing the longer base64 string as text?  --> It's an old application and
> the person who coded it has left the company.
> What does the UUID represent? --> Unique account id.
> Is it identifying a unique product, an image, or some other type of
> object? --> yes
> When and how is the underlying UUID being generated by the application?
> --> Not sure about it.
>
> I assume you're using the default partitioner, but just in case, can you
> confirm which partitioner you're using in your cassandra.yaml file (e.g.
> Murmur3, Random, ByteOrdered)? --> partitioner: org.apache.cassandra.dht.Murmur3Partitioner
>
>
> The Jiras you mentioned are from a much older version than ours (3.11.2). Also, our
> partition keys are not distributed evenly, as in the output I pasted earlier,
> which means neither Jira applies in our case :(
>
>
> On Wednesday, June 20, 2018, 12:18:28 PM EDT, Joshua Galbraith <jgalbraith@newrelic.com.INVALID> wrote:
>
>
> Okay, that string appears to be a base64-encoded version 4 UUID. Why not
> use Cassandra's UUID data type to store that directly rather than storing
> the longer base64 string as text? What does the UUID represent? Is it
> identifying a unique product, an image, or some other type of object? When
> and how is the underlying UUID being generated by the application?
>
> I assume you're using the default partitioner, but just in case, can you
> confirm which partitioner you're using in your cassandra.yaml file (e.g.
> Murmur3, Random, ByteOrdered)?
>
> Also, please have a look at these two issues and verify you're not
> experiencing either:
>
> * https://issues.apache.org/jira/browse/CASSANDRA-7032
> * https://issues.apache.org/jira/browse/CASSANDRA-10430
>
>
> On Wed, Jun 20, 2018 at 9:59 AM, learner dba <cassandradba@yahoo.com.invalid> wrote:
>
> Partition key has value as:
>
> MWY4MmI0MTQtYTk2YS00YmRjLTkxNDMtOWU0MjM1OWU2NzUy; the other column is a blob.
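That value can be decoded to confirm Joshua's later reading of it as a base64-encoded textual UUID. A quick Python check (the key below is the quoted value with its line-wrap space removed):

```python
# Decode the quoted partition-key value and check that it is base64-encoded
# text of a UUID.
import base64
import uuid

encoded = "MWY4MmI0MTQtYTk2YS00YmRjLTkxNDMtOWU0MjM1OWU2NzUy"
decoded = base64.b64decode(encoded).decode("ascii")
u = uuid.UUID(decoded)
print(decoded)    # 1f82b414-a96a-4bdc-9143-9e42359e6752
print(u.version)  # 4 -- a random (version 4) UUID
```

Note the storage cost of this encoding: each id holds 48 bytes of base64 text where a native uuid column would store the same identifier in 16 bytes.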
>
> On Tuesday, June 19, 2018, 6:07:59 PM EDT, Joshua Galbraith <jgalbraith@newrelic.com.INVALID> wrote:
>
>
> > id text PRIMARY KEY
>
> What values are written to this id field? Can you give us some examples or
> explain the general use case?
>
> On Tue, Jun 19, 2018 at 1:18 PM, learner dba <cassandradba@yahoo.com.invalid> wrote:
>
> Hi Sean,
>
> Here is create table:
>
> CREATE TABLE ks.cf (
>
>     id text PRIMARY KEY,
>
>     accessdata blob
>
> ) WITH bloom_filter_fp_chance = 0.01
>
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>
>     AND comment = ''
>
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>
>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>
>     AND crc_check_chance = 1.0
>
>     AND dclocal_read_repair_chance = 0.1
>
>     AND default_time_to_live = 0
>
>     AND gc_grace_seconds = 864000
>
>     AND max_index_interval = 2048
>
>     AND memtable_flush_period_in_ms = 0
>
>     AND min_index_interval = 128
>
>     AND read_repair_chance = 0.0
>
>     AND speculative_retry = '99PERCENTILE';
> Nodetool status:
>
> Datacenter: dc1
>
> =======================
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
>
> UN  xxxxx   20.66 GiB  256          61.4%             f4f54949-83c9-419b-9a43-cb630b36d8c2  RAC1
>
> UN  xxxxx  65.77 GiB  256          59.3%             3db430ae-45ef-4746-a273-bc1f66ac8981  RAC1
>
> UN  xxxxxx  60.58 GiB  256          58.4%             1f23e869-1823-4b75-8d3e-f9b32acba9a6  RAC1
>
> UN  xxxxx  47.08 GiB  256          57.5%             7aca9a36-823f-4185-be44-c1464a799084  RAC1
>
> UN  xxxxx  51.47 GiB  256          63.4%             18cff010-9b83-4cf8-9dc2-f05ac63df402  RAC1
>
> Datacenter: dc2
>
> ========================
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
>
> UN  xxxx   24.37 GiB  256          59.5%             1b694180-210a-4b75-8f2a-748f4a5b6a3d  RAC1
>
> UN  xxxxx 30.76 GiB  256          56.7%             597bac04-c57a-4487-8924-72e171e45514  RAC1
>
> UN  xxxx  10.73 GiB  256          63.9%             6e7e474e-e292-4433-afd4-372d30e0f3e1  RAC1
>
> UN  xxxxxx 19.77 GiB  256          61.5%             58751418-7b76-40f7-8b8f-a5bf8fe7d9a2  RAC1
>
> UN  xxxxx  10.33 GiB  256          58.4%             6d58d006-2095-449c-8c67-50e8cbdfe7a7  RAC1
>
>
> cassandra-rackdc.properties:
>
> dc=dc1
> rack=RAC1 --> same in all nodes
>
> cassandra.yaml:
> num_tokens: 256
>
> endpoint_snitch: GossipingPropertyFileSnitch
> I can see cassandra-topology.properties; I believe it shouldn't be there
> with GossipingPropertyFileSnitch. Could this file be causing any trouble in data
> distribution?
>
> cat /opt/cassandra/conf/cassandra-topology.properties
>
> # Licensed to the Apache Software Foundation (ASF) under one
>
> # or more contributor license agreements.  See the NOTICE file
>
> # distributed with this work for additional information
>
> # regarding copyright ownership.  The ASF licenses this file
>
> # to you under the Apache License, Version 2.0 (the
>
> # "License"); you may not use this file except in compliance
>
> # with the License.  You may obtain a copy of the License at
>
> #
>
> #     http://www.apache.org/licenses/LICENSE-2.0
>
> #
>
> # Unless required by applicable law or agreed to in writing, software
>
> # distributed under the License is distributed on an "AS IS" BASIS,
>
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>
> # See the License for the specific language governing permissions and
>
> # limitations under the License.
>
>
> # Cassandra Node IP=Data Center:Rack
>
> 192.168.1.100=DC1:RAC1
>
> 192.168.2.200=DC2:RAC2
>
>
> 10.0.0.10=DC1:RAC1
>
> 10.0.0.11=DC1:RAC1
>
> 10.0.0.12=DC1:RAC2
>
>
> 10.20.114.10=DC2:RAC1
>
> 10.20.114.11=DC2:RAC1
>
>
> 10.21.119.13=DC3:RAC1
>
> 10.21.119.10=DC3:RAC1
>
>
> 10.0.0.13=DC1:RAC2
>
> 10.21.119.14=DC3:RAC2
>
> 10.20.114.15=DC2:RAC2
>
>
> # default for unknown nodes
>
> default=DC1:r1
>
>
> # Native IPv6 is supported, however you must escape the colon in the IPv6
> Address
>
> # Also be sure to comment out JVM_OPTS="$JVM_OPTS -Djava.net.preferIPv4Stack=true"
>
> # in cassandra-env.sh
>
> fe80\:0\:0\:0\:202\:b3ff\:fe1e\:8329=DC1:RAC3
>
>
>
>
> On Tuesday, June 19, 2018, 12:51:34 PM EDT, Durity, Sean R <SEAN_R_DURITY@homedepot.com> wrote:
>
>
> You are correct that the cluster decides where data goes (based on the
> hash of the partition key). However, if you choose a “bad” partition key,
> you may not get good distribution of the data, because the hash is
> deterministic (it always goes to the same nodes/replicas). For example, if
> you have a partition key of a datetime, it is possible that there is more
> data written for a certain time period – thus a larger partition and an
> imbalance across the cluster. Choosing a “good” partition key is one of the
> most important decisions for a Cassandra table.
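Sean's point about deterministic placement can be sketched in a few lines of Python. As before this is only an illustration (MD5 stands in for Murmur3, a modulo for vnode token ranges, and the hot-key fraction is invented): a hot partition key always hashes to the same node, so that node fills up while the rest stay balanced.

```python
# Illustration: a "hot" partition key deterministically lands on one node.
# MD5 stands in for Murmur3; modulo stands in for vnode token ranges.
import hashlib
import random
from collections import Counter

def node_for(key: str, num_nodes: int = 5) -> int:
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")
    return h % num_nodes

random.seed(42)
counts = Counter()
for i in range(10_000):
    # assume half of all writes use one hot datetime key, the rest are unique
    key = "2018-06-18" if random.random() < 0.5 else f"key-{i}"
    counts[node_for(key)] += 1

print(sorted(counts.values()))  # one node carries far more rows than the rest
```

The same total write volume, hashed from a skewed key distribution, produces exactly the kind of per-node imbalance described above.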
>
>
>
> Also, I have seen the use of racks in the topology cause an imbalance in
> the “first” node of the rack.
>
>
>
> To help you more, we would need the create table statement(s) for your
> keyspace and the topology of the cluster (like with nodetool status).
>
>
>
>
>
> Sean Durity
>
> *From:* learner dba <cassandradba@yahoo.com.INVALID>
> *Sent:* Tuesday, June 19, 2018 9:50 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: RE: [EXTERNAL] Cluster is unbalanced
>
>
>
> We do not choose the node where a partition will go; I thought it is the snitch's
> role to choose replica nodes. Even the partition size does not vary on our
> largest column family:
>
> Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
>                               (micros)          (micros)           (bytes)
>
> 50%             0.00             17.08             61.21              3311                 1
>
> 75%             0.00             20.50             88.15              3973                 1
>
> 95%             0.00             35.43            105.78              3973                 1
>
> 98%             0.00             42.51            126.93              3973                 1
>
> 99%             0.00             51.01            126.93              3973                 1
>
> Min             0.00              3.97             17.09                61
>
> Max             0.00             73.46            126.93             11864                 1
>
>
>
> We are kind of stuck here trying to identify what could be causing this imbalance.
>
>
>
> On Tuesday, June 19, 2018, 7:15:28 AM EDT, Joshua Galbraith <jgalbraith@newrelic.com.INVALID> wrote:
>
>
>
>
>
> If it were a partition key issue, we would see a similar number of partition
> keys across nodes. If we look closely, the number of keys across nodes varies a
> lot.
>
> I'm not sure about that; is it possible you're writing more new partitions
> to some nodes even though each node owns the same number of tokens?
>
>
>
>
> On Mon, Jun 18, 2018 at 6:07 PM, learner dba <cassandradba@yahoo.com.invalid> wrote:
>
> Hi Sean,
>
>
>
> Are you using any rack aware topology? --> we are using gossip file
>
> What are your partition keys? --> Partition key is unique
>
> Is it possible that your partition keys do not divide up as cleanly as you
> would like across the cluster because the data is not evenly distributed
> (by partition key)?  --> No, we verified it.
>
>
>
> If it were a partition key issue, we would see a similar number of partition
> keys across nodes. If we look closely, the number of keys across nodes varies a
> lot.
>
>
>
>
>
> Number of partitions (estimate): 3142552
>
> Number of partitions (estimate): 15625442
>
> Number of partitions (estimate): 15244021
>
> Number of partitions (estimate): 9592992
>
> Number of partitions (estimate): 15839280
>
>
>
>
>
>
>
>
>
>
>
> On Monday, June 18, 2018, 5:39:08 PM EDT, Durity, Sean R <SEAN_R_DURITY@homedepot.com> wrote:
>
>
>
>
>
> Are you using any rack aware topology? What are your partition keys? Is it
> possible that your partition keys do not divide up as cleanly as you would
> like across the cluster because the data is not evenly distributed (by
> partition key)?
>
>
>
>
>
> Sean Durity
>
> lord of the (C*) rings (Staff Systems Engineer – Cassandra)
>
> MTC 2250
>
> #cassandra - for the latest news and updates
>
>
>
> *From:* learner dba <cassandradba@yahoo.com.INVALID>
> *Sent:* Monday, June 18, 2018 2:06 PM
> *To:* User cassandra.apache.org <us...@cassandra.apache.org>
> *Subject:* [EXTERNAL] Cluster is unbalanced
>
>
> --
>
> *Joshua Galbraith *| Senior Software Engineer | New Relic
> C: 907-209-1208 | jgalbraith@newrelic.com
>
>
>
>
> --
> *Joshua Galbraith *| Senior Software Engineer | New Relic
>
>
>
>
> --
> *Joshua Galbraith *| Senior Software Engineer | New Relic
>



-- 
*Joshua Galbraith *| Senior Software Engineer | New Relic

Re: RE: RE: [EXTERNAL] Cluster is unbalanced

Posted by learner dba <ca...@yahoo.com.INVALID>.
 
Hi Joshua,

Okay, that string appears to be a base64-encoded version 4 UUID. Why not use Cassandra's UUID data type to store that directly rather than storing the longer base64 string as text? --> It's an old application and the person who coded it has left the company.
What does the UUID represent? --> Unique account id.
Is it identifying a unique product, an image, or some other type of object? --> yes
When and how is the underlying UUID being generated by the application? --> Not sure about it.

I assume you're using the default partitioner, but just in case, can you confirm which partitioner you're using in your cassandra.yaml file (e.g. Murmur3, Random, ByteOrdered)? --> partitioner: org.apache.cassandra.dht.Murmur3Partitioner

The Jiras you mentioned are from a much older version than ours (3.11.2). Also, our partition keys are not distributed evenly, as in the output I pasted earlier, which means neither Jira applies in our case :(



Re: RE: RE: [EXTERNAL] Cluster is unbalanced

Posted by Joshua Galbraith <jg...@newrelic.com.INVALID>.
Okay, that string appears to be a base64-encoded version 4 UUID. Why not
use Cassandra's UUID data type to store that directly rather than storing
the longer base64 string as text? What does the UUID represent? Is it
identifying a unique product, an image, or some other type of object? When
and how is the underlying UUID being generated by the application?

I assume you're using the default partitioner, but just in case, can you
confirm which partitioner you're using in your cassandra.yaml file (e.g.
Murmur3, Random, ByteOrdered)?

Also, please have a look at these two issues and verify you're not
experiencing either:

* https://issues.apache.org/jira/browse/CASSANDRA-7032
* https://issues.apache.org/jira/browse/CASSANDRA-10430


On Wed, Jun 20, 2018 at 9:59 AM, learner dba <cassandradba@yahoo.com.invalid
> wrote:

> Partition key has value as:
>
> MWY4MmI0MTQtYTk2YS00YmRjLTkxNDMtOWU0MjM1OWU2NzUy other column is blob.
>
> On Tuesday, June 19, 2018, 6:07:59 PM EDT, Joshua Galbraith <
> jgalbraith@newrelic.com.INVALID> wrote:
>
>
> > id text PRIMARY KEY
>
> What values are written to this id field? Can you give us some examples or
> explain the general use case?
>
> On Tue, Jun 19, 2018 at 1:18 PM, learner dba <cassandradba@yahoo.com.
> invalid> wrote:
>
> Hi Sean,
>
> Here is create table:
>
> CREATE TABLE ks.cf (
>
>     id text PRIMARY KEY,
>
>     accessdata blob
>
> ) WITH bloom_filter_fp_chance = 0.01
>
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>
>     AND comment = ''
>
>     AND compaction = {'class': 'org.apache.cassandra.db. compaction.
> SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>
>     AND compression = {'chunk_length_in_kb': '64', 'class': '
> org.apache.cassandra.io. compress.LZ4Compressor'}
>
>     AND crc_check_chance = 1.0
>
>     AND dclocal_read_repair_chance = 0.1
>
>     AND default_time_to_live = 0
>
>     AND gc_grace_seconds = 864000
>
>     AND max_index_interval = 2048
>
>     AND memtable_flush_period_in_ms = 0
>
>     AND min_index_interval = 128
>
>     AND read_repair_chance = 0.0
>
>     AND speculative_retry = '99PERCENTILE';
> Nodetool status:
>
> Datacenter: dc1
>
> =======================
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/ Moving
>
> --  Address     Load       Tokens       Owns (effective)  Host ID
>                       Rack
>
> UN  xxxxx   20.66 GiB  256          61.4%             f4f54949-83c9-419b-9a43-
> cb630b36d8c2  RAC1
>
> UN  xxxxx  65.77 GiB  256          59.3%             3db430ae-45ef-4746-a273-
> bc1f66ac8981  RAC1
>
> UN  xxxxxx  60.58 GiB  256          58.4%             1f23e869-1823-4b75-8d3e-
> f9b32acba9a6  RAC1
>
> UN  xxxxx  47.08 GiB  256          57.5%             7aca9a36-823f-4185-be44-
> c1464a799084  RAC1
>
> UN  xxxxx  51.47 GiB  256          63.4%             18cff010-9b83-4cf8-9dc2-
> f05ac63df402  RAC1
>
> Datacenter: dc2
>
> ========================
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/ Moving
>
> --  Address     Load       Tokens       Owns (effective)  Host ID
>                       Rack
>
> UN  xxxx   24.37 GiB  256          59.5%             1b694180-210a-4b75-8f2a-
> 748f4a5b6a3d  RAC1
>
> UN  xxxxx 30.76 GiB  256          56.7%             597bac04-c57a-4487-8924-
> 72e171e45514  RAC1
>
> UN  xxxx  10.73 GiB  256          63.9%             6e7e474e-e292-4433-afd4-
> 372d30e0f3e1  RAC1
>
> UN  xxxxxx 19.77 GiB  256          61.5%             58751418-7b76-40f7-8b8f-
> a5bf8fe7d9a2  RAC1
>
> UN  xxxxx  10.33 GiB  256          58.4%             6d58d006-2095-449c-8c67-
> 50e8cbdfe7a7  RAC1
>
>
> cassandra-rackdc.properties:
>
> dc=dc1
> rack=RAC1 --> same in all nodes
>
> cassandra.yaml:
> num_tokens: 256
>
> endpoint_snitch: GossipingPropertyFileSnitch
> I can see cassandra-topology.properties, I believe it shouldn't be there
> with GossipPropertyFileSnitch. Can this file be causing any trouble in data
> distribution.
>
> cat /opt/cassandra/conf/cassandra- topology.properties
>
> # Licensed to the Apache Software Foundation (ASF) under one
>
> # or more contributor license agreements.  See the NOTICE file
>
> # distributed with this work for additional information
>
> # regarding copyright ownership.  The ASF licenses this file
>
> # to you under the Apache License, Version 2.0 (the
>
> # "License"); you may not use this file except in compliance
>
> # with the License.  You may obtain a copy of the License at
>
> #
>
> #     http://www.apache.org/ licenses/LICENSE-2.0
> <http://www.apache.org/licenses/LICENSE-2.0>
>
> #
>
> # Unless required by applicable law or agreed to in writing, software
>
> # distributed under the License is distributed on an "AS IS" BASIS,
>
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>
> # See the License for the specific language governing permissions and
>
> # limitations under the License.
>
>
> # Cassandra Node IP=Data Center:Rack
>
> 192.168.1.100=DC1:RAC1
>
> 192.168.2.200=DC2:RAC2
>
>
> 10.0.0.10=DC1:RAC1
>
> 10.0.0.11=DC1:RAC1
>
> 10.0.0.12=DC1:RAC2
>
>
> 10.20.114.10=DC2:RAC1
>
> 10.20.114.11=DC2:RAC1
>
>
> 10.21.119.13=DC3:RAC1
>
> 10.21.119.10=DC3:RAC1
>
>
> 10.0.0.13=DC1:RAC2
>
> 10.21.119.14=DC3:RAC2
>
> 10.20.114.15=DC2:RAC2
>
>
> # default for unknown nodes
>
> default=DC1:r1
>
>
> # Native IPv6 is supported, however you must escape the colon in the IPv6
> Address
>
> # Also be sure to comment out JVM_OPTS="$JVM_OPTS
> -Djava.net.preferIPv4Stack=true"
>
> # in cassandra-env.sh
>
> fe80\:0\:0\:0\:202\:b3ff\:fe1e\:8329=DC1:RAC3
>
>
>
>
> On Tuesday, June 19, 2018, 12:51:34 PM EDT, Durity, Sean R <
> SEAN_R_DURITY@homedepot.com> wrote:
>
>
> You are correct that the cluster decides where data goes (based on the
> hash of the partition key). However, if you choose a “bad” partition key,
> you may not get good distribution of the data, because the hash is
> deterministic (it always goes to the same nodes/replicas). For example, if
> you have a partition key of a datetime, it is possible that there is more
> data written for a certain time period – thus a larger partition and an
> imbalance across the cluster. Choosing a “good” partition key is one of the
> most important decisions for a Cassandra table.
>
>
>
> Also, I have seen the use of racks in the topology cause an imbalance in
> the “first” node of the rack.
>
>
>
> To help you more, we would need the create table statement(s) for your
> keyspace and the topology of the cluster (like with nodetool status).
>
>
>
>
>
> Sean Durity
>
> *From:* learner dba <cassandradba@yahoo.com.INVALID>
> *Sent:* Tuesday, June 19, 2018 9:50 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: RE: [EXTERNAL] Cluster is unbalanced
>
>
>
> We do not choose the node where a partition will go. I thought it is the snitch's
> role to choose replica nodes. Even the partition size does not vary on our
> largest column family:
>
> Percentile  SSTables     Write Latency      Read Latency    Partition Size
>       Cell Count
>
>                               (micros)          (micros)           (bytes)
>
>
> 50%             0.00             17.08             61.21              3311
>                 1
>
> 75%             0.00             20.50             88.15              3973
>                 1
>
> 95%             0.00             35.43            105.78              3973
>                 1
>
> 98%             0.00             42.51            126.93              3973
>                 1
>
> 99%             0.00             51.01            126.93              3973
>                 1
>
> Min             0.00              3.97             17.09                61
>
>
> Max             0.00             73.46            126.93             11864
>                 1
>
>
>
> We are kinda stuck trying to identify what could be causing this imbalance.
>
>
>
> On Tuesday, June 19, 2018, 7:15:28 AM EDT, Joshua Galbraith <
> jgalbraith@newrelic.com.INVALID> wrote:
>
>
>
>
>
> >If it were a partition key issue, we would see a similar number of partition
> keys across nodes. If we look closely, the number of keys across nodes varies a
> lot.
>
> I'm not sure about that. Is it possible you're writing more new partitions
> to some nodes even though each node owns the same number of tokens?
>
>
>
>
> On Mon, Jun 18, 2018 at 6:07 PM, learner dba <cassandradba@yahoo.com.invalid>
> wrote:
>
> Hi Sean,
>
>
>
> Are you using any rack aware topology? --> we are using gossip file
>
>  What are your partition keys? --> Partition key is unique
>
> Is it possible that your partition keys do not divide up as cleanly as you
> would like across the cluster because the data is not evenly distributed
> (by partition key)?  --> No, we verified it.
>
>
>
> If it were a partition key issue, we would see a similar number of partition
> keys across nodes. If we look closely, the number of keys across nodes varies a
> lot.
>
>
>
>
>
> Number of partitions (estimate): 3142552
>
> Number of partitions (estimate): 15625442
>
> Number of partitions (estimate): 15244021
>
> Number of partitions (estimate): 9592992
>
> Number of partitions (estimate): 15839280
>
>
>
>
>
>
>
>
>
>
>
> On Monday, June 18, 2018, 5:39:08 PM EDT, Durity, Sean R <
> SEAN_R_DURITY@homedepot.com> wrote:
>
>
>
>
>
> Are you using any rack aware topology? What are your partition keys? Is it
> possible that your partition keys do not divide up as cleanly as you would
> like across the cluster because the data is not evenly distributed (by
> partition key)?
>
>
>
>
>
> Sean Durity
>
> lord of the (C*) rings (Staff Systems Engineer – Cassandra)
>
> MTC 2250
>
> #cassandra - for the latest news and updates
>
>
>
> *From:* learner dba <cassandradba@yahoo.com.INVALID>
> *Sent:* Monday, June 18, 2018 2:06 PM
> *To:* User cassandra.apache.org
> <us...@cassandra.apache.org>
> *Subject:* [EXTERNAL] Cluster is unbalanced
>
>
>
> Hi,
>
>
>
> Data volume varies a lot in our two DC cluster:
>
>  Load       Tokens       Owns
>
>  20.01 GiB  256          ?
>
>  65.32 GiB  256          ?
>
>  60.09 GiB  256          ?
>
>  46.95 GiB  256          ?
>
>  50.73 GiB  256          ?
>
> kaiprodv2
>
> =========
>
> /Leaving/Joining/Moving
>
>  Load       Tokens       Owns
>
>  25.19 GiB  256          ?
>
>  30.26 GiB  256          ?
>
>  9.82 GiB   256          ?
>
>  20.54 GiB  256          ?
>
>  9.7 GiB    256          ?
>
>
>
> I ran clearsnapshot, garbagecollect and cleanup, but it increased the size
> on heavier nodes instead of decreasing. Based on nodetool cfstats, I can
> see partition keys on each node varies a lot:
>
>
>
> Number of partitions (estimate): 3142552
>
> Number of partitions (estimate): 15625442
>
> Number of partitions (estimate): 15244021
>
> Number of partitions (estimate): 9592992
>
> Number of partitions (estimate): 15839280
>
>
>
> How can I diagnose this imbalance further?
>
>
>
>
>
>
>
> --
>
> *Joshua Galbraith *| Senior Software Engineer | New Relic
> C: 907-209-1208 | jgalbraith@newrelic.com <jg...@newrelic.com>
>
>
>
>
> --
> *Joshua Galbraith *| Senior Software Engineer | New Relic
>



-- 
*Joshua Galbraith *| Senior Software Engineer | New Relic

Re: RE: RE: [EXTERNAL] Cluster is unbalanced

Posted by learner dba <ca...@yahoo.com.INVALID>.
 Partition key has values like:
MWY4MmI0MTQtYTk2YS00YmRjLTkxNDMtOWU0MjM1OWU2NzUy; the other column is a blob.
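As a side note, that sample key is the Base64 encoding of a hyphenated UUID string, so the keys themselves should hash very uniformly. A quick check in plain Python (no Cassandra involved):

```python
import base64
import uuid

key = "MWY4MmI0MTQtYTk2YS00YmRjLTkxNDMtOWU0MjM1OWU2NzUy"

# Decode the Base64 text back to the underlying string.
decoded = base64.b64decode(key).decode("ascii")
print(decoded)  # 1f82b414-a96a-4bdc-9143-9e42359e6752

# It parses as a random (version 4) UUID, so the partitioner should
# spread such keys evenly across the ring.
print(uuid.UUID(decoded).version)  # 4
```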

    On Tuesday, June 19, 2018, 6:07:59 PM EDT, Joshua Galbraith <jg...@newrelic.com.INVALID> wrote:  
 
 > id text PRIMARY KEY

What values are written to this id field? Can you give us some examples or explain the general use case?
On Tue, Jun 19, 2018 at 1:18 PM, learner dba <ca...@yahoo.com.invalid> wrote:

 Hi Sean,
Here is create table:

CREATE TABLE ks.cf (

    id text PRIMARY KEY,

    accessdata blob

) WITH bloom_filter_fp_chance = 0.01

    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}

    AND comment = ''

    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}

    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

    AND crc_check_chance = 1.0

    AND dclocal_read_repair_chance = 0.1

    AND default_time_to_live = 0

    AND gc_grace_seconds = 864000

    AND max_index_interval = 2048

    AND memtable_flush_period_in_ms = 0

    AND min_index_interval = 128

    AND read_repair_chance = 0.0

    AND speculative_retry = '99PERCENTILE';
Nodetool status: 
Datacenter: dc1

=======================

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack

UN  xxxxx   20.66 GiB  256          61.4%             f4f54949-83c9-419b-9a43-cb630b36d8c2  RAC1

UN  xxxxx  65.77 GiB  256          59.3%             3db430ae-45ef-4746-a273-bc1f66ac8981  RAC1

UN  xxxxxx  60.58 GiB  256          58.4%             1f23e869-1823-4b75-8d3e-f9b32acba9a6  RAC1

UN  xxxxx  47.08 GiB  256          57.5%             7aca9a36-823f-4185-be44-c1464a799084  RAC1

UN  xxxxx  51.47 GiB  256          63.4%             18cff010-9b83-4cf8-9dc2-f05ac63df402  RAC1

Datacenter: dc2

========================

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack

UN  xxxx   24.37 GiB  256          59.5%             1b694180-210a-4b75-8f2a-748f4a5b6a3d  RAC1

UN  xxxxx 30.76 GiB  256          56.7%             597bac04-c57a-4487-8924-72e171e45514  RAC1

UN  xxxx  10.73 GiB  256          63.9%             6e7e474e-e292-4433-afd4-372d30e0f3e1  RAC1

UN  xxxxxx 19.77 GiB  256          61.5%             58751418-7b76-40f7-8b8f-a5bf8fe7d9a2  RAC1

UN  xxxxx  10.33 GiB  256          58.4%             6d58d006-2095-449c-8c67-50e8cbdfe7a7  RAC1
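Worth noting: effective token ownership in the status output above is nearly even (57.5% to 63.4%) while load varies roughly 3x within each DC, which points at data skew rather than token allocation. A quick sanity check in Python, using the dc1 values as reported (node labels aside, these are the numbers from the output):

```python
# Effective ownership (%) and load (GiB) per node in dc1, copied from
# the nodetool status output above.
owns_dc1 = [61.4, 59.3, 58.4, 57.5, 63.4]
load_dc1 = [20.66, 65.77, 60.58, 47.08, 51.47]

owns_spread = max(owns_dc1) / min(owns_dc1)
load_spread = max(load_dc1) / min(load_dc1)
print(f"ownership spread ~{owns_spread:.2f}x, load spread ~{load_spread:.2f}x")
# Ownership is within ~10% across nodes, but load varies ~3.2x:
# the tokens are balanced, the data on them is not.
```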


cassandra-rackdc.properties:

dc=dc1
rack=RAC1 --> same in all nodes
cassandra.yaml:
num_tokens: 256
endpoint_snitch: GossipingPropertyFileSnitch
I can see cassandra-topology.properties; I believe it shouldn't be there with GossipingPropertyFileSnitch. Could this file be causing any trouble in data distribution?

cat /opt/cassandra/conf/cassandra-topology.properties

# Licensed to the Apache Software Foundation (ASF) under one

# or more contributor license agreements.  See the NOTICE file

# distributed with this work for additional information

# regarding copyright ownership.  The ASF licenses this file

# to you under the Apache License, Version 2.0 (the

# "License"); you may not use this file except in compliance

# with the License.  You may obtain a copy of the License at

#

#     http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.




# Cassandra Node IP=Data Center:Rack

192.168.1.100=DC1:RAC1

192.168.2.200=DC2:RAC2




10.0.0.10=DC1:RAC1

10.0.0.11=DC1:RAC1

10.0.0.12=DC1:RAC2




10.20.114.10=DC2:RAC1

10.20.114.11=DC2:RAC1




10.21.119.13=DC3:RAC1

10.21.119.10=DC3:RAC1




10.0.0.13=DC1:RAC2

10.21.119.14=DC3:RAC2

10.20.114.15=DC2:RAC2




# default for unknown nodes

default=DC1:r1




# Native IPv6 is supported, however you must escape the colon in the IPv6 Address

# Also be sure to comment out JVM_OPTS="$JVM_OPTS -Djava.net.preferIPv4Stack=true"

# in cassandra-env.sh

fe80\:0\:0\:0\:202\:b3ff\:fe1e\:8329=DC1:RAC3






    On Tuesday, June 19, 2018, 12:51:34 PM EDT, Durity, Sean R <SE...@homedepot.com> wrote:  
 
  
You are correct that the cluster decides where data goes (based on the hash of the partition key). However, if you choose a “bad” partition key, you may not get good distribution of the data, because the hash is deterministic (it always goes to the same nodes/replicas). For example, if you have a partition key of a datetime, it is possible that there is more data written for a certain time period – thus a larger partition and an imbalance across the cluster. Choosing a “good” partition key is one of the most important decisions for a Cassandra table.
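To make that determinism concrete, here is a minimal Python sketch. It is only an illustration, not Cassandra's actual Murmur3Partitioner; MD5 stands in for the hash function, much as it did in the older RandomPartitioner:

```python
import hashlib

def token_for(partition_key: str) -> int:
    """Deterministically map a partition key to a 127-bit token.
    Illustrative only: Cassandra's default Murmur3Partitioner uses a
    different hash, but the determinism argument is identical."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** 127)

# The same key always yields the same token, so it always lands on the
# same replica set, regardless of which coordinator receives the write;
# distinct keys spread across the ring.
t1 = token_for("MWY4MmI0MTQtYTk2YS00YmRjLTkxNDMtOWU0MjM1OWU2NzUy")
t2 = token_for("some-other-key")
assert t1 == token_for("MWY4MmI0MTQtYTk2YS00YmRjLTkxNDMtOWU0MjM1OWU2NzUy")
assert t1 != t2
```

This is why a skewed key distribution (for example, a datetime key with hot periods) produces a skewed cluster even though the hash itself is uniform.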
 
  
 
Also, I have seen the use of racks in the topology cause an imbalance in the “first” node of the rack.
 
  
 
To help you more, we would need the create table statement(s) for your keyspace and the topology of the cluster (like with nodetool status).
 
  
 
  
 
Sean Durity
 
From: learner dba <cassandradba@yahoo.com.INVALID>
Sent: Tuesday, June 19, 2018 9:50 AM
To: user@cassandra.apache.org
Subject: Re: RE: [EXTERNAL] Cluster is unbalanced
 
  
 
We do not choose the node where a partition will go. I thought it is the snitch's role to choose replica nodes. Even the partition size does not vary on our largest column family:
 
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count
                              (micros)          (micros)           (bytes)
50%             0.00             17.08             61.21              3311                 1
75%             0.00             20.50             88.15              3973                 1
95%             0.00             35.43            105.78              3973                 1
98%             0.00             42.51            126.93              3973                 1
99%             0.00             51.01            126.93              3973                 1
Min             0.00              3.97             17.09                61
Max             0.00             73.46            126.93             11864                 1
 
  
 
We are kinda stuck trying to identify what could be causing this imbalance.
 
  
 
On Tuesday, June 19, 2018, 7:15:28 AM EDT, Joshua Galbraith <jgalbraith@newrelic.com.INVALID> wrote:
 
  
 
  
 
>If it were a partition key issue, we would see a similar number of partition keys across nodes. If we look closely, the number of keys across nodes varies a lot.

I'm not sure about that. Is it possible you're writing more new partitions to some nodes even though each node owns the same number of tokens?


 

 
  
 
On Mon, Jun 18, 2018 at 6:07 PM, learner dba <cassandradba@yahoo.com.invalid> wrote:
 

Hi Sean,
 
  
 
Are you using any rack aware topology? --> we are using gossip file
 
 What are your partition keys? --> Partition key is unique
 
Is it possible that your partition keys do not divide up as cleanly as you would like across the cluster because the data is not evenly distributed (by partition key)?  --> No, we verified it.
 
  
 
If it were a partition key issue, we would see a similar number of partition keys across nodes. If we look closely, the number of keys across nodes varies a lot.
 
  
 
  
 
Number of partitions (estimate): 3142552
 
Number of partitions (estimate): 15625442
 
Number of partitions (estimate): 15244021
 
Number of partitions (estimate): 9592992
 
Number of partitions (estimate): 15839280
 
  
 
  
 
  
 
  
 
  
 
On Monday, June 18, 2018, 5:39:08 PM EDT, Durity, Sean R <SE...@homedepot.com> wrote:
 
  
 
  
 
Are you using any rack aware topology? What are your partition keys? Is it possible that your partition keys do not divide up as cleanly as you would like across the cluster because the data is not evenly distributed (by partition key)?
 
 
 
 
 
Sean Durity
 
lord of the (C*) rings (Staff Systems Engineer – Cassandra)
 
MTC 2250
 
#cassandra - for the latest news and updates
 
 
 
From: learner dba <cassandradba@yahoo.com.INVALID>
Sent: Monday, June 18, 2018 2:06 PM
To: User cassandra.apache.org <us...@cassandra.apache.org>
Subject: [EXTERNAL] Cluster is unbalanced
 
 
 
Hi,
 
 
 
Data volume varies a lot in our two DC cluster:
 
 Load       Tokens       Owns

 20.01 GiB  256          ?

 65.32 GiB  256          ?

 60.09 GiB  256          ?

 46.95 GiB  256          ?

 50.73 GiB  256          ?

kaiprodv2

=========

/Leaving/Joining/Moving

 Load       Tokens       Owns

 25.19 GiB  256          ?

 30.26 GiB  256          ?

 9.82 GiB   256          ?

 20.54 GiB  256          ?

 9.7 GiB    256          ?
 
 
 
I ran clearsnapshot, garbagecollect and cleanup, but it increased the size on heavier nodes instead of decreasing. Based on nodetool cfstats, I can see partition keys on each node varies a lot:
 
 
 
Number of partitions (estimate): 3142552
 
Number of partitions (estimate): 15625442
 
Number of partitions (estimate): 15244021
 
Number of partitions (estimate): 9592992
 
Number of partitions (estimate): 15839280
 
 
 
How can I diagnose this imbalance further?
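One way to quantify the skew is to gather each node's "Number of partitions (estimate)" (for example from nodetool cfstats run on every host) and compare the extremes. A small Python sketch using the estimates quoted above (the node names are placeholders):

```python
# Per-node partition-count estimates taken from the nodetool cfstats
# output above (node names are hypothetical placeholders).
estimates = {
    "node1": 3142552,
    "node2": 15625442,
    "node3": 15244021,
    "node4": 9592992,
    "node5": 15839280,
}

lo, hi = min(estimates.values()), max(estimates.values())
ratio = hi / lo
print(f"min={lo} max={hi} ratio={ratio:.2f}")
# A ratio near 1.0 would mean a balanced cluster; here it is ~5x,
# which points at data skew or an unhealthy node rather than uneven
# token ownership.
```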
 
 
 




 
  
 
--
 
Joshua Galbraith | Senior Software Engineer | New Relic
C: 907-209-1208 | jgalbraith@newrelic.com
   



-- 
Joshua Galbraith | Senior Software Engineer | New Relic
  

Re: RE: RE: [EXTERNAL] Cluster is unbalanced

Posted by Joshua Galbraith <jg...@newrelic.com.INVALID>.
> id text PRIMARY KEY

What values are written to this id field? Can you give us some examples or
explain the general use case?

On Tue, Jun 19, 2018 at 1:18 PM, learner dba <cassandradba@yahoo.com.invalid
> wrote:

> Hi Sean,
>
> Here is create table:
>
> CREATE TABLE ks.cf (
>
>     id text PRIMARY KEY,
>
>     accessdata blob
>
> ) WITH bloom_filter_fp_chance = 0.01
>
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>
>     AND comment = ''
>
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy',
> 'max_threshold': '32', 'min_threshold': '4'}
>
>     AND compression = {'chunk_length_in_kb': '64', 'class': '
> org.apache.cassandra.io.compress.LZ4Compressor'}
>
>     AND crc_check_chance = 1.0
>
>     AND dclocal_read_repair_chance = 0.1
>
>     AND default_time_to_live = 0
>
>     AND gc_grace_seconds = 864000
>
>     AND max_index_interval = 2048
>
>     AND memtable_flush_period_in_ms = 0
>
>     AND min_index_interval = 128
>
>     AND read_repair_chance = 0.0
>
>     AND speculative_retry = '99PERCENTILE';
> Nodetool status:
>
> Datacenter: dc1
>
> =======================
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address     Load       Tokens       Owns (effective)  Host ID
>                       Rack
>
> UN  xxxxx   20.66 GiB  256          61.4%
> f4f54949-83c9-419b-9a43-cb630b36d8c2  RAC1
>
> UN  xxxxx  65.77 GiB  256          59.3%
> 3db430ae-45ef-4746-a273-bc1f66ac8981  RAC1
>
> UN  xxxxxx  60.58 GiB  256          58.4%
> 1f23e869-1823-4b75-8d3e-f9b32acba9a6  RAC1
>
> UN  xxxxx  47.08 GiB  256          57.5%
> 7aca9a36-823f-4185-be44-c1464a799084  RAC1
>
> UN  xxxxx  51.47 GiB  256          63.4%
> 18cff010-9b83-4cf8-9dc2-f05ac63df402  RAC1
>
> Datacenter: dc2
>
> ========================
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address     Load       Tokens       Owns (effective)  Host ID
>                       Rack
>
> UN  xxxx   24.37 GiB  256          59.5%
> 1b694180-210a-4b75-8f2a-748f4a5b6a3d  RAC1
>
> UN  xxxxx 30.76 GiB  256          56.7%
> 597bac04-c57a-4487-8924-72e171e45514  RAC1
>
> UN  xxxx  10.73 GiB  256          63.9%
> 6e7e474e-e292-4433-afd4-372d30e0f3e1  RAC1
>
> UN  xxxxxx 19.77 GiB  256          61.5%
> 58751418-7b76-40f7-8b8f-a5bf8fe7d9a2  RAC1
>
> UN  xxxxx  10.33 GiB  256          58.4%
> 6d58d006-2095-449c-8c67-50e8cbdfe7a7  RAC1
>
>
> cassandra-rackdc.properties:
>
> dc=dc1
> rack=RAC1 --> same in all nodes
>
> cassandra.yaml:
> num_tokens: 256
>
> endpoint_snitch: GossipingPropertyFileSnitch
> I can see cassandra-topology.properties; I believe it shouldn't be there
> with GossipingPropertyFileSnitch. Could this file be causing any trouble in data
> distribution?
>
> cat /opt/cassandra/conf/cassandra-topology.properties
>
> # Licensed to the Apache Software Foundation (ASF) under one
>
> # or more contributor license agreements.  See the NOTICE file
>
> # distributed with this work for additional information
>
> # regarding copyright ownership.  The ASF licenses this file
>
> # to you under the Apache License, Version 2.0 (the
>
> # "License"); you may not use this file except in compliance
>
> # with the License.  You may obtain a copy of the License at
>
> #
>
> #     http://www.apache.org/licenses/LICENSE-2.0
>
> #
>
> # Unless required by applicable law or agreed to in writing, software
>
> # distributed under the License is distributed on an "AS IS" BASIS,
>
> # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
>
> # See the License for the specific language governing permissions and
>
> # limitations under the License.
>
>
> # Cassandra Node IP=Data Center:Rack
>
> 192.168.1.100=DC1:RAC1
>
> 192.168.2.200=DC2:RAC2
>
>
> 10.0.0.10=DC1:RAC1
>
> 10.0.0.11=DC1:RAC1
>
> 10.0.0.12=DC1:RAC2
>
>
> 10.20.114.10=DC2:RAC1
>
> 10.20.114.11=DC2:RAC1
>
>
> 10.21.119.13=DC3:RAC1
>
> 10.21.119.10=DC3:RAC1
>
>
> 10.0.0.13=DC1:RAC2
>
> 10.21.119.14=DC3:RAC2
>
> 10.20.114.15=DC2:RAC2
>
>
> # default for unknown nodes
>
> default=DC1:r1
>
>
> # Native IPv6 is supported, however you must escape the colon in the IPv6
> Address
>
> # Also be sure to comment out JVM_OPTS="$JVM_OPTS
> -Djava.net.preferIPv4Stack=true"
>
> # in cassandra-env.sh
>
> fe80\:0\:0\:0\:202\:b3ff\:fe1e\:8329=DC1:RAC3
>
>
>
>
> On Tuesday, June 19, 2018, 12:51:34 PM EDT, Durity, Sean R <
> SEAN_R_DURITY@homedepot.com> wrote:
>
>
> You are correct that the cluster decides where data goes (based on the
> hash of the partition key). However, if you choose a “bad” partition key,
> you may not get good distribution of the data, because the hash is
> deterministic (it always goes to the same nodes/replicas). For example, if
> you have a partition key of a datetime, it is possible that there is more
> data written for a certain time period – thus a larger partition and an
> imbalance across the cluster. Choosing a “good” partition key is one of the
> most important decisions for a Cassandra table.
>
>
>
> Also, I have seen the use of racks in the topology cause an imbalance in
> the “first” node of the rack.
>
>
>
> To help you more, we would need the create table statement(s) for your
> keyspace and the topology of the cluster (like with nodetool status).
>
>
>
>
>
> Sean Durity
>
> *From:* learner dba <ca...@yahoo.com.INVALID>
> *Sent:* Tuesday, June 19, 2018 9:50 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: RE: [EXTERNAL] Cluster is unbalanced
>
>
>
> We do not choose the node where a partition will go. I thought it is the snitch's
> role to choose replica nodes. Even the partition size does not vary on our
> largest column family:
>
> Percentile  SSTables     Write Latency      Read Latency    Partition Size
>       Cell Count
>
>                               (micros)          (micros)           (bytes)
>
>
> 50%             0.00             17.08             61.21              3311
>                 1
>
> 75%             0.00             20.50             88.15              3973
>                 1
>
> 95%             0.00             35.43            105.78              3973
>                 1
>
> 98%             0.00             42.51            126.93              3973
>                 1
>
> 99%             0.00             51.01            126.93              3973
>                 1
>
> Min             0.00              3.97             17.09                61
>
>
> Max             0.00             73.46            126.93             11864
>                 1
>
>
>
> We are kinda stuck trying to identify what could be causing this imbalance.
>
>
>
> On Tuesday, June 19, 2018, 7:15:28 AM EDT, Joshua Galbraith <
> jgalbraith@newrelic.com.INVALID> wrote:
>
>
>
>
>
> >If it were a partition key issue, we would see a similar number of partition
> keys across nodes. If we look closely, the number of keys across nodes varies a
> lot.
>
> I'm not sure about that. Is it possible you're writing more new partitions
> to some nodes even though each node owns the same number of tokens?
>
>
>
>
> On Mon, Jun 18, 2018 at 6:07 PM, learner dba <cassandradba@yahoo.com.
> invalid> wrote:
>
> Hi Sean,
>
>
>
> Are you using any rack aware topology? --> we are using gossip file
>
>  What are your partition keys? --> Partition key is unique
>
> Is it possible that your partition keys do not divide up as cleanly as you
> would like across the cluster because the data is not evenly distributed
> (by partition key)?  --> No, we verified it.
>
>
>
> If it were a partition key issue, we would see a similar number of partition
> keys across nodes. If we look closely, the number of keys across nodes varies a
> lot.
>
>
>
>
>
> Number of partitions (estimate): 3142552
>
> Number of partitions (estimate): 15625442
>
> Number of partitions (estimate): 15244021
>
> Number of partitions (estimate): 9592992
>
> Number of partitions (estimate): 15839280
>
>
>
>
>
>
>
>
>
>
>
> On Monday, June 18, 2018, 5:39:08 PM EDT, Durity, Sean R <
> SEAN_R_DURITY@homedepot.com> wrote:
>
>
>
>
>
> Are you using any rack aware topology? What are your partition keys? Is it
> possible that your partition keys do not divide up as cleanly as you would
> like across the cluster because the data is not evenly distributed (by
> partition key)?
>
>
>
>
>
> Sean Durity
>
> lord of the (C*) rings (Staff Systems Engineer – Cassandra)
>
> MTC 2250
>
> #cassandra - for the latest news and updates
>
>
>
> *From:* learner dba <cassandradba@yahoo.com.INVALID>
> *Sent:* Monday, June 18, 2018 2:06 PM
> *To:* User cassandra.apache.org
> <us...@cassandra.apache.org>
> *Subject:* [EXTERNAL] Cluster is unbalanced
>
>
>
> Hi,
>
>
>
> Data volume varies a lot in our two DC cluster:
>
>  Load       Tokens       Owns
>
>  20.01 GiB  256          ?
>
>  65.32 GiB  256          ?
>
>  60.09 GiB  256          ?
>
>  46.95 GiB  256          ?
>
>  50.73 GiB  256          ?
>
> kaiprodv2
>
> =========
>
> /Leaving/Joining/Moving
>
>  Load       Tokens       Owns
>
>  25.19 GiB  256          ?
>
>  30.26 GiB  256          ?
>
>  9.82 GiB   256          ?
>
>  20.54 GiB  256          ?
>
>  9.7 GiB    256          ?
>
>
>
> I ran clearsnapshot, garbagecollect and cleanup, but it increased the size
> on heavier nodes instead of decreasing. Based on nodetool cfstats, I can
> see partition keys on each node varies a lot:
>
>
>
> Number of partitions (estimate): 3142552
>
> Number of partitions (estimate): 15625442
>
> Number of partitions (estimate): 15244021
>
> Number of partitions (estimate): 9592992
>
> Number of partitions (estimate): 15839280
>
>
>
> How can I diagnose this imbalance further?
>
>
>
>
>
>
>
> --
>
> *Joshua Galbraith *| Senior Software Engineer | New Relic
> C: 907-209-1208 | jgalbraith@newrelic.com
>



-- 
*Joshua Galbraith *| Senior Software Engineer | New Relic

Re: RE: RE: [EXTERNAL] Cluster is unbalanced

Posted by learner dba <ca...@yahoo.com.INVALID>.
 Hi Sean,
Here is create table:

CREATE TABLE ks.cf (

    id text PRIMARY KEY,

    accessdata blob

) WITH bloom_filter_fp_chance = 0.01

    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}

    AND comment = ''

    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}

    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

    AND crc_check_chance = 1.0

    AND dclocal_read_repair_chance = 0.1

    AND default_time_to_live = 0

    AND gc_grace_seconds = 864000

    AND max_index_interval = 2048

    AND memtable_flush_period_in_ms = 0

    AND min_index_interval = 128

    AND read_repair_chance = 0.0

    AND speculative_retry = '99PERCENTILE';
Nodetool status: 
Datacenter: dc1

=======================

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack

UN  xxxxx   20.66 GiB  256          61.4%             f4f54949-83c9-419b-9a43-cb630b36d8c2  RAC1

UN  xxxxx  65.77 GiB  256          59.3%             3db430ae-45ef-4746-a273-bc1f66ac8981  RAC1

UN  xxxxxx  60.58 GiB  256          58.4%             1f23e869-1823-4b75-8d3e-f9b32acba9a6  RAC1

UN  xxxxx  47.08 GiB  256          57.5%             7aca9a36-823f-4185-be44-c1464a799084  RAC1

UN  xxxxx  51.47 GiB  256          63.4%             18cff010-9b83-4cf8-9dc2-f05ac63df402  RAC1

Datacenter: dc2

========================

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack

UN  xxxx   24.37 GiB  256          59.5%             1b694180-210a-4b75-8f2a-748f4a5b6a3d  RAC1

UN  xxxxx 30.76 GiB  256          56.7%             597bac04-c57a-4487-8924-72e171e45514  RAC1

UN  xxxx  10.73 GiB  256          63.9%             6e7e474e-e292-4433-afd4-372d30e0f3e1  RAC1

UN  xxxxxx 19.77 GiB  256          61.5%             58751418-7b76-40f7-8b8f-a5bf8fe7d9a2  RAC1

UN  xxxxx  10.33 GiB  256          58.4%             6d58d006-2095-449c-8c67-50e8cbdfe7a7  RAC1


cassandra-rackdc.properties:

dc=dc1
rack=RAC1 --> same in all nodes
cassandra.yaml:
num_tokens: 256
endpoint_snitch: GossipingPropertyFileSnitch
I can see cassandra-topology.properties; I believe it shouldn't be there with GossipingPropertyFileSnitch. Could this file be causing any trouble in data distribution?

cat /opt/cassandra/conf/cassandra-topology.properties 

# Licensed to the Apache Software Foundation (ASF) under one

# or more contributor license agreements.  See the NOTICE file

# distributed with this work for additional information

# regarding copyright ownership.  The ASF licenses this file

# to you under the Apache License, Version 2.0 (the

# "License"); you may not use this file except in compliance

# with the License.  You may obtain a copy of the License at

#

#     http://www.apache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.




# Cassandra Node IP=Data Center:Rack

192.168.1.100=DC1:RAC1

192.168.2.200=DC2:RAC2




10.0.0.10=DC1:RAC1

10.0.0.11=DC1:RAC1

10.0.0.12=DC1:RAC2




10.20.114.10=DC2:RAC1

10.20.114.11=DC2:RAC1




10.21.119.13=DC3:RAC1

10.21.119.10=DC3:RAC1




10.0.0.13=DC1:RAC2

10.21.119.14=DC3:RAC2

10.20.114.15=DC2:RAC2




# default for unknown nodes

default=DC1:r1




# Native IPv6 is supported, however you must escape the colon in the IPv6 Address

# Also be sure to comment out JVM_OPTS="$JVM_OPTS -Djava.net.preferIPv4Stack=true"

# in cassandra-env.sh

fe80\:0\:0\:0\:202\:b3ff\:fe1e\:8329=DC1:RAC3






    On Tuesday, June 19, 2018, 12:51:34 PM EDT, Durity, Sean R <SE...@homedepot.com> wrote:  
 
You are correct that the cluster decides where data goes (based on the hash of the partition key). However, if you choose a “bad” partition key, you may not get good distribution of the data, because the hash is deterministic (it always goes to the same nodes/replicas). For example, if you have a partition key of a datetime, it is possible that there is more data written for a certain time period – thus a larger partition and an imbalance across the cluster. Choosing a “good” partition key is one of the most important decisions for a Cassandra table.
 
  
 
Also, I have seen the use of racks in the topology cause an imbalance in the “first” node of the rack.
 
  
 
To help you more, we would need the create table statement(s) for your keyspace and the topology of the cluster (like with nodetool status).
 
  
 
  
 
Sean Durity
 
From: learner dba <ca...@yahoo.com.INVALID>
Sent: Tuesday, June 19, 2018 9:50 AM
To: user@cassandra.apache.org
Subject: Re: RE: [EXTERNAL] Cluster is unbalanced
 
  
 
We do not choose the node where a partition will go; I thought it is the snitch's role to choose replica nodes. Even the partition size does not vary on our largest column family:
 
Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count

                              (micros)          (micros)           (bytes)

50%             0.00             17.08             61.21              3311                 1

75%             0.00             20.50             88.15              3973                 1

95%             0.00             35.43            105.78              3973                 1

98%             0.00             42.51            126.93              3973                 1

99%             0.00             51.01            126.93              3973                 1

Min             0.00              3.97             17.09                61                 0

Max             0.00             73.46            126.93             11864                 1
 
  
 
We are kind of stuck here trying to identify what could be causing this imbalance.
 
  
 
On Tuesday, June 19, 2018, 7:15:28 AM EDT, Joshua Galbraith <jg...@newrelic.com.INVALID> wrote:
 
  
 
  
 
>If it was a partition key issue, we would see a similar number of partition keys across nodes. If we look closely, the number of keys across nodes varies a lot.

I'm not sure about that; is it possible you're writing more new partitions to some nodes even though each node owns the same number of tokens?


 

 
  
 
On Mon, Jun 18, 2018 at 6:07 PM, learner dba <ca...@yahoo.com.invalid> wrote:
 

Hi Sean,
 
  
 
Are you using any rack aware topology? --> we are using gossip file
 
 
 What are your partition keys? --> Partition key is unique
 
Is it possible that your partition keys do not divide up as cleanly as you would like across the cluster because the data is not evenly distributed (by partition key)?  --> No, we verified it.
 
  
 
If it was a partition key issue, we would see a similar number of partition keys across nodes. If we look closely, the number of keys across nodes varies a lot.
 
  
 
  
 
Number of partitions (estimate): 3142552
 
Number of partitions (estimate): 15625442
 
Number of partitions (estimate): 15244021
 
Number of partitions (estimate): 9592992
 
Number of partitions (estimate): 15839280
 
  
 
  
 
  
 
  
 
  
 
On Monday, June 18, 2018, 5:39:08 PM EDT, Durity, Sean R <SE...@homedepot.com> wrote:
 
  
 
  
 
Are you using any rack aware topology? What are your partition keys? Is it possible that your partition keys do not divide up as cleanly as you would like across the cluster because the data is not evenly distributed (by partition key)?
 
 
 
 
 
Sean Durity
 
lord of the (C*) rings (Staff Systems Engineer – Cassandra)
 
MTC 2250
 
#cassandra - for the latest news and updates
 
 
 
From: learner dba <cassandradba@yahoo.com.INVALID>
Sent: Monday, June 18, 2018 2:06 PM
To: User cassandra.apache.org <us...@cassandra.apache.org>
Subject: [EXTERNAL] Cluster is unbalanced
 
 
 
Hi,
 
 
 
Data volume varies a lot in our two DC cluster:
 
 Load     Tokens     Owns 
 
 20.01 GiB 256          ?     
 
 65.32 GiB 256          ?     
 
 60.09 GiB 256          ?     
 
 46.95 GiB 256          ?     
 
 50.73 GiB 256          ?     
 
kaiprodv2
 
=========
 
/Leaving/Joining/Moving
 
 Load     Tokens     Owns 
 
 25.19 GiB 256          ?     
 
 30.26 GiB 256          ?     
 
 9.82 GiB  256          ?     
 
 20.54 GiB 256          ?     
 
 9.7 GiB   256          ?     
 
 
 
I ran clearsnapshot, garbagecollect and cleanup, but it increased the size on heavier nodes instead of decreasing. Based on nodetool cfstats, I can see partition keys on each node varies a lot:
 
 
 
Number of partitions (estimate): 3142552
 
Number of partitions (estimate): 15625442
 
Number of partitions (estimate): 15244021
 
Number of partitions (estimate): 9592992
 
Number of partitions (estimate): 15839280
 
 
 
How can I diagnose this imbalance further?
 
 
 




 
  
 
--
 
Joshua Galbraith | Senior Software Engineer | New Relic
C: 907-209-1208 | jgalbraith@newrelic.com
   

RE: RE: [EXTERNAL] Cluster is unbalanced

Posted by "Durity, Sean R" <SE...@homedepot.com>.
You are correct that the cluster decides where data goes (based on the hash of the partition key). However, if you choose a “bad” partition key, you may not get good distribution of the data, because the hash is deterministic (it always goes to the same nodes/replicas). For example, if you have a partition key of a datetime, it is possible that there is more data written for a certain time period – thus a larger partition and an imbalance across the cluster. Choosing a “good” partition key is one of the most important decisions for a Cassandra table.
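The determinism Sean describes can be sketched in a few lines. This is a toy illustration only: the MD5 hash and the 5-node modulo ring below are stand-ins I made up, not Cassandra's actual Murmur3 partitioner or vnode token ranges.

```python
import hashlib

NUM_NODES = 5  # hypothetical ring size for illustration


def token(partition_key: str) -> int:
    # Toy stand-in for the Murmur3 partitioner: any deterministic hash
    # shows why a given key always maps to the same token.
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest()[:8], "big")


def owner(partition_key: str) -> int:
    # Grossly simplified ownership (modulo instead of token ranges/vnodes).
    return token(partition_key) % NUM_NODES


# Deterministic: re-writing the same partition never spreads the load.
assert owner("2018-06-18") == owner("2018-06-18")

# A datetime partition key funnels every write for that day to the same
# replica set, no matter how many tokens each node owns.
writers = {owner("2018-06-18") for _ in range(10_000)}
print(len(writers))  # 1
```

Ten thousand writes to the same "day" partition still touch exactly one owner, which is the mechanism behind the hot-partition imbalance described above.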

Also, I have seen the use of racks in the topology cause an imbalance in the “first” node of the rack.

To help you more, we would need the create table statement(s) for your keyspace and the topology of the cluster (like with nodetool status).


Sean Durity
From: learner dba <ca...@yahoo.com.INVALID>
Sent: Tuesday, June 19, 2018 9:50 AM
To: user@cassandra.apache.org
Subject: Re: RE: [EXTERNAL] Cluster is unbalanced

We do not chose the node where partition will go. I thought it is snitch's role to chose replica nodes. Even the partition size does not vary on our largest column family:

Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count

                              (micros)          (micros)           (bytes)

50%             0.00             17.08             61.21              3311                 1

75%             0.00             20.50             88.15              3973                 1

95%             0.00             35.43            105.78              3973                 1

98%             0.00             42.51            126.93              3973                 1

99%             0.00             51.01            126.93              3973                 1

Min             0.00              3.97             17.09                61

Max             0.00             73.46            126.93             11864                 1

We are kind of stuck here trying to identify what could be causing this imbalance.

On Tuesday, June 19, 2018, 7:15:28 AM EDT, Joshua Galbraith <jg...@newrelic.com.INVALID> wrote:


>If it was a partition key issue, we would see a similar number of partition keys across nodes. If we look closely, the number of keys across nodes varies a lot.

I'm not sure about that; is it possible you're writing more new partitions to some nodes even though each node owns the same number of tokens?


On Mon, Jun 18, 2018 at 6:07 PM, learner dba <ca...@yahoo.com.invalid> wrote:
Hi Sean,

Are you using any rack aware topology? --> we are using gossip file
 What are your partition keys? --> Partition key is unique
Is it possible that your partition keys do not divide up as cleanly as you would like across the cluster because the data is not evenly distributed (by partition key)?  --> No, we verified it.

If it was a partition key issue, we would see a similar number of partition keys across nodes. If we look closely, the number of keys across nodes varies a lot.


Number of partitions (estimate): 3142552
Number of partitions (estimate): 15625442
Number of partitions (estimate): 15244021
Number of partitions (estimate): 9592992
Number of partitions (estimate): 15839280





On Monday, June 18, 2018, 5:39:08 PM EDT, Durity, Sean R <SE...@homedepot.com> wrote:



Are you using any rack aware topology? What are your partition keys? Is it possible that your partition keys do not divide up as cleanly as you would like across the cluster because the data is not evenly distributed (by partition key)?





Sean Durity

lord of the (C*) rings (Staff Systems Engineer – Cassandra)

MTC 2250

#cassandra - for the latest news and updates



From: learner dba <ca...@yahoo.com.INVALID>
Sent: Monday, June 18, 2018 2:06 PM
To: User cassandra.apache.org <us...@cassandra.apache.org>
Subject: [EXTERNAL] Cluster is unbalanced



Hi,



Data volume varies a lot in our two DC cluster:

 Load       Tokens       Owns

 20.01 GiB  256          ?

 65.32 GiB  256          ?

 60.09 GiB  256          ?

 46.95 GiB  256          ?

 50.73 GiB  256          ?

kaiprodv2

=========

/Leaving/Joining/Moving

 Load       Tokens       Owns

 25.19 GiB  256          ?

 30.26 GiB  256          ?

 9.82 GiB   256          ?

 20.54 GiB  256          ?

 9.7 GiB    256          ?



I ran clearsnapshot, garbagecollect and cleanup, but it increased the size on heavier nodes instead of decreasing. Based on nodetool cfstats, I can see partition keys on each node varies a lot:



Number of partitions (estimate): 3142552

Number of partitions (estimate): 15625442

Number of partitions (estimate): 15244021

Number of partitions (estimate): 9592992

Number of partitions (estimate): 15839280



How can I diagnose this imbalance further?





--
Joshua Galbraith | Senior Software Engineer | New Relic
C: 907-209-1208 | jgalbraith@newrelic.com

Re: RE: [EXTERNAL] Cluster is unbalanced

Posted by learner dba <ca...@yahoo.com.INVALID>.
We do not choose the node where a partition will go; I thought it is the snitch's role to choose replica nodes. Even the partition size does not vary on our largest column family:


Percentile  SSTables     Write Latency      Read Latency    Partition Size        Cell Count

                              (micros)          (micros)           (bytes)                  

50%             0.00             17.08             61.21              3311                 1

75%             0.00             20.50             88.15              3973                 1

95%             0.00             35.43            105.78              3973                 1

98%             0.00             42.51            126.93              3973                 1

99%             0.00             51.01            126.93              3973                 1

Min             0.00              3.97             17.09                61                 0

Max             0.00             73.46            126.93             11864                 1

We are kind of stuck here trying to identify what could be causing this imbalance.
    On Tuesday, June 19, 2018, 7:15:28 AM EDT, Joshua Galbraith <jg...@newrelic.com.INVALID> wrote:  
 
>If it was a partition key issue, we would see a similar number of partition keys across nodes. If we look closely, the number of keys across nodes varies a lot.

I'm not sure about that; is it possible you're writing more new partitions to some nodes even though each node owns the same number of tokens?

On Mon, Jun 18, 2018 at 6:07 PM, learner dba <ca...@yahoo.com.invalid> wrote:

 Hi Sean,
Are you using any rack aware topology? --> we are using gossip file
 What are your partition keys? --> Partition key is unique
Is it possible that your partition keys do not divide up as cleanly as you would like across the cluster because the data is not evenly distributed (by partition key)? --> No, we verified it.
If it was a partition key issue, we would see a similar number of partition keys across nodes. If we look closely, the number of keys across nodes varies a lot.

Number of partitions (estimate): 3142552
Number of partitions (estimate): 15625442
Number of partitions (estimate): 15244021
Number of partitions (estimate): 9592992
Number of partitions (estimate): 15839280




    On Monday, June 18, 2018, 5:39:08 PM EDT, Durity, Sean R <SE...@homedepot.com> wrote:  
 
 
Are you using any rack aware topology? What are your partition keys? Is it possible that your partition keys do not divide up as cleanly as you would like across the cluster because the data is not evenly distributed (by partition key)?
 
  
 
  
 
Sean Durity
 
lord of the (C*) rings (Staff Systems Engineer – Cassandra)
 
MTC 2250
 
#cassandra - for the latest news and updates
 
  
 
From: learner dba <cassandradba@yahoo.com.INVALID>
Sent: Monday, June 18, 2018 2:06 PM
To: User cassandra.apache.org <us...@cassandra.apache.org>
Subject: [EXTERNAL] Cluster is unbalanced
 
  
 
Hi,
 
  
 
Data volume varies a lot in our two DC cluster:
 
 Load     Tokens     Owns 
 
 20.01 GiB 256         ?     
 
 65.32 GiB 256         ?     
 
 60.09 GiB 256         ?     
 
 46.95 GiB 256         ?     
 
 50.73 GiB 256         ?     
 
kaiprodv2
 
=========
 
/Leaving/Joining/Moving
 
 Load     Tokens     Owns 
 
 25.19 GiB 256         ?     
 
 30.26 GiB 256         ?     
 
 9.82 GiB 256         ?     
 
 20.54 GiB 256         ?     
 
 9.7 GiB    256         ?     
 
  
 
I ran clearsnapshot, garbagecollect and cleanup, but it increased the size on heavier nodes instead of decreasing. Based on nodetool cfstats, I can see partition keys on each node varies a lot:
 
  
 
Number of partitions (estimate): 3142552
 
Number of partitions (estimate): 15625442
 
Number of partitions (estimate): 15244021
 
Number of partitions (estimate): 9592992
 
Number of partitions (estimate): 15839280
 
  
 
How can I diagnose this imbalance further?
 
  
   



-- 
Joshua Galbraith | Senior Software Engineer | New Relic
C: 907-209-1208 | jgalbraith@newrelic.com
  

Re: RE: [EXTERNAL] Cluster is unbalanced

Posted by Joshua Galbraith <jg...@newrelic.com.INVALID>.
>If it was a partition key issue, we would see a similar number of partition
keys across nodes. If we look closely, the number of keys across nodes varies a lot.

I'm not sure about that; is it possible you're writing more new partitions
to some nodes even though each node owns the same number of tokens?
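One way to see how this can happen: with a deterministic partitioner, equal token counts only balance data when the partition keys themselves are spread evenly. A rough simulation follows; the MD5 hash, modulo bucketing, node count, and key shapes are all invented for illustration and are not Cassandra internals.

```python
import collections
import hashlib
import random


def owner(key: str, nodes: int = 5) -> int:
    # Deterministic hash as a stand-in for the partitioner; equal token
    # ownership is modeled as equal modulo buckets.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % nodes


random.seed(42)

# High-cardinality, uniformly generated keys: partitions per node even out.
uniform = collections.Counter(
    owner(f"key-{random.getrandbits(64)}") for _ in range(20_000)
)

# Low-cardinality keys (say, 10 distinct prefixes): whichever nodes those
# few partitions hash to absorb all of the writes.
skewed = collections.Counter(
    owner(f"prefix-{random.randint(0, 9)}") for _ in range(20_000)
)

print(sorted(uniform.values()))  # five roughly equal counts
print(sorted(skewed.values()))   # lumpy: at most 10 partitions to spread
```

Comparing the two counters against the "Number of partitions" figures in this thread is essentially the experiment suggested later with a uniformly populated test keyspace.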

On Mon, Jun 18, 2018 at 6:07 PM, learner dba <cassandradba@yahoo.com.invalid
> wrote:

> Hi Sean,
>
> Are you using any rack aware topology? --> we are using gossip file
>  What are your partition keys? --> Partition key is unique
> Is it possible that your partition keys do not divide up as cleanly as you
> would like across the cluster because the data is not evenly distributed
> (by partition key)?  --> No, we verified it.
>
> If it was a partition key issue, we would see a similar number of partition
> keys across nodes. If we look closely, the number of keys across nodes varies
> a lot.
>
>
> Number of partitions (estimate): 3142552
> Number of partitions (estimate): 15625442
> Number of partitions (estimate): 15244021
> Number of partitions (estimate): 9592992
> Number of partitions (estimate): 15839280
>
>
>
>
>
> On Monday, June 18, 2018, 5:39:08 PM EDT, Durity, Sean R <
> SEAN_R_DURITY@homedepot.com> wrote:
>
>
> Are you using any rack aware topology? What are your partition keys? Is it
> possible that your partition keys do not divide up as cleanly as you would
> like across the cluster because the data is not evenly distributed (by
> partition key)?
>
>
>
>
>
> Sean Durity
>
> lord of the (C*) rings (Staff Systems Engineer – Cassandra)
>
> MTC 2250
>
> #cassandra - for the latest news and updates
>
>
>
> *From:* learner dba <ca...@yahoo.com.INVALID>
> *Sent:* Monday, June 18, 2018 2:06 PM
> *To:* User cassandra.apache.org <us...@cassandra.apache.org>
> *Subject:* [EXTERNAL] Cluster is unbalanced
>
>
>
> Hi,
>
>
>
> Data volume varies a lot in our two DC cluster:
>
>  Load       Tokens       Owns
>
>  20.01 GiB  256          ?
>
>  65.32 GiB  256          ?
>
>  60.09 GiB  256          ?
>
>  46.95 GiB  256          ?
>
>  50.73 GiB  256          ?
>
> kaiprodv2
>
> =========
>
> /Leaving/Joining/Moving
>
>  Load       Tokens       Owns
>
>  25.19 GiB  256          ?
>
>  30.26 GiB  256          ?
>
>  9.82 GiB   256          ?
>
>  20.54 GiB  256          ?
>
>  9.7 GiB    256          ?
>
>
>
> I ran clearsnapshot, garbagecollect and cleanup, but it increased the size
> on heavier nodes instead of decreasing. Based on nodetool cfstats, I can
> see partition keys on each node varies a lot:
>
>
>
> Number of partitions (estimate): 3142552
>
> Number of partitions (estimate): 15625442
>
> Number of partitions (estimate): 15244021
>
> Number of partitions (estimate): 9592992
>
> Number of partitions (estimate): 15839280
>
>
>
> How can I diagnose this imbalance further?
>
>
>



-- 
*Joshua Galbraith *| Senior Software Engineer | New Relic
C: 907-209-1208 | jgalbraith@newrelic.com

Re: RE: [EXTERNAL] Cluster is unbalanced

Posted by learner dba <ca...@yahoo.com.INVALID>.
 Hi Sean,
Are you using any rack aware topology? --> we are using gossip file
 What are your partition keys? --> Partition key is unique
Is it possible that your partition keys do not divide up as cleanly as you would like across the cluster because the data is not evenly distributed (by partition key)? --> No, we verified it.
If it was a partition key issue, we would see a similar number of partition keys across nodes. If we look closely, the number of keys across nodes varies a lot.

Number of partitions (estimate): 3142552
Number of partitions (estimate): 15625442
Number of partitions (estimate): 15244021
Number of partitions (estimate): 9592992
Number of partitions (estimate): 15839280




    On Monday, June 18, 2018, 5:39:08 PM EDT, Durity, Sean R <SE...@homedepot.com> wrote:  
 
Are you using any rack aware topology? What are your partition keys? Is it possible that your partition keys do not divide up as cleanly as you would like across the cluster because the data is not evenly distributed (by partition key)?
 
  
 
  
 
Sean Durity
 
lord of the (C*) rings (Staff Systems Engineer – Cassandra)
 
MTC 2250
 
#cassandra - for the latest news and updates
 
  
 
From: learner dba <ca...@yahoo.com.INVALID>
Sent: Monday, June 18, 2018 2:06 PM
To: User cassandra.apache.org <us...@cassandra.apache.org>
Subject: [EXTERNAL] Cluster is unbalanced
 
  
 
Hi,
 
  
 
Data volume varies a lot in our two DC cluster:
 
 Load     Tokens     Owns 
 
 20.01 GiB 256         ?     
 
 65.32 GiB 256         ?     
 
 60.09 GiB 256         ?     
 
 46.95 GiB 256         ?     
 
 50.73 GiB 256         ?     
 
kaiprodv2
 
=========
 
/Leaving/Joining/Moving
 
 Load     Tokens     Owns 
 
 25.19 GiB 256         ?     
 
 30.26 GiB 256         ?     
 
 9.82 GiB 256         ?     
 
 20.54 GiB 256         ?     
 
 9.7 GiB    256         ?     
 
  
 
I ran clearsnapshot, garbagecollect and cleanup, but it increased the size on heavier nodes instead of decreasing. Based on nodetool cfstats, I can see partition keys on each node varies a lot:
 
  
 
Number of partitions (estimate): 3142552
 
Number of partitions (estimate): 15625442
 
Number of partitions (estimate): 15244021
 
Number of partitions (estimate): 9592992
 
Number of partitions (estimate): 15839280
 
  
 
How can I diagnose this imbalance further?
 
  
   

RE: [EXTERNAL] Cluster is unbalanced

Posted by "Durity, Sean R" <SE...@homedepot.com>.
Are you using any rack aware topology? What are your partition keys? Is it possible that your partition keys do not divide up as cleanly as you would like across the cluster because the data is not evenly distributed (by partition key)?


Sean Durity
lord of the (C*) rings (Staff Systems Engineer – Cassandra)
MTC 2250
#cassandra - for the latest news and updates

From: learner dba <ca...@yahoo.com.INVALID>
Sent: Monday, June 18, 2018 2:06 PM
To: User cassandra.apache.org <us...@cassandra.apache.org>
Subject: [EXTERNAL] Cluster is unbalanced

Hi,

Data volume varies a lot in our two DC cluster:

 Load       Tokens       Owns

 20.01 GiB  256          ?

 65.32 GiB  256          ?

 60.09 GiB  256          ?

 46.95 GiB  256          ?

 50.73 GiB  256          ?

kaiprodv2

=========

/Leaving/Joining/Moving

 Load       Tokens       Owns

 25.19 GiB  256          ?

 30.26 GiB  256          ?

 9.82 GiB   256          ?

 20.54 GiB  256          ?

 9.7 GiB    256          ?

I ran clearsnapshot, garbagecollect and cleanup, but it increased the size on heavier nodes instead of decreasing. Based on nodetool cfstats, I can see partition keys on each node varies a lot:


Number of partitions (estimate): 3142552

Number of partitions (estimate): 15625442

Number of partitions (estimate): 15244021

Number of partitions (estimate): 9592992
Number of partitions (estimate): 15839280

How can I diagnose this imbalance further?


Re: Cluster is unbalanced

Posted by learner dba <ca...@yahoo.com.INVALID>.
 
CREATE KEYSPACE data WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '3', 'dc2': '3'}  AND durable_writes = true;


cqlsh> select * from system_schema.keyspaces ;




 keyspace_name      | durable_writes | replication

--------------------+----------------+------------------------------------------------------------------------------------------------------------

               apim |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'iwakaluaint': '3', 'maikaiprodv2': '3'}

        system_auth |           True |                        {'class': 'org.apache.cassandra.locator.SimpleStrategy', 'replication_factor': '1'}

         eventspace |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'iwakaluaint': '3', 'maikaiprodv2': '3'}

      system_schema |           True |                                                    {'class': 'org.apache.cassandra.locator.LocalStrategy'}

            metrics |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'iwakaluaint': '3', 'maikaiprodv2': '3'}

                sim |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'iwakaluaint': '3', 'maikaiprodv2': '3'}

            billing |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'iwakaluaint': '1', 'maikaiprodv2': '1'}

 system_distributed |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'iwakaluaint': '3', 'maikaiprodv2': '3'}

             system |           True |                                                    {'class': 'org.apache.cassandra.locator.LocalStrategy'}

              audit |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'iwakaluaint': '3', 'maikaiprodv2': '3'}

          dakota_ks |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'iwakaluaint': '3', 'maikaiprodv2': '3'}

        credentials |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'iwakaluaint': '3', 'maikaiprodv2': '3'}

      system_traces |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'iwakaluaint': '3', 'maikaiprodv2': '3'}

               data |           True | {'class': 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'iwakaluaint': '3', 'maikaiprodv2': '3'}

Keyspaces with replication factor 1 are unused; most space is occupied by the data and eventspace keyspaces.
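Beyond keyspace configuration, it is worth checking whether raw token ownership is itself balanced, separately from the data. A sketch of the arithmetic follows; the (node, token) pairs below are invented sample data, and a real check would parse them out of `nodetool ring` output, whose column layout varies by version.

```python
# Hypothetical (node, token) pairs as they might be scraped from
# `nodetool ring`; the addresses and tokens here are made up.
RING = [
    ("10.0.0.1", -9_000_000_000_000_000_000),
    ("10.0.0.2", -3_000_000_000_000_000_000),
    ("10.0.0.3",  1_000_000_000_000_000_000),
    ("10.0.0.1",  4_000_000_000_000_000_000),
    ("10.0.0.2",  8_000_000_000_000_000_000),
]


def ownership(ring):
    """Fraction of the token space each node owns (primary ranges only)."""
    ring = sorted(ring, key=lambda entry: entry[1])
    total = 2 ** 64  # size of the Murmur3 token space
    shares = {}
    prev = ring[-1][1] - total  # wrap the last range around the ring
    for node, tok in ring:
        # A node owns the range (previous token, its token].
        shares[node] = shares.get(node, 0) + (tok - prev)
        prev = tok
    return {node: share / total for node, share in shares.items()}


for node, share in sorted(ownership(RING).items()):
    print(f"{node}: {share:.1%}")
```

If ownership comes out even, as 256 vnodes per node usually makes it, the imbalance points back at the data model rather than the ring.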
    On Wednesday, June 20, 2018, 12:24:18 AM EDT, anil patimidi <ar...@gmail.com> wrote:  
 
 What is your keyspace configuration? Do you have all the keyspaces configured for both DCs?
Can you run the query below from cqlsh and see if the keyspace is configured to use both DCs?
select * from system.schema_keyspaces;   # if your cluster is on 2.1 or less
select * from system_schema.keyspaces;   # for 3.0 clusters

- Anil

On Mon, Jun 18, 2018 at 11:06 AM, learner dba <ca...@yahoo.com.invalid> wrote:

Hi,
Data volume varies a lot in our two DC cluster:

 Load       Tokens       Owns  

 20.01 GiB  256          ?     

 65.32 GiB  256          ?     

 60.09 GiB  256          ?     

 46.95 GiB  256          ?     

 50.73 GiB  256          ?     

kaiprodv2

=========

/Leaving/Joining/Moving

 Load       Tokens       Owns  

 25.19 GiB  256          ?     

 30.26 GiB  256          ?     

 9.82 GiB   256          ?     

 20.54 GiB  256          ?     

 9.7 GiB    256          ?     

I ran clearsnapshot, garbagecollect and cleanup, but it increased the size on heavier nodes instead of decreasing. Based on nodetool cfstats, I can see partition keys on each node varies a lot:

Number of partitions (estimate): 3142552

Number of partitions (estimate): 15625442

Number of partitions (estimate): 15244021

Number of partitions (estimate): 9592992
Number of partitions (estimate): 15839280
How can I diagnose this imbalance further?


  

Re: Cluster is unbalanced

Posted by anil patimidi <ar...@gmail.com>.
What is your keyspace configuration? Do you have all the keyspaces
configured for both DCs?

Can you run the query below from cqlsh and see if the keyspace is configured to
use both DCs?

select * from system.schema_keyspaces;   # if your cluster is on 2.1 or less
select * from system_schema.keyspaces;   # for 3.0 clusters

- Anil


On Mon, Jun 18, 2018 at 11:06 AM, learner dba <
cassandradba@yahoo.com.invalid> wrote:

> Hi,
>
> Data volume varies a lot in our two DC cluster:
>
>  Load       Tokens       Owns
>
>  20.01 GiB  256          ?
>
>  65.32 GiB  256          ?
>
>  60.09 GiB  256          ?
>
>  46.95 GiB  256          ?
>
>  50.73 GiB  256          ?
>
> kaiprodv2
>
> =========
>
> /Leaving/Joining/Moving
>
>  Load       Tokens       Owns
>
>  25.19 GiB  256          ?
>
>  30.26 GiB  256          ?
>
>  9.82 GiB   256          ?
>
>  20.54 GiB  256          ?
>
>  9.7 GiB    256          ?
>
> I ran clearsnapshot, garbagecollect and cleanup, but it increased the size
> on heavier nodes instead of decreasing. Based on nodetool cfstats, I can
> see partition keys on each node varies a lot:
>
> Number of partitions (estimate): 3142552
>
> Number of partitions (estimate): 15625442
>
> Number of partitions (estimate): 15244021
>
> Number of partitions (estimate): 9592992
> Number of partitions (estimate): 15839280
>
> How can I diagnose this imbalance further?
>
>