Posted to user@cassandra.apache.org by Ashish Tyagi <ty...@gmail.com> on 2013/11/01 10:43:56 UTC

Re: High loads only on one node in the cluster

Hi Evan,

The clients connect to all nodes. We tried shutting down the Thrift server on
the affected node, but the load did not come down.



On Fri, Nov 1, 2013 at 12:59 AM, Evan Weaver <ev...@fauna.org> wrote:

> Are all your clients only connecting to your first node? I would
> probably strace it and compare the trace to one from a lightly loaded
> node.
>
> On Thu, Oct 31, 2013 at 7:12 PM, Ashish Tyagi <ty...@gmail.com>
> wrote:
> > We have a 9 node cluster. 6 nodes are in one data-center and 3 nodes in
> the
> > other. All machines are Amazon M1.XLarge configuration.
> >
> > Datacenter: DC1
> > ==========
> > Address         Rack        Status State   Load            Owns
> > Token
> >
> > ip11  1b          Up     Normal  76.46 GB        16.67%              0
> > ip12  1b          Up     Normal  44.66 GB        16.67%
> > 28356863910078205288614550619314017621
> > ip13  1c          Up     Normal  85.94 GB        16.67%
> > 56713727820156410577229101238628035241
> > ip14  1c          Up     Normal  17.55 GB        16.67%
> > 85070591730234615865843651857942052863
> > ip15  1d          Up     Normal  80.74 GB        16.67%
> > 113427455640312821154458202477256070484
> > ip16  1d          Up     Normal  20.88 GB        16.67%
> > 141784319550391026443072753096570088105
> >
> > Datacenter: DC2
> > ==========
> > Address         Rack        Status State   Load            Owns
> > Token
> >
> > ip21  1a          Up     Normal  78.32 GB        0.00%               1001
> > ip22  1b          Up     Normal  71.23 GB        0.00%
> > 56713727820156410577229101238628036241
> > ip23  1b          Up     Normal  53.49 GB        0.00%
> > 113427455640312821154458202477256071484
> >
> > The problem is that the node with IP address ip11 often has 5-10 times more load
> > than any other node. Most of the operations are on counters. The primary
> > column family (which receives most writes) has a replication factor of 2
> in
> > DataCenter DC1 and also in DataCenter DC2. The traffic is write heavy
> (reads
> > are less than 10% of total requests). We are using size-tiered
> compaction.
> > Both writes and reads happen with a consistency factor of LOCAL_QUORUM.
> >
> > More information:
> >
> > 1. cassandra.yaml - http://pastebin.com/u344fA6z
> > 2. Jmap heap when node under high loads - http://pastebin.com/ib3D0Pa
> > 3. Nodetool tpstats - http://pastebin.com/s0AS7bGd
> > 4. Cassandra-env.sh - http://pastebin.com/ubp4cGUx
> > 5. GC log lines -  http://pastebin.com/Y0TKphsm
> >
> > Am I doing anything wrong? Any pointers would be appreciated.
> >
> > Thanks in advance,
> > Ashish
>

Re: High loads only on one node in the cluster

Posted by Rakesh Rajan <ra...@gmail.com>.
Tyler,

Thanks for the explanation. The objective is not to have perfectly balanced
US-East and SG DCs; the SG DC is just a backup cluster and hence has fewer
nodes than US-East. What we are trying to figure out is the imbalance among
the 6 nodes within US-East itself. I'll try moving the 6 US-East nodes to
proper alternating racks and check.

In addition, as I mentioned earlier, do you see any issue with the dynamic
snitch scores? I see that the node has a high score, but what value of
dynamic_snitch_badness_threshold should I set so that other replicas get the
traffic? (That node has a >50% higher score than all the other nodes.)


On Fri, Nov 1, 2013 at 10:04 PM, Tyler Hobbs <ty...@datastax.com> wrote:

>
> On Fri, Nov 1, 2013 at 5:07 AM, Rakesh Rajan <ra...@gmail.com> wrote:
>
>>
>> 1) By alternating racks, do you mean to alternate racks between all nodes
>> in a single DC v/s multiple DCs? AWS EastCoast has 4 AZs
>> and Singapore has 2 AZs. So is the final solution something like this:
>> ip11 - East Coast - m1.xlarge / us-east-1b         - Token: 0
>> ip21 - Singapore  - m1.xlarge / ap-southeast-1a - Token: 1001
>> ip12 - East Coast - m1.xlarge / us-east-*1c*         -
>> Token: 28356863910078205288614550619314017621
>> ip13 - East Coast - m1.xlarge / us-east-*1d*         -
>> Token: 56713727820156410577229101238628035241
>> ip22 - Singapore  - m1.xlarge / ap-southeast-1b -
>> Token: 56713727820156410577229101238628036241
>> ip14 - East Coast - m1.xlarge / us-east-*1a*         -
>> Token: 85070591730234615865843651857942052863
>> ip15 - East Coast - m1.xlarge / us-east-*1b*         -
>> Token: 113427455640312821154458202477256070484
>> ip23 - Singapore  - m1.xlarge / ap-southeast-*1a* -
>> Token: 113427455640312821154458202477256071484
>> ip16 - East Coast - m1.xlarge / us-east-*1c*         -
>> Token: 141784319550391026443072753096570088105
>>
>> Is this what you had suggested?
>>
>
> That would be more balanced than your current setup, but it would still be
> unbalanced, especially the ap-southeast DC.  To have a perfectly balanced
> cluster with multiple racks, you need to a) have the same number of nodes
> on each rack, and b) alternate racks within each DC.  Your new layout would
> meet requirement (b), but not (a).  This is why I suggest using the same
> rack for all nodes.
>
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>

Re: High loads only on one node in the cluster

Posted by Tyler Hobbs <ty...@datastax.com>.
On Fri, Nov 1, 2013 at 5:07 AM, Rakesh Rajan <ra...@gmail.com> wrote:

>
> 1) By alternating racks, do you mean to alternate racks between all nodes
> in a single DC v/s multiple DCs? AWS EastCoast has 4 AZs
> and Singapore has 2 AZs. So is the final solution something like this:
> ip11 - East Coast - m1.xlarge / us-east-1b         - Token: 0
> ip21 - Singapore  - m1.xlarge / ap-southeast-1a - Token: 1001
> ip12 - East Coast - m1.xlarge / us-east-*1c*         -
> Token: 28356863910078205288614550619314017621
> ip13 - East Coast - m1.xlarge / us-east-*1d*         -
> Token: 56713727820156410577229101238628035241
> ip22 - Singapore  - m1.xlarge / ap-southeast-1b -
> Token: 56713727820156410577229101238628036241
> ip14 - East Coast - m1.xlarge / us-east-*1a*         -
> Token: 85070591730234615865843651857942052863
> ip15 - East Coast - m1.xlarge / us-east-*1b*         -
> Token: 113427455640312821154458202477256070484
> ip23 - Singapore  - m1.xlarge / ap-southeast-*1a* -
> Token: 113427455640312821154458202477256071484
> ip16 - East Coast - m1.xlarge / us-east-*1c*         -
> Token: 141784319550391026443072753096570088105
>
> Is this what you had suggested?
>

That would be more balanced than your current setup, but it would still be
unbalanced, especially the ap-southeast DC.  To have a perfectly balanced
cluster with multiple racks, you need to a) have the same number of nodes
on each rack, and b) alternate racks within each DC.  Your new layout would
meet requirement (b), but not (a).  This is why I suggest using the same
rack for all nodes.
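
The effect of requirement (a) shows up clearly in the current DC1 layout. Below is a simplified model (not the actual Cassandra code, just an illustration in the spirit of NetworkTopologyStrategy's rack-aware placement with RF=2) using the node/rack assignments from the ring posted earlier in the thread:

```python
# Simplified model of rack-aware replica placement (in the spirit of
# NetworkTopologyStrategy, RF=2). NOT the actual Cassandra code; node
# names and racks are taken from the DC1 ring in this thread.
from collections import Counter

RF = 2
# DC1 nodes in token (ring) order, with their racks.
ring = [("ip11", "1b"), ("ip12", "1b"), ("ip13", "1c"),
        ("ip14", "1c"), ("ip15", "1d"), ("ip16", "1d")]

def replicas_for(primary_idx):
    """Walk the ring from the range's primary node, taking one node
    per distinct rack until RF replicas are chosen."""
    chosen, racks = [], set()
    for step in range(len(ring)):
        node, rack = ring[(primary_idx + step) % len(ring)]
        if rack not in racks:
            chosen.append(node)
            racks.add(rack)
        if len(chosen) == RF:
            break
    return chosen

# Count how many of the 6 token ranges land on each node.
load = Counter(node for i in range(len(ring)) for node in replicas_for(i))
print(sorted(load.items()))
# [('ip11', 3), ('ip12', 1), ('ip13', 3), ('ip14', 1), ('ip15', 3), ('ip16', 1)]
```

Under this model ip11, ip13, and ip15 each carry 3 of the 6 ranges while ip12, ip14, and ip16 carry only 1, which lines up with the roughly 3-4x data-size skew visible in the nodetool ring output: with two nodes per rack placed adjacently on the ring, the skew is built in.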


-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: High loads only on one node in the cluster

Posted by Rakesh Rajan <ra...@gmail.com>.
Forgot to mention: all 9 nodes are on Cassandra 1.2.9. Also, nodetool tpstats
on the high-CPU node indicates:


Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                        32      6600     3420385815         0                 0
RequestResponseStage              0         0     2094235864         0                 0
MutationStage                     0         0     3102461222         0                 0
ReadRepairStage                   0         0         438089         0                 0
*ReplicateOnWriteStage            0         0      253180440         0          23703996*
GossipStage                       0         0        5917301         0                 0
AntiEntropyStage                  0         0           1486         0                 0
MigrationStage                    0         0            143         0                 0
MemtablePostFlusher               0         0          39070         0                 0
FlushWriter                       0         0           7452         0               927
MiscStage                         0         0            257         0                 0
commitlog_archiver                0         0              0         0                 0
AntiEntropySessions               0         0              1         0                 0
InternalResponseStage             0         0             62         0                 0
HintedHandoff                     0         0           1961         0                 0

Message type           Dropped
RANGE_SLICE               1681
READ_REPAIR               3921
BINARY                       0
READ                   4103953
MUTATION               2651071
_TRACE                       0
REQUEST_RESPONSE          3229

On Fri, Nov 1, 2013 at 3:37 PM, Rakesh Rajan <ra...@gmail.com> wrote:

> @Tyler / @Rob,
>
> As Ashish mentioned earlier, we have 9 nodes on AWS - 6 on EastCoast and 3
> on Singapore. All 9 nodes uses EC2Snitch. The current ring ( across all
> nodes in 2 DC ) looks like this:
>
> ip11 - East Coast - m1.xlarge / us-east-1b         - Size: 83 GB - Token:
> 0
> ip21 - Singapore  - m1.xlarge / ap-southeast-1a - Size: 88 GB - Token:
> 1001
> ip12 - East Coast - m1.xlarge / us-east-1b         - Size: 45 GB -
> Token: 28356863910078205288614550619314017621
> ip13 - East Coast - m1.xlarge / us-east-1c         - Size: 93 GB -
> Token: 56713727820156410577229101238628035241
> ip22 - Singapore  - m1.xlarge / ap-southeast-1b - Size: 73 GB -
> Token: 56713727820156410577229101238628036241
> ip14 - East Coast - m1.xlarge / us-east-1c         - Size: 20 GB -
> Token: 85070591730234615865843651857942052863
> ip15 - East Coast - m1.xlarge / us-east-1d         - Size: 89 GB -
> Token: 113427455640312821154458202477256070484
> ip23 - Singapore  - m1.xlarge / ap-southeast-1b - Size: 56 GB -
> Token: 113427455640312821154458202477256071484
> ip16 - East Coast - m1.xlarge / us-east-1d         - Size: 25 GB -
> Token: 141784319550391026443072753096570088105
>
> Regarding alternating racks solution, I've the following queries:
>
> 1) By alternating racks, do you mean to alternate racks between all nodes
> in a single DC v/s multiple DCs? AWS EastCoast has 4 AZs
> and Singapore has 2 AZs. So is the final solution something like this:
> ip11 - East Coast - m1.xlarge / us-east-1b         - Token: 0
> ip21 - Singapore  - m1.xlarge / ap-southeast-1a - Token: 1001
> ip12 - East Coast - m1.xlarge / us-east-*1c*         -
> Token: 28356863910078205288614550619314017621
> ip13 - East Coast - m1.xlarge / us-east-*1d*         -
> Token: 56713727820156410577229101238628035241
> ip22 - Singapore  - m1.xlarge / ap-southeast-1b -
> Token: 56713727820156410577229101238628036241
> ip14 - East Coast - m1.xlarge / us-east-*1a*         -
> Token: 85070591730234615865843651857942052863
> ip15 - East Coast - m1.xlarge / us-east-*1b*         -
> Token: 113427455640312821154458202477256070484
> ip23 - Singapore  - m1.xlarge / ap-southeast-*1a* -
> Token: 113427455640312821154458202477256071484
> ip16 - East Coast - m1.xlarge / us-east-*1c*         -
> Token: 141784319550391026443072753096570088105
>
> Is this what you had suggested?
>
>  2) How does dynamic_snitch_badness_threshold: 0.1 effect the CPU load? On
> the node ( ip11 ) which was high CPU ( system load > 30 ), I checked the
> attribute score ( via JMX
> bean org.apache.cassandra.db:type=DynamicEndpointSnitch ) and saw the
> following:
> EastCoast:
>     *ip11 = 1.6813321647677475*
>     ip12 = 1.0003505696757231
>     ip13 = 1.1324160525509974
>     ip14 = 1.000350569675723
>     ip15 = 1.0007011393514456
>     ip16 = 1.0005258545135842
> Singapore:
>     ip21 = 1.095880806310253
>     ip22 = 1.4100000000000001
>     ip23 = 1.0953549517966696
>
> So ip11 node is indeed having higher score - but not sure why traffic is
> still going to that replica as opposed to some other node?
>
> Thanks!
>
>
>
> On Fri, Nov 1, 2013 at 3:13 PM, Ashish Tyagi <ty...@gmail.com> wrote:
>
>> Hi Evan,
>>
>> The clients connect to all nodes. We tried shutting the thrift server on
>> the affected node. Loads did not come down.
>>
>>
>>
>> On Fri, Nov 1, 2013 at 12:59 AM, Evan Weaver <ev...@fauna.org> wrote:
>>
>>> Are all your clients only connecting to your first node? I would
>>> probably strace it and compare the trace to one from a lightly loaded
>>> node.
>>>
>>> On Thu, Oct 31, 2013 at 7:12 PM, Ashish Tyagi <ty...@gmail.com>
>>> wrote:
>>> > We have a 9 node cluster. 6 nodes are in one data-center and 3 nodes
>>> in the
>>> > other. All machines are Amazon M1.XLarge configuration.
>>> >
>>> > Datacenter: DC1
>>> > ==========
>>> > Address         Rack        Status State   Load            Owns
>>> > Token
>>> >
>>> > ip11  1b          Up     Normal  76.46 GB        16.67%              0
>>> > ip12  1b          Up     Normal  44.66 GB        16.67%
>>> > 28356863910078205288614550619314017621
>>> > ip13  1c          Up     Normal  85.94 GB        16.67%
>>> > 56713727820156410577229101238628035241
>>> > ip14  1c          Up     Normal  17.55 GB        16.67%
>>> > 85070591730234615865843651857942052863
>>> > ip15  1d          Up     Normal  80.74 GB        16.67%
>>> > 113427455640312821154458202477256070484
>>> > ip16  1d          Up     Normal  20.88 GB        16.67%
>>> > 141784319550391026443072753096570088105
>>> >
>>> > Datacenter: DC2
>>> > ==========
>>> > Address         Rack        Status State   Load            Owns
>>> > Token
>>> >
>>> > ip21  1a          Up     Normal  78.32 GB        0.00%
>>> 1001
>>> > ip22  1b          Up     Normal  71.23 GB        0.00%
>>> > 56713727820156410577229101238628036241
>>> > ip23  1b          Up     Normal  53.49 GB        0.00%
>>> > 113427455640312821154458202477256071484
>>> >
>>> > Problem is that node with ip address: ip11 often has 5-10 times more
>>> load
>>> > than any other node. Most of the operations are on counters. The
>>> primary
>>> > column family (which receives most writes) has a replication factor of
>>> 2 in
>>> > DataCenter DC1 and also in DataCenter DC2. The traffic is write heavy
>>> (reads
>>> > are less than 10% of total requests). We are using size-tiered
>>> compaction.
>>> > Both writes and reads happen with a consistency factor of LOCAL_QUORUM.
>>> >
>>> > More information:
>>> >
>>> > 1. cassandra.yaml - http://pastebin.com/u344fA6z
>>> > 2. Jmap heap when node under high loads - http://pastebin.com/ib3D0Pa
>>> > 3. Nodetool tpstats - http://pastebin.com/s0AS7bGd
>>> > 4. Cassandra-env.sh - http://pastebin.com/ubp4cGUx
>>> > 5. GC log lines -  http://pastebin.com/Y0TKphsm
>>> >
>>> > Am I doing anything wrong? Any pointers would be appreciated.
>>> >
>>> > Thanks in advance,
>>> > Ashish
>>>
>>
>>
>

Re: High loads only on one node in the cluster

Posted by Rakesh Rajan <ra...@gmail.com>.
@Tyler / @Rob,

As Ashish mentioned earlier, we have 9 nodes on AWS: 6 on the East Coast and
3 in Singapore. All 9 nodes use EC2Snitch. The current ring (across both DCs)
looks like this:

ip11 - East Coast - m1.xlarge / us-east-1b         - Size: 83 GB - Token: 0
ip21 - Singapore  - m1.xlarge / ap-southeast-1a - Size: 88 GB - Token: 1001
ip12 - East Coast - m1.xlarge / us-east-1b         - Size: 45 GB -
Token: 28356863910078205288614550619314017621
ip13 - East Coast - m1.xlarge / us-east-1c         - Size: 93 GB -
Token: 56713727820156410577229101238628035241
ip22 - Singapore  - m1.xlarge / ap-southeast-1b - Size: 73 GB -
Token: 56713727820156410577229101238628036241
ip14 - East Coast - m1.xlarge / us-east-1c         - Size: 20 GB -
Token: 85070591730234615865843651857942052863
ip15 - East Coast - m1.xlarge / us-east-1d         - Size: 89 GB -
Token: 113427455640312821154458202477256070484
ip23 - Singapore  - m1.xlarge / ap-southeast-1b - Size: 56 GB -
Token: 113427455640312821154458202477256071484
ip16 - East Coast - m1.xlarge / us-east-1d         - Size: 25 GB -
Token: 141784319550391026443072753096570088105

Regarding the alternating-racks solution, I have the following queries:

1) By alternating racks, do you mean alternating racks across all nodes in a
single DC, or across multiple DCs? AWS US East has 4 AZs and Singapore has 2.
So is the final layout something like this:
ip11 - East Coast - m1.xlarge / us-east-1b         - Token: 0
ip21 - Singapore  - m1.xlarge / ap-southeast-1a - Token: 1001
ip12 - East Coast - m1.xlarge / us-east-*1c*         -
Token: 28356863910078205288614550619314017621
ip13 - East Coast - m1.xlarge / us-east-*1d*         -
Token: 56713727820156410577229101238628035241
ip22 - Singapore  - m1.xlarge / ap-southeast-1b -
Token: 56713727820156410577229101238628036241
ip14 - East Coast - m1.xlarge / us-east-*1a*         -
Token: 85070591730234615865843651857942052863
ip15 - East Coast - m1.xlarge / us-east-*1b*         -
Token: 113427455640312821154458202477256070484
ip23 - Singapore  - m1.xlarge / ap-southeast-*1a* -
Token: 113427455640312821154458202477256071484
ip16 - East Coast - m1.xlarge / us-east-*1c*         -
Token: 141784319550391026443072753096570088105

Is this what you had suggested?

2) How does dynamic_snitch_badness_threshold: 0.1 affect the CPU load? On the
node (ip11) that had high CPU (system load > 30), I checked the scores (via
the JMX bean org.apache.cassandra.db:type=DynamicEndpointSnitch) and saw the
following:
EastCoast:
    *ip11 = 1.6813321647677475*
    ip12 = 1.0003505696757231
    ip13 = 1.1324160525509974
    ip14 = 1.000350569675723
    ip15 = 1.0007011393514456
    ip16 = 1.0005258545135842
Singapore:
    ip21 = 1.095880806310253
    ip22 = 1.4100000000000001
    ip23 = 1.0953549517966696

So the ip11 node does indeed have a higher score, but I'm not sure why traffic
is still going to that replica as opposed to some other node.
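
For reference, the dynamic snitch's "badness" check can be approximated like this (a simplified sketch, not the actual DynamicEndpointSnitch source; the scores are copied from the JMX output above):

```python
# Rough sketch of the dynamic snitch badness check (simplified; see
# Cassandra's DynamicEndpointSnitch for the real logic). The threshold
# corresponds to dynamic_snitch_badness_threshold.

def order_replicas(replicas, scores, threshold=0.1):
    """Keep the subsnitch's (static) replica order unless the preferred
    replica's score is worse than another's by more than `threshold`,
    in which case fall back to pure score order (lower is better)."""
    first = scores[replicas[0]]
    for r in replicas[1:]:
        if (first - scores[r]) / first > threshold:
            return sorted(replicas, key=scores.__getitem__)
    return replicas

scores = {"ip11": 1.6813321647677475, "ip12": 1.0003505696757231,
          "ip13": 1.1324160525509974}
print(order_replicas(["ip11", "ip12", "ip13"], scores))
# ['ip12', 'ip13', 'ip11'] -- ip11's score is ~40% worse, well past 0.1
```

With these scores the check fires easily, so reads should already be steered away from ip11. Note, though, that the dynamic snitch only reorders read replicas: writes (including counter mutations) still go to every replica, so the snitch alone will not reduce ip11's write load.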

Thanks!



On Fri, Nov 1, 2013 at 3:13 PM, Ashish Tyagi <ty...@gmail.com> wrote:

> Hi Evan,
>
> The clients connect to all nodes. We tried shutting the thrift server on
> the affected node. Loads did not come down.
>
>
>
> On Fri, Nov 1, 2013 at 12:59 AM, Evan Weaver <ev...@fauna.org> wrote:
>
>> Are all your clients only connecting to your first node? I would
>> probably strace it and compare the trace to one from a lightly loaded
>> node.
>>
>> On Thu, Oct 31, 2013 at 7:12 PM, Ashish Tyagi <ty...@gmail.com>
>> wrote:
>> > We have a 9 node cluster. 6 nodes are in one data-center and 3 nodes in
>> the
>> > other. All machines are Amazon M1.XLarge configuration.
>> >
>> > Datacenter: DC1
>> > ==========
>> > Address         Rack        Status State   Load            Owns
>> > Token
>> >
>> > ip11  1b          Up     Normal  76.46 GB        16.67%              0
>> > ip12  1b          Up     Normal  44.66 GB        16.67%
>> > 28356863910078205288614550619314017621
>> > ip13  1c          Up     Normal  85.94 GB        16.67%
>> > 56713727820156410577229101238628035241
>> > ip14  1c          Up     Normal  17.55 GB        16.67%
>> > 85070591730234615865843651857942052863
>> > ip15  1d          Up     Normal  80.74 GB        16.67%
>> > 113427455640312821154458202477256070484
>> > ip16  1d          Up     Normal  20.88 GB        16.67%
>> > 141784319550391026443072753096570088105
>> >
>> > Datacenter: DC2
>> > ==========
>> > Address         Rack        Status State   Load            Owns
>> > Token
>> >
>> > ip21  1a          Up     Normal  78.32 GB        0.00%
>> 1001
>> > ip22  1b          Up     Normal  71.23 GB        0.00%
>> > 56713727820156410577229101238628036241
>> > ip23  1b          Up     Normal  53.49 GB        0.00%
>> > 113427455640312821154458202477256071484
>> >
>> > Problem is that node with ip address: ip11 often has 5-10 times more
>> load
>> > than any other node. Most of the operations are on counters. The primary
>> > column family (which receives most writes) has a replication factor of
>> 2 in
>> > DataCenter DC1 and also in DataCenter DC2. The traffic is write heavy
>> (reads
>> > are less than 10% of total requests). We are using size-tiered
>> compaction.
>> > Both writes and reads happen with a consistency factor of LOCAL_QUORUM.
>> >
>> > More information:
>> >
>> > 1. cassandra.yaml - http://pastebin.com/u344fA6z
>> > 2. Jmap heap when node under high loads - http://pastebin.com/ib3D0Pa
>> > 3. Nodetool tpstats - http://pastebin.com/s0AS7bGd
>> > 4. Cassandra-env.sh - http://pastebin.com/ubp4cGUx
>> > 5. GC log lines -  http://pastebin.com/Y0TKphsm
>> >
>> > Am I doing anything wrong? Any pointers would be appreciated.
>> >
>> > Thanks in advance,
>> > Ashish
>>
>
>