
Posted to user@cassandra.apache.org by "Devaki, Srinivas" <me...@eightnoteight.space> on 2018/12/09 10:09:32 UTC

Help in understanding strange cassandra CPU usage

Hi Guys,

Since the start of our org, Cassandra has been a SPOF. Due to recent
priorities we changed our code base so that Cassandra won't be a SPOF
anymore, and as part of that we built a kill switch into the code (PHP).
When activated, this kill switch ensures that no connection is made to
Cassandra for any query.
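A kill switch of this kind can be sketched as a flag that is checked before any connection is opened. This is a minimal sketch that assumes a simple process-wide flag. The original is PHP and its code isn't shown, so `KillSwitch`, `FakeSession` and `query_or_fallback` are hypothetical names, and no real Cassandra driver is used:

```python
class KillSwitch:
    """Hypothetical process-wide flag; while tripped, Cassandra is never contacted."""

    def __init__(self):
        self.tripped = False


class FakeSession:
    """Hypothetical stand-in for a real driver session; counts how many get opened."""

    opened = 0

    def __init__(self):
        FakeSession.opened += 1

    def execute(self, cql):
        return [cql]


def query_or_fallback(switch, cql, fallback):
    # Check the switch *before* connecting, so a tripped switch means
    # no connection is opened at all, not merely that no query is sent.
    if switch.tripped:
        return fallback
    session = FakeSession()
    return session.execute(cql)


if __name__ == "__main__":
    switch = KillSwitch()
    print(query_or_fallback(switch, "SELECT now() FROM system.local", fallback=[]))
    switch.tripped = True
    print(query_or_fallback(switch, "SELECT now() FROM system.local", fallback=[]))
    print(FakeSession.opened)  # 1: only the call made before the switch tripped connected
```

Connection reuse is left out to keep the example short. The point is only where the check sits relative to the connect step.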

While testing the kill switch we noticed a strange behaviour: CPU and
load average drop from 400% (CPU) and 14-20 (load, on a 16-core machine)
to 20% (CPU) and 2-3 (load).

Even if the kill switch is activated for only 30 secs, CPU drops from
400% to 20% and stays at 20% for at least 24 hrs before it starts to
climb back to 400%, where it then stays. This happens on all the nodes,
not just a few.

Details:
Cassandra Version: 2.2.4
Number of Nodes: 8
AWS Instance Type: c4.4xlarge
Number of Open Files: 30k to 50k (depending on the number of auto-scaled
PHP nodes)

Would be grateful for any explanation of this strange behaviour.

Thanks & Regards
Srinivas Devaki
SRE/SDE at Zomato

Re: Help in understanding strange cassandra CPU usage

Posted by Jeff Jirsa <jj...@gmail.com>.

Sounds like over time you’re ending up doing something odd - maybe you’re leaking CQL connections or something, and it gets more and more intensive to manage them until you invoke the breaker, then it drops.

Will probably take someone going through a heap dump to really understand what’s going on, which is unfortunate because it’s a fair amount of effort. 
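The connection-leak hypothesis can be illustrated with a toy model. It compares how many connections are still open after many requests under three client patterns: a new connection per request that is never closed, a new connection per request that is always closed, and a single connection that is reused. This is plain Python with no real driver. `FakeConnection`, the handler factories and `open_after` are hypothetical names, not the DataStax PHP driver's API:

```python
class FakeConnection:
    """Hypothetical stand-in for one CQL connection; tracks how many are open."""

    open_count = 0

    def __init__(self):
        FakeConnection.open_count += 1
        self.closed = False

    def execute(self, cql):
        return cql

    def close(self):
        if not self.closed:
            self.closed = True
            FakeConnection.open_count -= 1


def make_leaky_handler():
    # Opens a new connection per request and never closes it.
    def handle(cql):
        return FakeConnection().execute(cql)
    return handle


def make_closing_handler():
    # Opens a connection per request but always closes it afterwards.
    def handle(cql):
        conn = FakeConnection()
        try:
            return conn.execute(cql)
        finally:
            conn.close()
    return handle


def make_shared_handler():
    # Reuses a single connection for every request this handler serves.
    conn = None

    def handle(cql):
        nonlocal conn
        if conn is None:
            conn = FakeConnection()
        return conn.execute(cql)
    return handle


def open_after(handler, requests):
    """Return how many connections are still open after serving `requests` requests."""
    FakeConnection.open_count = 0
    for _ in range(requests):
        handler("SELECT release_version FROM system.local")
    return FakeConnection.open_count


if __name__ == "__main__":
    print(open_after(make_leaky_handler(), 1000))    # 1000
    print(open_after(make_closing_handler(), 1000))  # 0
    print(open_after(make_shared_handler(), 1000))   # 1
```

In this model only the first pattern grows without bound. If something similar were happening, it would fit load that creeps up over time and falls when the kill switch stops every client from connecting. Whether that is the actual cause here would still need the heap dump or similar evidence from the nodes.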

-- 
Jeff Jirsa



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org

Re: Help in understanding strange cassandra CPU usage

Posted by Michael Shuler <mi...@pbandjelly.org>.

On 12/9/18 4:09 AM, Devaki, Srinivas wrote:
> 
> Cassandra Version: 2.2.4

There have been over 300 bug fixes and improvements in the nearly 3
years between 2.2.4 and the latest 2.2.13 release. Scanning the changes,
I noticed a GC logging addition somewhere in there, which could help
with troubleshooting / tuning. I think testing the current 2.2 release
would also be prudent, to rule out an issue that has already been
found & fixed.

https://github.com/apache/cassandra/blob/cassandra-2.2.13/CHANGES.txt#L1-L352
https://github.com/apache/cassandra/blob/cassandra-2.2.13/NEWS.txt#L1-L140

-- 
Kind regards,
Michael
