Posted to user@cassandra.apache.org by "Harika Vangapelli -T (hvangape - AKRAYA INC at Cisco)" <hv...@cisco.com> on 2017/07/14 18:23:08 UTC

Cassandra Node keep going down

We are using Cassandra 3.x.

Recently, our production database has been going through some instability issues. One of our nodes keeps going down, anywhere from once every 2 days to a few times a day. The node goes down because the JVM runs out of memory. Based on my investigation, I suspect that this is related to writing and/or compacting large partitions in some of our large data tables. Here is what might have happened:
1. The node went OOM because, under some conditions, it was unable to deserialize or compact some large partitions within its memory constraints.
2. Once we restarted it, which was usually a few hours later, the other nodes in the cluster tried to perform hinted handoff to the restarted node to fill in the missing data. From then on, that node had to handle the handoff plus the normal data load, which made it even busier.
3. The node was not able to complete the handoff and went down again.
4. This cycle repeated again and again.

This is not the first time we have seen this issue. Last time, we fixed it by manually stopping some of the aggregation jobs for a whole night to allow the node to complete the handoff. We are not sure about the root cause yet, and we have no explanation for why this happens to only one node. I investigated the issue and found two related Cassandra JIRAs:
https://issues.apache.org/jira/browse/CASSANDRA-8269 and
https://issues.apache.org/jira/browse/CASSANDRA-8723

Both JIRAs mention that this might only be the case with Cassandra 2.x.
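
One way to check the large-partition suspicion is to count the large-partition warnings in the affected node's system.log, grouped by table. The sketch below is only an illustration: it assumes the warning has the shape "Writing large partition <keyspace>/<table>:<key> (<size>)", which may differ between Cassandra versions, and the function name is hypothetical.

```python
import re
from collections import Counter

# Assumed message shape (may vary by Cassandra version):
#   ... Writing large partition <keyspace>/<table>:<key> (<size>) ...
LARGE_PARTITION = re.compile(
    r"Writing large partition ([^/\s]+)/([^:\s]+):(.*) \(([^()]*)\)"
)


def large_partition_counts(lines):
    """Count large-partition warnings per 'keyspace.table'.

    `lines` is any iterable of log lines, for example an open file.
    Lines that do not match the assumed message shape are ignored.
    """
    counts = Counter()
    for line in lines:
        match = LARGE_PARTITION.search(line)
        if match:
            keyspace, table = match.group(1), match.group(2)
            counts[f"{keyspace}.{table}"] += 1
    return counts
```

To use it, open the node's system.log (often /var/log/cassandra/system.log on package installs, but that path is an assumption), pass the file object to large_partition_counts, and look at the tables with the highest counts. If one or two tables dominate on the failing node, that would support the large-partition explanation.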

Thanks,

Harika





Re: Cassandra Node keep going down

Posted by Jeff Jirsa <jj...@apache.org>.

On 2017-07-14 11:23 (-0700), "Harika Vangapelli -T (hvangape - AKRAYA INC at Cisco)"
	<hv...@cisco.com> wrote: 
> We are using Cassandra 3.x.
> 

Which 3.x version? 3.11.0? 3.0.14? 3.7? Exact version is important. 

> Recently, our production database has been going through some instability issues. One of our nodes keeps going down, anywhere from once every 2 days to a few times a day. The node goes down because the JVM runs out of memory. Based on my investigation, I suspect that this is related to writing and/or compacting large partitions in some of our large data tables. Here is what might have happened:
> 1. The node went OOM because, under some conditions, it was unable to deserialize or compact some large partitions within its memory constraints.
> 2. Once we restarted it, which was usually a few hours later, the other nodes in the cluster tried to perform hinted handoff to the restarted node to fill in the missing data. From then on, that node had to handle the handoff plus the normal data load, which made it even busier.
> 3. The node was not able to complete the handoff and went down again.
> 4. This cycle repeated again and again.
> 

Sounds like it's always the same node? You may want to try running 'nodetool scrub' on that node and watching logs for errors that may indicate a corrupt file on disk, which would cause the behavior you're seeing.
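
As a rough sketch of the scrub-and-watch-logs suggestion, the helpers below build a 'nodetool scrub' command line for one keyspace and table, and scan log lines for strings that commonly accompany a corrupt file or an out-of-memory crash. The keyspace and table names, the helper names, and the marker list are illustrative assumptions, not an exhaustive set of what Cassandra may log.

```python
def build_scrub_command(keyspace, tables=()):
    """Return the argument list for 'nodetool scrub <keyspace> [tables...]'.

    The result can be passed to subprocess.run() on the affected node.
    """
    return ["nodetool", "scrub", keyspace, *tables]


# Assumed markers: the exception class name Cassandra uses for corrupt
# SSTables, plus the JVM's out-of-memory error.
CORRUPTION_MARKERS = ("CorruptSSTableException", "OutOfMemoryError")


def find_marker_lines(lines, markers=CORRUPTION_MARKERS):
    """Return (line_number, marker, line) for each line containing a marker.

    `lines` is any iterable of log lines, for example an open file.
    Only the first matching marker is reported for each line.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        for marker in markers:
            if marker in line:
                hits.append((number, marker, line.rstrip("\n")))
                break
    return hits
```

For example, build_scrub_command("my_keyspace", ["my_table"]) gives the arguments for scrubbing one hypothetical table, and find_marker_lines can be run over system.log during and after the scrub. A CorruptSSTableException naming the same data file repeatedly would point to on-disk corruption on that node.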

