Posted to commits@cassandra.apache.org by "Nitin (JIRA)" <ji...@apache.org> on 2013/11/13 23:11:20 UTC

[jira] [Created] (CASSANDRA-6346) Cassandra 2.0 server node runs out of memory during writes/replications

Nitin created CASSANDRA-6346:
--------------------------------

             Summary: Cassandra 2.0 server node runs out of memory during writes/replications
                 Key: CASSANDRA-6346
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6346
             Project: Cassandra
          Issue Type: Bug
            Reporter: Nitin


Currently we are running an 18-node Cassandra cluster with NetworkTopologyStrategy (d1 = 3 and d2 = 3).

Our servers seem to crash with OOM exceptions. Our heap size is 8 GB. When a node crashed, I got hold of the hprof file and ran it through the Eclipse MAT analyzer.

After analyzing the hprof (please see the attachment for top offenders), I find that there is a LinkedBlockingQueue (from the mutation stage) that seems to hold about 7.3 GB of the total 8 GB of RAM.

After deep diving into the Cassandra 2.0 code, I see that every update/write/replication goes through stages, including the mutation stage, and the number of threads that drain this queue (I am assuming the memtable-to-SSTable write) is controlled by concurrent_writes. Ours is set to 32.
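For reference, the write-path concurrency described above is set in cassandra.yaml; a minimal excerpt matching our setting (other values omitted, and the 8 GB heap is configured separately in cassandra-env.sh):

```yaml
# cassandra.yaml (excerpt)
# Number of MutationStage worker threads that drain the mutation queue.
concurrent_writes: 32
```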

However, we observe node crashes even when there are zero client writes to the node but replication requests are floating around the cluster.

Any ideas what knobs throttle the size of these queues and the maximum number of write and replication requests a node can accept? What are the recommended settings to operate a Cassandra node in a mode where it rejects requests beyond a certain queue threshold?
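To illustrate what such a threshold would look like: the stage's LinkedBlockingQueue is effectively unbounded, which is why it can grow to 7.3 GB. A sketch (not Cassandra code, just plain java.util.concurrent) of a bounded stage that rejects work once the queue is full:

```java
import java.util.concurrent.*;

public class BoundedStageDemo {
    public static void main(String[] args) throws Exception {
        // Hypothetical bounded stage: 1 worker thread, queue capacity 2.
        // AbortPolicy throws RejectedExecutionException when the queue is
        // full, instead of letting the queue grow until the heap is exhausted.
        ThreadPoolExecutor stage = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>(2),
                new ThreadPoolExecutor.AbortPolicy());

        CountDownLatch block = new CountDownLatch(1);
        // First task occupies the single worker until we release it.
        stage.execute(() -> { try { block.await(); } catch (InterruptedException e) {} });
        stage.execute(() -> {}); // queued (1 of 2)
        stage.execute(() -> {}); // queued (2 of 2, queue now full)

        int rejected = 0;
        try {
            stage.execute(() -> {}); // exceeds capacity -> rejected
        } catch (RejectedExecutionException e) {
            rejected++;
        }
        System.out.println("rejected=" + rejected);

        block.countDown();
        stage.shutdown();
        stage.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

Whether Cassandra exposes an equivalent cap for the mutation stage in 2.0 is exactly my question; the sketch only shows the back-pressure behavior we would like.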

--
This message was sent by Atlassian JIRA
(v6.1#6144)