You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@cassandra.apache.org by "Thibaut (JIRA)" <ji...@apache.org> on 2011/04/22 15:06:05 UTC

[jira] [Created] (CASSANDRA-2543) Node not responding, bringing down cluster, marked as up

Node not responding, bringing down cluster, marked as up
--------------------------------------------------------

                 Key: CASSANDRA-2543
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2543
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.7.4
            Reporter: Thibaut
             Fix For: 0.7.6
         Attachments: jstack

I have one node which constantly hangs and brings done the entire cluster (not giving any answers).
If I restart the node, the node will hang after a certain number of time. I have no indication

It's marked as up when executing the nodetool ring command.
Executing the ring command on the node itself (without any traffic on the cluster) takes at least 2 minutes to execute. The node takes about 50%-100% of cpu over all cpus.


Netstats doesn't list anything interesting:

/software/cassandra/bin/nodetool -h localhost netstats
Mode: Normal
Not sending any streams.
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0          51064
Responses                       n/a         0         530479


I attached the jstack of the node. There are no indications that the node has faulty hardware. 



/usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms5254M -Xmx5254M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-foreground=yes -cp /software/cassandra/bin/../conf:/software/cassandra/bin/../build/classes:/software/cassandra/bin/../lib/antlr-3.1.3.jar:/software/cassandra/bin/../lib/apache-cassandra-0.7.4.jar:/software/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/software/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/software/cassandra/bin/../lib/commons-cli-1.1.jar:/software/cassandra/bin/../lib/commons-codec-1.2.jar:/software/cassandra/bin/../lib/commons-collections-3.2.1.jar:/software/cassandra/bin/../lib/commons-lang-2.4.jar:/software/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:/software/cassandra/bin/../lib/guava-r05.jar:/software/cassandra/bin/../lib/high-scale-lib.jar:/software/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/software/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/software/cassandra/bin/../lib/jetty-6.1.21.jar:/software/cassandra/bin/../lib/jetty-util-6.1.21.jar:/software/cassandra/bin/../lib/jline-0.9.94.jar:/software/cassandra/bin/../lib/json-simple-1.1.jar:/software/cassandra/bin/../lib/jug-2.0.0.jar:/software/cassandra/bin/../lib/libthrift-0.5.jar:/software/cassandra/bin/../lib/log4j-1.2.16.jar:/software/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/software/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/software/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/cassandra/bin/../lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon






--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2543) Node not responding, bringing down cluster, marked as up

Posted by "Thibaut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023225#comment-13023225 ] 

Thibaut commented on CASSANDRA-2543:
------------------------------------

It flaps between one core and multiple cores. There was one thread taking 100% cpu, and the other ones flapping as you said.

How can the gc run constantly if there are still 1.5 Gig free according to nodetool info and I did shut down our application, so there shouldn't be any requests (or just a very few read requests) at all? But the localhost info heap info changed constantly (also grew close to the limit).

There also many small tables which haven't yet been compacted.


dynamic snitch is indeed disabled (because of data locality), but I will enable it and then bring the node to life again.

> Node not responding, bringing down cluster, marked as up
> --------------------------------------------------------
>
>                 Key: CASSANDRA-2543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2543
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>             Fix For: 0.7.6
>
>         Attachments: jstack
>
>
> I have one node which constantly hangs and brings done the entire cluster (not giving any answers).
> If I restart the node, the node will hang after a certain number of time. I have no indication
> It's marked as up when executing the nodetool ring command.
> Executing the ring command on the node itself (without any traffic on the cluster) takes at least 2 minutes to execute. The node takes about 50%-100% of cpu over all cpus.
> Netstats doesn't list anything interesting:
> /software/cassandra/bin/nodetool -h localhost netstats
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0          51064
> Responses                       n/a         0         530479
> I attached the jstack of the node. There are no indications that the node has faulty hardware. 
> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms5254M -Xmx5254M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-foreground=yes -cp /software/cassandra/bin/../conf:/software/cassandra/bin/../build/classes:/software/cassandra/bin/../lib/antlr-3.1.3.jar:/software/cassandra/bin/../lib/apache-cassandra-0.7.4.jar:/software/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/software/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/software/cassandra/bin/../lib/commons-cli-1.1.jar:/software/cassandra/bin/../lib/commons-codec-1.2.jar:/software/cassandra/bin/../lib/commons-collections-3.2.1.jar:/software/cassandra/bin/../lib/commons-lang-2.4.jar:/software/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:/software/cassandra/bin/../lib/guava-r05.jar:/software/cassandra/bin/../lib/high-scale-lib.jar:/software/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/software/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/software/cassandra/bin/../lib/jetty-6.1.21.jar:/software/cassandra/bin/../lib/jetty-util-6.1.21.jar:/software/cassandra/bin/../lib/jline-0.9.94.jar:/software/cassandra/bin/../lib/json-simple-1.1.jar:/software/cassandra/bin/../lib/jug-2.0.0.jar:/software/cassandra/bin/../lib/libthrift-0.5.jar:/software/cassandra/bin/../lib/log4j-1.2.16.jar:/software/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/software/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/software/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/cassandra/bin/../lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2543) Node not responding, bringing down cluster, marked as up

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023236#comment-13023236 ] 

Jonathan Ellis commented on CASSANDRA-2543:
-------------------------------------------

bq. It still interest me why the node didn't recover even though there were nearly no requests at all from our application

the two big users of memory are usually memtables and cache and neither one gets freed when requests stop. (until memtable_flush_after_mins is reached...  and flush needs memory to work so if you are GC storming hard enough it may not make progress even then.)

> Node not responding, bringing down cluster, marked as up
> --------------------------------------------------------
>
>                 Key: CASSANDRA-2543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2543
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>             Fix For: 0.7.6
>
>         Attachments: jstack
>
>
> I have one node which constantly hangs and brings done the entire cluster (not giving any answers).
> If I restart the node, the node will hang after a certain number of time. I have no indication
> It's marked as up when executing the nodetool ring command.
> Executing the ring command on the node itself (without any traffic on the cluster) takes at least 2 minutes to execute. The node takes about 50%-100% of cpu over all cpus.
> Netstats doesn't list anything interesting:
> /software/cassandra/bin/nodetool -h localhost netstats
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0          51064
> Responses                       n/a         0         530479
> I attached the jstack of the node. There are no indications that the node has faulty hardware. 
> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms5254M -Xmx5254M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-foreground=yes -cp /software/cassandra/bin/../conf:/software/cassandra/bin/../build/classes:/software/cassandra/bin/../lib/antlr-3.1.3.jar:/software/cassandra/bin/../lib/apache-cassandra-0.7.4.jar:/software/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/software/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/software/cassandra/bin/../lib/commons-cli-1.1.jar:/software/cassandra/bin/../lib/commons-codec-1.2.jar:/software/cassandra/bin/../lib/commons-collections-3.2.1.jar:/software/cassandra/bin/../lib/commons-lang-2.4.jar:/software/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:/software/cassandra/bin/../lib/guava-r05.jar:/software/cassandra/bin/../lib/high-scale-lib.jar:/software/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/software/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/software/cassandra/bin/../lib/jetty-6.1.21.jar:/software/cassandra/bin/../lib/jetty-util-6.1.21.jar:/software/cassandra/bin/../lib/jline-0.9.94.jar:/software/cassandra/bin/../lib/json-simple-1.1.jar:/software/cassandra/bin/../lib/jug-2.0.0.jar:/software/cassandra/bin/../lib/libthrift-0.5.jar:/software/cassandra/bin/../lib/log4j-1.2.16.jar:/software/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/software/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/software/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/cassandra/bin/../lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-2543) Node not responding, bringing down cluster, marked as up

Posted by "Thibaut (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thibaut updated CASSANDRA-2543:
-------------------------------

    Attachment: jstack

> Node not responding, bringing down cluster, marked as up
> --------------------------------------------------------
>
>                 Key: CASSANDRA-2543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2543
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>             Fix For: 0.7.6
>
>         Attachments: jstack
>
>
> I have one node which constantly hangs and brings done the entire cluster (not giving any answers).
> If I restart the node, the node will hang after a certain number of time. I have no indication
> It's marked as up when executing the nodetool ring command.
> Executing the ring command on the node itself (without any traffic on the cluster) takes at least 2 minutes to execute. The node takes about 50%-100% of cpu over all cpus.
> Netstats doesn't list anything interesting:
> /software/cassandra/bin/nodetool -h localhost netstats
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0          51064
> Responses                       n/a         0         530479
> I attached the jstack of the node. There are no indications that the node has faulty hardware. 
> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms5254M -Xmx5254M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-foreground=yes -cp /software/cassandra/bin/../conf:/software/cassandra/bin/../build/classes:/software/cassandra/bin/../lib/antlr-3.1.3.jar:/software/cassandra/bin/../lib/apache-cassandra-0.7.4.jar:/software/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/software/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/software/cassandra/bin/../lib/commons-cli-1.1.jar:/software/cassandra/bin/../lib/commons-codec-1.2.jar:/software/cassandra/bin/../lib/commons-collections-3.2.1.jar:/software/cassandra/bin/../lib/commons-lang-2.4.jar:/software/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:/software/cassandra/bin/../lib/guava-r05.jar:/software/cassandra/bin/../lib/high-scale-lib.jar:/software/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/software/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/software/cassandra/bin/../lib/jetty-6.1.21.jar:/software/cassandra/bin/../lib/jetty-util-6.1.21.jar:/software/cassandra/bin/../lib/jline-0.9.94.jar:/software/cassandra/bin/../lib/json-simple-1.1.jar:/software/cassandra/bin/../lib/jug-2.0.0.jar:/software/cassandra/bin/../lib/libthrift-0.5.jar:/software/cassandra/bin/../lib/log4j-1.2.16.jar:/software/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/software/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/software/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/cassandra/bin/../lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2543) Node not responding, bringing down cluster, marked as up

Posted by "Thibaut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023209#comment-13023209 ] 

Thibaut commented on CASSANDRA-2543:
------------------------------------

/software/cassandra/bin/nodetool -h localhost info
199
Gossip active    : true
Load             : 161.12 GB
Generation No    : 1303472783
Uptime (seconds) : 5024
Heap Memory (MB) : 3656.35 / 5214.00

As mentioned, restarting the node will put the node into this state again after some time.




> Node not responding, bringing down cluster, marked as up
> --------------------------------------------------------
>
>                 Key: CASSANDRA-2543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2543
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>             Fix For: 0.7.6
>
>         Attachments: jstack
>
>
> I have one node which constantly hangs and brings done the entire cluster (not giving any answers).
> If I restart the node, the node will hang after a certain number of time. I have no indication
> It's marked as up when executing the nodetool ring command.
> Executing the ring command on the node itself (without any traffic on the cluster) takes at least 2 minutes to execute. The node takes about 50%-100% of cpu over all cpus.
> Netstats doesn't list anything interesting:
> /software/cassandra/bin/nodetool -h localhost netstats
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0          51064
> Responses                       n/a         0         530479
> I attached the jstack of the node. There are no indications that the node has faulty hardware. 
> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms5254M -Xmx5254M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-foreground=yes -cp /software/cassandra/bin/../conf:/software/cassandra/bin/../build/classes:/software/cassandra/bin/../lib/antlr-3.1.3.jar:/software/cassandra/bin/../lib/apache-cassandra-0.7.4.jar:/software/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/software/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/software/cassandra/bin/../lib/commons-cli-1.1.jar:/software/cassandra/bin/../lib/commons-codec-1.2.jar:/software/cassandra/bin/../lib/commons-collections-3.2.1.jar:/software/cassandra/bin/../lib/commons-lang-2.4.jar:/software/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:/software/cassandra/bin/../lib/guava-r05.jar:/software/cassandra/bin/../lib/high-scale-lib.jar:/software/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/software/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/software/cassandra/bin/../lib/jetty-6.1.21.jar:/software/cassandra/bin/../lib/jetty-util-6.1.21.jar:/software/cassandra/bin/../lib/jline-0.9.94.jar:/software/cassandra/bin/../lib/json-simple-1.1.jar:/software/cassandra/bin/../lib/jug-2.0.0.jar:/software/cassandra/bin/../lib/libthrift-0.5.jar:/software/cassandra/bin/../lib/log4j-1.2.16.jar:/software/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/software/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/software/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/cassandra/bin/../lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2543) Node not responding, bringing down cluster, marked as up

Posted by "Thibaut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023298#comment-13023298 ] 

Thibaut commented on CASSANDRA-2543:
------------------------------------

I stopped all java programs and restarted the cluster and increased the heap size to 5.5 gig. All my table have a memtable limit of 32MB and I only have about 20 tables with 2 column families in each table.

Executing a manual compactation or just waiting will return the following error:

ERROR [ReadStage:24] 2011-04-22 19:30:58,330 AbstractCassandraDaemon.java (line 112) Fatal exception in thread Thread[ReadStage:24,5,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:123)
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:108)
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.<init>(BufferedRandomAccessFile.java:93)
        at org.apache.cassandra.io.sstable.SSTableScanner.<init>(SSTableScanner.java:74)
        at org.apache.cassandra.io.sstable.SSTableReader.getScanner(SSTableReader.java:548)
        at org.apache.cassandra.db.RowIteratorFactory.getIterator(RowIteratorFactory.java:95)
        at org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:1425)
        at org.apache.cassandra.service.RangeSliceVerbHandler.doVerb(RangeSliceVerbHandler.java:49)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

One table has 778 intermediate tables and won't compact. (Each table has about the size of the memtable flush limit). Only the first one are biggers (e.g 5x 19 GB)

-rw-r--r-- 1 root root  18M 2011-04-22 17:07 table_userentries-f-42142-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:10 table_userentries-f-42143-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:12 table_userentries-f-42144-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:15 table_userentries-f-42145-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:17 table_userentries-f-42146-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:20 table_userentries-f-42147-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:23 table_userentries-f-42148-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:26 table_userentries-f-42149-Data.db
-rw-r--r-- 1 root root  13M 2011-04-22 17:33 table_userentries-f-42150-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:37 table_userentries-f-42152-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:38 table_userentries-f-42153-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:39 table_userentries-f-42154-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:40 table_userentries-f-42155-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:42 table_userentries-f-42156-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:43 table_userentries-f-42157-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:44 table_userentries-f-42158-Data.db
-rw-r--r-- 1 root root  17M 2011-04-22 17:45 table_userentries-f-42159-Data.db
-rw-r--r-- 1 root root 6.7M 2011-04-22 19:18 table_userentries-f-42160-Data.db
-rw-r--r-- 1 root root 528M 2011-04-22 19:19 table_userentries-f-42161-Data.db



It's certainly something related to compacting. 
There are log file entries related to cassandra compacting other tables (Compacting table_....). But this table never shows up (even not when I trigger a manual compaction on that table).





> Node not responding, bringing down cluster, marked as up
> --------------------------------------------------------
>
>                 Key: CASSANDRA-2543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2543
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>             Fix For: 0.7.6
>
>         Attachments: jstack
>
>
> I have one node which constantly hangs and brings done the entire cluster (not giving any answers).
> If I restart the node, the node will hang after a certain number of time. I have no indication
> It's marked as up when executing the nodetool ring command.
> Executing the ring command on the node itself (without any traffic on the cluster) takes at least 2 minutes to execute. The node takes about 50%-100% of cpu over all cpus.
> Netstats doesn't list anything interesting:
> /software/cassandra/bin/nodetool -h localhost netstats
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0          51064
> Responses                       n/a         0         530479
> I attached the jstack of the node. There are no indications that the node has faulty hardware. 
> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms5254M -Xmx5254M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-foreground=yes -cp /software/cassandra/bin/../conf:/software/cassandra/bin/../build/classes:/software/cassandra/bin/../lib/antlr-3.1.3.jar:/software/cassandra/bin/../lib/apache-cassandra-0.7.4.jar:/software/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/software/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/software/cassandra/bin/../lib/commons-cli-1.1.jar:/software/cassandra/bin/../lib/commons-codec-1.2.jar:/software/cassandra/bin/../lib/commons-collections-3.2.1.jar:/software/cassandra/bin/../lib/commons-lang-2.4.jar:/software/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:/software/cassandra/bin/../lib/guava-r05.jar:/software/cassandra/bin/../lib/high-scale-lib.jar:/software/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/software/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/software/cassandra/bin/../lib/jetty-6.1.21.jar:/software/cassandra/bin/../lib/jetty-util-6.1.21.jar:/software/cassandra/bin/../lib/jline-0.9.94.jar:/software/cassandra/bin/../lib/json-simple-1.1.jar:/software/cassandra/bin/../lib/jug-2.0.0.jar:/software/cassandra/bin/../lib/libthrift-0.5.jar:/software/cassandra/bin/../lib/log4j-1.2.16.jar:/software/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/software/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/software/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/cassandra/bin/../lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (CASSANDRA-2543) Node not responding, bringing down cluster, marked as up

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-2543.
---------------------------------------

       Resolution: Duplicate
    Fix Version/s:     (was: 0.7.6)

Sounds good. Thanks for helping troubleshoot; I'm glad 0.7.5 is working well.

> Node not responding, bringing down cluster, marked as up
> --------------------------------------------------------
>
>                 Key: CASSANDRA-2543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2543
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>         Attachments: jstack
>
>
> I have one node which constantly hangs and brings done the entire cluster (not giving any answers).
> If I restart the node, the node will hang after a certain number of time. I have no indication
> It's marked as up when executing the nodetool ring command.
> Executing the ring command on the node itself (without any traffic on the cluster) takes at least 2 minutes to execute. The node takes about 50%-100% of cpu over all cpus.
> Netstats doesn't list anything interesting:
> /software/cassandra/bin/nodetool -h localhost netstats
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0          51064
> Responses                       n/a         0         530479
> I attached the jstack of the node. There are no indications that the node has faulty hardware. 
> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms5254M -Xmx5254M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-foreground=yes -cp /software/cassandra/bin/../conf:/software/cassandra/bin/../build/classes:/software/cassandra/bin/../lib/antlr-3.1.3.jar:/software/cassandra/bin/../lib/apache-cassandra-0.7.4.jar:/software/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/software/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/software/cassandra/bin/../lib/commons-cli-1.1.jar:/software/cassandra/bin/../lib/commons-codec-1.2.jar:/software/cassandra/bin/../lib/commons-collections-3.2.1.jar:/software/cassandra/bin/../lib/commons-lang-2.4.jar:/software/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:/software/cassandra/bin/../lib/guava-r05.jar:/software/cassandra/bin/../lib/high-scale-lib.jar:/software/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/software/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/software/cassandra/bin/../lib/jetty-6.1.21.jar:/software/cassandra/bin/../lib/jetty-util-6.1.21.jar:/software/cassandra/bin/../lib/jline-0.9.94.jar:/software/cassandra/bin/../lib/json-simple-1.1.jar:/software/cassandra/bin/../lib/jug-2.0.0.jar:/software/cassandra/bin/../lib/libthrift-0.5.jar:/software/cassandra/bin/../lib/log4j-1.2.16.jar:/software/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/software/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/software/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/cassandra/bin/../lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2543) Node not responding, bringing down cluster, marked as up

Posted by "Thibaut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025774#comment-13025774 ] 

Thibaut commented on CASSANDRA-2543:
------------------------------------

I simply removed the table, and everything works fine again, even at smaller memory footprints.

I did backup the files, so I will try to create test case with only one table. But it will take some time.

I did tune the memtable_throughput value and did get the OOM exception even with thrift & gossip disabled. 

It's similar to http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/OOM-on-heavy-write-load-td6296984.html.  I also wrote lot's of small entries (about 30-50KB per Entry).




> Node not responding, bringing down cluster, marked as up
> --------------------------------------------------------
>
>                 Key: CASSANDRA-2543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2543
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>             Fix For: 0.7.6
>
>         Attachments: jstack
>
>
> I have one node which constantly hangs and brings done the entire cluster (not giving any answers).
> If I restart the node, the node will hang after a certain number of time. I have no indication
> It's marked as up when executing the nodetool ring command.
> Executing the ring command on the node itself (without any traffic on the cluster) takes at least 2 minutes to execute. The node takes about 50%-100% of cpu over all cpus.
> Netstats doesn't list anything interesting:
> /software/cassandra/bin/nodetool -h localhost netstats
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0          51064
> Responses                       n/a         0         530479
> I attached the jstack of the node. There are no indications that the node has faulty hardware. 
> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms5254M -Xmx5254M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-foreground=yes -cp /software/cassandra/bin/../conf:/software/cassandra/bin/../build/classes:/software/cassandra/bin/../lib/antlr-3.1.3.jar:/software/cassandra/bin/../lib/apache-cassandra-0.7.4.jar:/software/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/software/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/software/cassandra/bin/../lib/commons-cli-1.1.jar:/software/cassandra/bin/../lib/commons-codec-1.2.jar:/software/cassandra/bin/../lib/commons-collections-3.2.1.jar:/software/cassandra/bin/../lib/commons-lang-2.4.jar:/software/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:/software/cassandra/bin/../lib/guava-r05.jar:/software/cassandra/bin/../lib/high-scale-lib.jar:/software/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/software/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/software/cassandra/bin/../lib/jetty-6.1.21.jar:/software/cassandra/bin/../lib/jetty-util-6.1.21.jar:/software/cassandra/bin/../lib/jline-0.9.94.jar:/software/cassandra/bin/../lib/json-simple-1.1.jar:/software/cassandra/bin/../lib/jug-2.0.0.jar:/software/cassandra/bin/../lib/libthrift-0.5.jar:/software/cassandra/bin/../lib/log4j-1.2.16.jar:/software/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/software/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/software/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/cassandra/bin/../lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2543) Node not responding, bringing down cluster, marked as up

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025828#comment-13025828 ] 

Jonathan Ellis commented on CASSANDRA-2543:
-------------------------------------------

bq. I did tune the memtable_throughput value 

Not to belabor the point, but "remember that the memtable throughput value is the serialized size" means that if you're relying on tuning throughput alone you'll have to drop it to ~5MB with as many CFs as you have.

(This is improved in 0.8; see memtable_total_space_in_mb option from CASSANDRA-2006.)

> Node not responding, bringing down cluster, marked as up
> --------------------------------------------------------
>
>                 Key: CASSANDRA-2543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2543
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>             Fix For: 0.7.6
>
>         Attachments: jstack
>
>
> I have one node which constantly hangs and brings done the entire cluster (not giving any answers).
> If I restart the node, the node will hang after a certain number of time. I have no indication
> It's marked as up when executing the nodetool ring command.
> Executing the ring command on the node itself (without any traffic on the cluster) takes at least 2 minutes to execute. The node takes about 50%-100% of cpu over all cpus.
> Netstats doesn't list anything interesting:
> /software/cassandra/bin/nodetool -h localhost netstats
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0          51064
> Responses                       n/a         0         530479
> I attached the jstack of the node. There are no indications that the node has faulty hardware. 
> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms5254M -Xmx5254M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-foreground=yes -cp /software/cassandra/bin/../conf:/software/cassandra/bin/../build/classes:/software/cassandra/bin/../lib/antlr-3.1.3.jar:/software/cassandra/bin/../lib/apache-cassandra-0.7.4.jar:/software/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/software/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/software/cassandra/bin/../lib/commons-cli-1.1.jar:/software/cassandra/bin/../lib/commons-codec-1.2.jar:/software/cassandra/bin/../lib/commons-collections-3.2.1.jar:/software/cassandra/bin/../lib/commons-lang-2.4.jar:/software/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:/software/cassandra/bin/../lib/guava-r05.jar:/software/cassandra/bin/../lib/high-scale-lib.jar:/software/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/software/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/software/cassandra/bin/../lib/jetty-6.1.21.jar:/software/cassandra/bin/../lib/jetty-util-6.1.21.jar:/software/cassandra/bin/../lib/jline-0.9.94.jar:/software/cassandra/bin/../lib/json-simple-1.1.jar:/software/cassandra/bin/../lib/jug-2.0.0.jar:/software/cassandra/bin/../lib/libthrift-0.5.jar:/software/cassandra/bin/../lib/log4j-1.2.16.jar:/software/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/software/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/software/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/cassandra/bin/../lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2543) Node not responding, bringing down cluster, marked as up

Posted by "Thibaut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026398#comment-13026398 ] 

Thibaut commented on CASSANDRA-2543:
------------------------------------

You can close thisissue. I think it's a duplicate of CASSANDRA-2463 causing the Garbage collector to run every few seconds.
I updated to 0.7.5 and everything runs much smoother (I have also seen no timeoutexceptions in hector anymore).


> Node not responding, bringing down cluster, marked as up
> --------------------------------------------------------
>
>                 Key: CASSANDRA-2543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2543
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>             Fix For: 0.7.6
>
>         Attachments: jstack
>
>
> I have one node which constantly hangs and brings done the entire cluster (not giving any answers).
> If I restart the node, the node will hang after a certain number of time. I have no indication
> It's marked as up when executing the nodetool ring command.
> Executing the ring command on the node itself (without any traffic on the cluster) takes at least 2 minutes to execute. The node takes about 50%-100% of cpu over all cpus.
> Netstats doesn't list anything interesting:
> /software/cassandra/bin/nodetool -h localhost netstats
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0          51064
> Responses                       n/a         0         530479
> I attached the jstack of the node. There are no indications that the node has faulty hardware. 
> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms5254M -Xmx5254M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-foreground=yes -cp /software/cassandra/bin/../conf:/software/cassandra/bin/../build/classes:/software/cassandra/bin/../lib/antlr-3.1.3.jar:/software/cassandra/bin/../lib/apache-cassandra-0.7.4.jar:/software/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/software/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/software/cassandra/bin/../lib/commons-cli-1.1.jar:/software/cassandra/bin/../lib/commons-codec-1.2.jar:/software/cassandra/bin/../lib/commons-collections-3.2.1.jar:/software/cassandra/bin/../lib/commons-lang-2.4.jar:/software/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:/software/cassandra/bin/../lib/guava-r05.jar:/software/cassandra/bin/../lib/high-scale-lib.jar:/software/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/software/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/software/cassandra/bin/../lib/jetty-6.1.21.jar:/software/cassandra/bin/../lib/jetty-util-6.1.21.jar:/software/cassandra/bin/../lib/jline-0.9.94.jar:/software/cassandra/bin/../lib/json-simple-1.1.jar:/software/cassandra/bin/../lib/jug-2.0.0.jar:/software/cassandra/bin/../lib/libthrift-0.5.jar:/software/cassandra/bin/../lib/log4j-1.2.16.jar:/software/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/software/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/software/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/cassandra/bin/../lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2543) Node not responding, bringing down cluster, marked as up

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023213#comment-13023213 ] 

Jonathan Ellis commented on CASSANDRA-2543:
-------------------------------------------

sounds like your node is OOMing

enable gc logging and check gc thread CPU usage (http://publib.boulder.ibm.com/infocenter/javasdk/tools/index.jsp?topic=/com.ibm.java.doc.igaa/_1vg0001475cb4a-1190e2e0f74-8000_1007.html)

re "bringing down cluster", are you sure dynamic snitch is enabled?

> Node not responding, bringing down cluster, marked as up
> --------------------------------------------------------
>
>                 Key: CASSANDRA-2543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2543
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>             Fix For: 0.7.6
>
>         Attachments: jstack
>
>
> I have one node which constantly hangs and brings done the entire cluster (not giving any answers).
> If I restart the node, the node will hang after a certain number of time. I have no indication
> It's marked as up when executing the nodetool ring command.
> Executing the ring command on the node itself (without any traffic on the cluster) takes at least 2 minutes to execute. The node takes about 50%-100% of cpu over all cpus.
> Netstats doesn't list anything interesting:
> /software/cassandra/bin/nodetool -h localhost netstats
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0          51064
> Responses                       n/a         0         530479
> I attached the jstack of the node. There are no indications that the node has faulty hardware. 
> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms5254M -Xmx5254M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-foreground=yes -cp /software/cassandra/bin/../conf:/software/cassandra/bin/../build/classes:/software/cassandra/bin/../lib/antlr-3.1.3.jar:/software/cassandra/bin/../lib/apache-cassandra-0.7.4.jar:/software/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/software/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/software/cassandra/bin/../lib/commons-cli-1.1.jar:/software/cassandra/bin/../lib/commons-codec-1.2.jar:/software/cassandra/bin/../lib/commons-collections-3.2.1.jar:/software/cassandra/bin/../lib/commons-lang-2.4.jar:/software/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:/software/cassandra/bin/../lib/guava-r05.jar:/software/cassandra/bin/../lib/high-scale-lib.jar:/software/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/software/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/software/cassandra/bin/../lib/jetty-6.1.21.jar:/software/cassandra/bin/../lib/jetty-util-6.1.21.jar:/software/cassandra/bin/../lib/jline-0.9.94.jar:/software/cassandra/bin/../lib/json-simple-1.1.jar:/software/cassandra/bin/../lib/jug-2.0.0.jar:/software/cassandra/bin/../lib/libthrift-0.5.jar:/software/cassandra/bin/../lib/log4j-1.2.16.jar:/software/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/software/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/software/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/cassandra/bin/../lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2543) Node not responding, bringing down cluster, marked as up

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023226#comment-13023226 ] 

Jonathan Ellis commented on CASSANDRA-2543:
-------------------------------------------

bq. How can the gc run constantly if there are still 1.5 Gig free 

could be fragmentation.  could be it's not gc.  i told you how to find out for sure earlier. :)

bq. dynamic snitch is indeed disabled (because of data locality)

you can have best of both worlds with dynamic snitch + badness threshold.

> Node not responding, bringing down cluster, marked as up
> --------------------------------------------------------
>
>                 Key: CASSANDRA-2543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2543
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>             Fix For: 0.7.6
>
>         Attachments: jstack
>
>
> I have one node which constantly hangs and brings done the entire cluster (not giving any answers).
> If I restart the node, the node will hang after a certain number of time. I have no indication
> It's marked as up when executing the nodetool ring command.
> Executing the ring command on the node itself (without any traffic on the cluster) takes at least 2 minutes to execute. The node takes about 50%-100% of cpu over all cpus.
> Netstats doesn't list anything interesting:
> /software/cassandra/bin/nodetool -h localhost netstats
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0          51064
> Responses                       n/a         0         530479
> I attached the jstack of the node. There are no indications that the node has faulty hardware. 
> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms5254M -Xmx5254M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-foreground=yes -cp /software/cassandra/bin/../conf:/software/cassandra/bin/../build/classes:/software/cassandra/bin/../lib/antlr-3.1.3.jar:/software/cassandra/bin/../lib/apache-cassandra-0.7.4.jar:/software/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/software/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/software/cassandra/bin/../lib/commons-cli-1.1.jar:/software/cassandra/bin/../lib/commons-codec-1.2.jar:/software/cassandra/bin/../lib/commons-collections-3.2.1.jar:/software/cassandra/bin/../lib/commons-lang-2.4.jar:/software/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:/software/cassandra/bin/../lib/guava-r05.jar:/software/cassandra/bin/../lib/high-scale-lib.jar:/software/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/software/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/software/cassandra/bin/../lib/jetty-6.1.21.jar:/software/cassandra/bin/../lib/jetty-util-6.1.21.jar:/software/cassandra/bin/../lib/jline-0.9.94.jar:/software/cassandra/bin/../lib/json-simple-1.1.jar:/software/cassandra/bin/../lib/jug-2.0.0.jar:/software/cassandra/bin/../lib/libthrift-0.5.jar:/software/cassandra/bin/../lib/log4j-1.2.16.jar:/software/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/software/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/software/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/cassandra/bin/../lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2543) Node not responding, bringing down cluster, marked as up

Posted by "Peter Schuller (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023220#comment-13023220 ] 

Peter Schuller commented on CASSANDRA-2543:
-------------------------------------------

If you're running out of memory with CMS there should be a constant fallback to full GC, which is single-threaded (with CMS). That's inconsistent with 50-100% cpu across all cores (typically you check top and it's immediately obvious that it's at almost exactly 100% and that tells you it's full gc).

Are you definitely spinning across multiple cores constantly, or does it flap between one core and multiple cores?

> Node not responding, bringing down cluster, marked as up
> --------------------------------------------------------
>
>                 Key: CASSANDRA-2543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2543
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>             Fix For: 0.7.6
>
>         Attachments: jstack
>
>
> I have one node which constantly hangs and brings done the entire cluster (not giving any answers).
> If I restart the node, the node will hang after a certain number of time. I have no indication
> It's marked as up when executing the nodetool ring command.
> Executing the ring command on the node itself (without any traffic on the cluster) takes at least 2 minutes to execute. The node takes about 50%-100% of cpu over all cpus.
> Netstats doesn't list anything interesting:
> /software/cassandra/bin/nodetool -h localhost netstats
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0          51064
> Responses                       n/a         0         530479
> I attached the jstack of the node. There are no indications that the node has faulty hardware. 
> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms5254M -Xmx5254M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-foreground=yes -cp /software/cassandra/bin/../conf:/software/cassandra/bin/../build/classes:/software/cassandra/bin/../lib/antlr-3.1.3.jar:/software/cassandra/bin/../lib/apache-cassandra-0.7.4.jar:/software/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/software/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/software/cassandra/bin/../lib/commons-cli-1.1.jar:/software/cassandra/bin/../lib/commons-codec-1.2.jar:/software/cassandra/bin/../lib/commons-collections-3.2.1.jar:/software/cassandra/bin/../lib/commons-lang-2.4.jar:/software/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:/software/cassandra/bin/../lib/guava-r05.jar:/software/cassandra/bin/../lib/high-scale-lib.jar:/software/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/software/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/software/cassandra/bin/../lib/jetty-6.1.21.jar:/software/cassandra/bin/../lib/jetty-util-6.1.21.jar:/software/cassandra/bin/../lib/jline-0.9.94.jar:/software/cassandra/bin/../lib/json-simple-1.1.jar:/software/cassandra/bin/../lib/jug-2.0.0.jar:/software/cassandra/bin/../lib/libthrift-0.5.jar:/software/cassandra/bin/../lib/log4j-1.2.16.jar:/software/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/software/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/software/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/cassandra/bin/../lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2543) Node not responding, bringing down cluster, marked as up

Posted by "Thibaut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023232#comment-13023232 ] 

Thibaut commented on CASSANDRA-2543:
------------------------------------

Ok thanks.

I will try above to make sure it's the GC thread taking up 100% cpu (which I suspect).

The GC indeed ran indeed very very often: (grepping the old log file)

 INFO [ScheduledTasks:1] 2011-04-22 15:07:24,742 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 3133 ms, 1537221688 reclaimed leaving 3830570400 used; max is 5605687296
 INFO [ScheduledTasks:1] 2011-04-22 15:07:28,978 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 2901 ms, 2172093472 reclaimed leaving 3219903248 used; max is 5605687296
 INFO [ScheduledTasks:1] 2011-04-22 15:07:33,946 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 3138 ms, 1715631768 reclaimed leaving 3581355456 used; max is 5605687296
 INFO [ScheduledTasks:1] 2011-04-22 15:07:38,318 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 2860 ms, 2467501736 reclaimed leaving 2874209912 used; max is 5605687296
 INFO [ScheduledTasks:1] 2011-04-22 15:07:43,466 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 3059 ms, 1491300856 reclaimed leaving 3875493760 used; max is 5605687296
 INFO [ScheduledTasks:1] 2011-04-22 15:07:47,529 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 2780 ms, 2521501408 reclaimed leaving 2824239464 used; max is 5605687296
 INFO [ScheduledTasks:1] 2011-04-22 15:07:52,983 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 3330 ms, 1287324312 reclaimed leaving 3964465048 used; max is 5605687296
 INFO [ScheduledTasks:1] 2011-04-22 15:07:57,147 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 3071 ms, 1950115232 reclaimed leaving 3337420088 used; max is 5605687296
 INFO [ScheduledTasks:1] 2011-04-22 15:08:01,785 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 3069 ms, 1748108592 reclaimed leaving 3444685288 used; max is 5605687296
 INFO [ScheduledTasks:1] 2011-04-22 15:08:06,397 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 3011 ms, 1779092920 reclaimed leaving 3637964728 used; max is 5605687296
 INFO [ScheduledTasks:1] 2011-04-22 15:08:10,715 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 2950 ms, 2042754088 reclaimed leaving 3223819496 used; max is 5605687296
 INFO [ScheduledTasks:1] 2011-04-22 15:08:15,528 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 3217 ms, 956021816 reclaimed leaving 4371838768 used; max is 5605687296
 INFO [ScheduledTasks:1] 2011-04-22 15:08:19,380 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 3007 ms, 2216336016 reclaimed leaving 3213552904 used; max is 5605687296
 INFO [ScheduledTasks:1] 2011-04-22 15:08:24,154 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 2973 ms, 1964499424 reclaimed leaving 3287557528 used; max is 5605687296
 INFO [ScheduledTasks:1] 2011-04-22 15:08:29,088 GCInspector.java (line 128) GC for ConcurrentMarkSweep: 2988 ms, 1964657952 reclaimed leaving 3418170312 used; max is 5605687296


It still interest me why the node didn't recover even though there were nearly no requests at all from our application.



> Node not responding, bringing down cluster, marked as up
> --------------------------------------------------------
>
>                 Key: CASSANDRA-2543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2543
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>             Fix For: 0.7.6
>
>         Attachments: jstack
>
>
> I have one node which constantly hangs and brings done the entire cluster (not giving any answers).
> If I restart the node, the node will hang after a certain number of time. I have no indication
> It's marked as up when executing the nodetool ring command.
> Executing the ring command on the node itself (without any traffic on the cluster) takes at least 2 minutes to execute. The node takes about 50%-100% of cpu over all cpus.
> Netstats doesn't list anything interesting:
> /software/cassandra/bin/nodetool -h localhost netstats
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0          51064
> Responses                       n/a         0         530479
> I attached the jstack of the node. There are no indications that the node has faulty hardware. 
> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms5254M -Xmx5254M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-foreground=yes -cp /software/cassandra/bin/../conf:/software/cassandra/bin/../build/classes:/software/cassandra/bin/../lib/antlr-3.1.3.jar:/software/cassandra/bin/../lib/apache-cassandra-0.7.4.jar:/software/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/software/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/software/cassandra/bin/../lib/commons-cli-1.1.jar:/software/cassandra/bin/../lib/commons-codec-1.2.jar:/software/cassandra/bin/../lib/commons-collections-3.2.1.jar:/software/cassandra/bin/../lib/commons-lang-2.4.jar:/software/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:/software/cassandra/bin/../lib/guava-r05.jar:/software/cassandra/bin/../lib/high-scale-lib.jar:/software/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/software/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/software/cassandra/bin/../lib/jetty-6.1.21.jar:/software/cassandra/bin/../lib/jetty-util-6.1.21.jar:/software/cassandra/bin/../lib/jline-0.9.94.jar:/software/cassandra/bin/../lib/json-simple-1.1.jar:/software/cassandra/bin/../lib/jug-2.0.0.jar:/software/cassandra/bin/../lib/libthrift-0.5.jar:/software/cassandra/bin/../lib/log4j-1.2.16.jar:/software/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/software/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/software/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/cassandra/bin/../lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2543) Node not responding, bringing down cluster, marked as up

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023600#comment-13023600 ] 

Jonathan Ellis commented on CASSANDRA-2543:
-------------------------------------------

Sure sounds like a classic "my memtables and/or caches are too large" symptom to me. Note that the stack trace has nothing to do with compaction and is in fact OOMing trying to allocate a 256KB read buffer.

bq. All my table have a memtable limit of 32MB

Remember that the memtable throughput value is the *serialized* size, in-memory size is typically 8x to 12x that. So back of the envelope math is that you're in trouble if you haven't tuned the operations threshold down a lot.

> Node not responding, bringing down cluster, marked as up
> --------------------------------------------------------
>
>                 Key: CASSANDRA-2543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2543
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.4
>            Reporter: Thibaut
>             Fix For: 0.7.6
>
>         Attachments: jstack
>
>
> I have one node which constantly hangs and brings done the entire cluster (not giving any answers).
> If I restart the node, the node will hang after a certain number of time. I have no indication
> It's marked as up when executing the nodetool ring command.
> Executing the ring command on the node itself (without any traffic on the cluster) takes at least 2 minutes to execute. The node takes about 50%-100% of cpu over all cpus.
> Netstats doesn't list anything interesting:
> /software/cassandra/bin/nodetool -h localhost netstats
> Mode: Normal
> Not sending any streams.
> Not receiving any streams.
> Pool Name                    Active   Pending      Completed
> Commands                        n/a         0          51064
> Responses                       n/a         0         530479
> I attached the jstack of the node. There are no indications that the node has faulty hardware. 
> /usr/bin/java -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms5254M -Xmx5254M -Xmn400M -XX:+HeapDumpOnOutOfMemoryError -Xss128k -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=8080 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true -Dcassandra-foreground=yes -cp /software/cassandra/bin/../conf:/software/cassandra/bin/../build/classes:/software/cassandra/bin/../lib/antlr-3.1.3.jar:/software/cassandra/bin/../lib/apache-cassandra-0.7.4.jar:/software/cassandra/bin/../lib/avro-1.4.0-fixes.jar:/software/cassandra/bin/../lib/avro-1.4.0-sources-fixes.jar:/software/cassandra/bin/../lib/commons-cli-1.1.jar:/software/cassandra/bin/../lib/commons-codec-1.2.jar:/software/cassandra/bin/../lib/commons-collections-3.2.1.jar:/software/cassandra/bin/../lib/commons-lang-2.4.jar:/software/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:/software/cassandra/bin/../lib/guava-r05.jar:/software/cassandra/bin/../lib/high-scale-lib.jar:/software/cassandra/bin/../lib/jackson-core-asl-1.4.0.jar:/software/cassandra/bin/../lib/jackson-mapper-asl-1.4.0.jar:/software/cassandra/bin/../lib/jetty-6.1.21.jar:/software/cassandra/bin/../lib/jetty-util-6.1.21.jar:/software/cassandra/bin/../lib/jline-0.9.94.jar:/software/cassandra/bin/../lib/json-simple-1.1.jar:/software/cassandra/bin/../lib/jug-2.0.0.jar:/software/cassandra/bin/../lib/libthrift-0.5.jar:/software/cassandra/bin/../lib/log4j-1.2.16.jar:/software/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/software/cassandra/bin/../lib/slf4j-api-1.6.1.jar:/software/cassandra/bin/../lib/slf4j-log4j12-1.6.1.jar:/software/cassandra/bin/../lib/snakeyaml-1.6.jar org.apache.cassandra.thrift.CassandraDaemon

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira