You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by Joel Samuelsson <sa...@gmail.com> on 2013/04/16 09:46:51 UTC

Reduce Cassandra GC

Hi,

We have a small production cluster with two nodes. The load on the nodes is
very small, around 20 reads / sec and about the same for writes. There are
around 2.5 million keys in the cluster and a RF of 2.

About 2.4 million of the rows are skinny (6 columns) and around 3kb in size
(each). Currently, scripts are running, accessing all of the keys in
timeorder to do some calculations.

While running the scripts, the nodes go down and then come back up 6-7
minutes later. This seems to be due to GC. I get lines like this in the log:
INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122)
GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is
1046937600

However, the heap is not full. The heap usage has a jagged pattern going
from 60% up to 70% during 5 minutes and then back down to 60% the next 5
minutes and so on. I get no "Heap is X full..." messages. Every once in a
while at one of these peaks, I get these stop-the-world GC for 6-7
minutes. Why does GC take up so much time even though the heap isn't full?

I am aware that my access patterns make key caching very unlikely to be
high. And indeed, my average key cache hit ratio during the run of the
scripts is around 0.5%. I tried disabling key caching on the accessed
column family (UPDATE COLUMN FAMILY cf WITH caching=none;) through the
cassandra-cli but I get the same behaviour. Is the turning key cache off
effective immediately?

Stop-the-world GC is fine if it happens for a few seconds but having them
for several minutes doesn't work. Any other suggestions to remove them?

Best regards,
Joel Samuelsson

Re: Reduce Cassandra GC

Posted by Fabrice Facorat <fa...@gmail.com>.

2013/6/19 Takenori Sato <ts...@cloudian.com>:
> GC options are not set. You should see the followings.
>
>  -XX:+PrintGCDateStamps -XX:+PrintPromotionFailure
> -Xloggc:/var/log/cassandra/gc-1371603607.log
>
>> Is it normal to have two processes like this?
>
> No. You are running two processes.

It's "normal" as this is jsvc behavior.
It start as root, then fork to start cassandra as cassandra user, you
en up with 2 process with the second one running as cassandra having
the one started as root for parent

http://commons.apache.org/proper/commons-daemon/jsvc.html

--
Close the World, Open the Net
http://www.linux-wizard.net

Re: Reduce Cassandra GC

Posted by Joel Samuelsson <sa...@gmail.com>.

Right, after getting the GC logging information I tested upgrading to 1.2.
Didn't help but I forgot to reenable the GC options.

> No. You are running two processes.
Ok, that's weird. I am using an unmodified version of a startup script in
/etc/init.d/cassandra from the Debian package. Here's some output:

joel@dev:~$ sudo /etc/init.d/cassandra stop
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M
-Xmn800M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
joel@dev:~$ ps -ef | grep cassandra
joel      5038  9763  0 08:58 pts/2    00:00:00 grep --color=auto cassandra
joel@dev:~$ sudo /etc/init.d/cassandra start
xss =  -ea -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar
-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M
-Xmn800M -XX:+HeapDumpOnOutOfMemoryError -Xss180k
joel@dev:~$ ps -ef | grep cassandra
root      5116     1  0 08:59 ?        00:00:00 /usr/bin/jsvc -user
cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
/var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
-cp
/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
-Dlog4j.configuration=log4j-server.properties
-Dlog4j.defaultInitOverride=true
-XX:HeapDumpPath=/var/lib/cassandra/java_1371632342.hprof
-XX:ErrorFile=/var/lib/cassandra/hs_err_1371632342.log -ea
-javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
-XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
-Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
org.apache.cassandra.service.CassandraDaemon
103       5117  5116 99 08:59 ?        640511-22:40:28 /usr/bin/jsvc -user
cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
/var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
-cp
/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
-Dlog4j.configuration=log4j-server.properties
-Dlog4j.defaultInitOverride=true
-XX:HeapDumpPath=/var/lib/cassandra/java_1371632342.hprof
-XX:ErrorFile=/var/lib/cassandra/hs_err_1371632342.log -ea
-javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
-XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
-Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
org.apache.cassandra.service.CassandraDaemon
joel      5502  9763  0 08:59 pts/2    00:00:00 grep --color=auto cassandra

Can the two processes have anything to do with my issues?


2013/6/19 Takenori Sato <ts...@cloudian.com>

> GC options are not set. You should see the followings.
>
>  -XX:+PrintGCDateStamps -XX:+PrintPromotionFailure
> -Xloggc:/var/log/cassandra/gc-1371603607.log
>
> > Is it normal to have two processes like this?
>
> No. You are running two processes.
>
>
> On Wed, Jun 19, 2013 at 4:16 PM, Joel Samuelsson <
> samuelsson.joel@gmail.com> wrote:
>
>> My Cassandra ps info:
>>
>> root     26791     1  0 07:14 ?        00:00:00 /usr/bin/jsvc -user
>> cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
>> /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
>> -cp
>> /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
>> -Dlog4j.configuration=log4j-server.properties
>> -Dlog4j.defaultInitOverride=true
>> -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof
>> -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea
>> -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
>> -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
>> -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
>> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
>> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
>> -Dcom.sun.management.jmxremote.ssl=false
>> -Dcom.sun.management.jmxremote.authenticate=false
>> org.apache.cassandra.service.CassandraDaemon
>> 103      26792 26791 99 07:14 ?        854015-22:02:22 /usr/bin/jsvc
>> -user cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
>> /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
>> -cp
>> /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
>> -Dlog4j.configuration=log4j-server.properties
>> -Dlog4j.defaultInitOverride=true
>> -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof
>> -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea
>> -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
>> -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
>> -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
>> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
>> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
>> -Dcom.sun.management.jmxremote.ssl=false
>> -Dcom.sun.management.jmxremote.authenticate=false
>> org.apache.cassandra.service.CassandraDaemon
>>
>> Is it normal to have two processes like this?
>>
>>
>

Re: Reduce Cassandra GC

Posted by Takenori Sato <ts...@cloudian.com>.

GC options are not set. You should see the followings.

 -XX:+PrintGCDateStamps -XX:+PrintPromotionFailure
-Xloggc:/var/log/cassandra/gc-1371603607.log

> Is it normal to have two processes like this?

No. You are running two processes.


On Wed, Jun 19, 2013 at 4:16 PM, Joel Samuelsson
<sa...@gmail.com>wrote:

> My Cassandra ps info:
>
> root     26791     1  0 07:14 ?        00:00:00 /usr/bin/jsvc -user
> cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
> /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
> -cp
> /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
> -Dlog4j.configuration=log4j-server.properties
> -Dlog4j.defaultInitOverride=true
> -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof
> -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea
> -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
> -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> org.apache.cassandra.service.CassandraDaemon
> 103      26792 26791 99 07:14 ?        854015-22:02:22 /usr/bin/jsvc -user
> cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
> /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
> -cp
> /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
> -Dlog4j.configuration=log4j-server.properties
> -Dlog4j.defaultInitOverride=true
> -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof
> -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea
> -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
> -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> org.apache.cassandra.service.CassandraDaemon
>
> Is it normal to have two processes like this?
>
>

Re: Reduce Cassandra GC

Posted by Mohit Anchlia <mo...@gmail.com>.

Can you paste the output of cfstats and cfhistograms?

Also try and get histo at 2 diff points 1) when it looks good 2) when it
gets slow

http://docs.oracle.com/javase/6/docs/technotes/tools/share/jmap.html

Look for jmap -histo

On Thu, Jun 20, 2013 at 12:27 AM, Joel Samuelsson <samuelsson.joel@gmail.com
> wrote:

> 12.3 GB data per node (only one ńode).
> 16GB RAM.
> In virtual environment with the CPU specified as "8 cores", average CPU
> use is close to 0% (basically no load, around 12 requests / sec, mostly
> from OpsCenter).
> Average memory use is 4GB. Around 1GB heap used by Cassandra (out of 4GB).
>
>
> 2013/6/19 Mohit Anchlia <mo...@gmail.com>
>
>> How much data do you have per node?
>> How much RAM per node?
>> How much CPU per node?
>> What is the avg CPU and memory usage?
>>
>>  On Wed, Jun 19, 2013 at 12:16 AM, Joel Samuelsson <
>> samuelsson.joel@gmail.com> wrote:
>>
>>>  My Cassandra ps info:
>>>
>>> root     26791     1  0 07:14 ?        00:00:00 /usr/bin/jsvc -user
>>> cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
>>> /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
>>> -cp
>>> /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
>>> -Dlog4j.configuration=log4j-server.properties
>>> -Dlog4j.defaultInitOverride=true
>>> -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof
>>> -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea
>>> -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
>>> -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
>>> -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
>>> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
>>> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
>>> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
>>> -Dcom.sun.management.jmxremote.ssl=false
>>> -Dcom.sun.management.jmxremote.authenticate=false
>>> org.apache.cassandra.service.CassandraDaemon
>>> 103      26792 26791 99 07:14 ?        854015-22:02:22 /usr/bin/jsvc
>>> -user cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
>>> /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
>>> -cp
>>> /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
>>> -Dlog4j.configuration=log4j-server.properties
>>> -Dlog4j.defaultInitOverride=true
>>> -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof
>>> -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea
>>> -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
>>> -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
>>> -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
>>> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
>>> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
>>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
>>> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
>>> -Dcom.sun.management.jmxremote.ssl=false
>>> -Dcom.sun.management.jmxremote.authenticate=false
>>> org.apache.cassandra.service.CassandraDaemon
>>>
>>> Is it normal to have two processes like this?
>>>
>>>
>>
>

Re: Reduce Cassandra GC

Posted by Joel Samuelsson <sa...@gmail.com>.

12.3 GB data per node (only one ńode).
16GB RAM.
In virtual environment with the CPU specified as "8 cores", average CPU use
is close to 0% (basically no load, around 12 requests / sec, mostly from
OpsCenter).
Average memory use is 4GB. Around 1GB heap used by Cassandra (out of 4GB).


2013/6/19 Mohit Anchlia <mo...@gmail.com>

> How much data do you have per node?
> How much RAM per node?
> How much CPU per node?
> What is the avg CPU and memory usage?
>
> On Wed, Jun 19, 2013 at 12:16 AM, Joel Samuelsson <
> samuelsson.joel@gmail.com> wrote:
>
>>  My Cassandra ps info:
>>
>> root     26791     1  0 07:14 ?        00:00:00 /usr/bin/jsvc -user
>> cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
>> /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
>> -cp
>> /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
>> -Dlog4j.configuration=log4j-server.properties
>> -Dlog4j.defaultInitOverride=true
>> -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof
>> -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea
>> -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
>> -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
>> -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
>> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
>> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
>> -Dcom.sun.management.jmxremote.ssl=false
>> -Dcom.sun.management.jmxremote.authenticate=false
>> org.apache.cassandra.service.CassandraDaemon
>> 103      26792 26791 99 07:14 ?        854015-22:02:22 /usr/bin/jsvc
>> -user cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
>> /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
>> -cp
>> /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
>> -Dlog4j.configuration=log4j-server.properties
>> -Dlog4j.defaultInitOverride=true
>> -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof
>> -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea
>> -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
>> -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
>> -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
>> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
>> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
>> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
>> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
>> -Dcom.sun.management.jmxremote.ssl=false
>> -Dcom.sun.management.jmxremote.authenticate=false
>> org.apache.cassandra.service.CassandraDaemon
>>
>> Is it normal to have two processes like this?
>>
>>
>

Re: Reduce Cassandra GC

Posted by Mohit Anchlia <mo...@gmail.com>.

How much data do you have per node?
How much RAM per node?
How much CPU per node?
What is the avg CPU and memory usage?

On Wed, Jun 19, 2013 at 12:16 AM, Joel Samuelsson <samuelsson.joel@gmail.com
> wrote:

>  My Cassandra ps info:
>
> root     26791     1  0 07:14 ?        00:00:00 /usr/bin/jsvc -user
> cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
> /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
> -cp
> /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
> -Dlog4j.configuration=log4j-server.properties
> -Dlog4j.defaultInitOverride=true
> -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof
> -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea
> -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
> -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> org.apache.cassandra.service.CassandraDaemon
> 103      26792 26791 99 07:14 ?        854015-22:02:22 /usr/bin/jsvc -user
> cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
> /var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
> -cp
> /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
> -Dlog4j.configuration=log4j-server.properties
> -Dlog4j.defaultInitOverride=true
> -XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof
> -XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea
> -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
> -XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
> -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
> -Dcom.sun.management.jmxremote.ssl=false
> -Dcom.sun.management.jmxremote.authenticate=false
> org.apache.cassandra.service.CassandraDaemon
>
> Is it normal to have two processes like this?
>
>

Re: Reduce Cassandra GC

Posted by Joel Samuelsson <sa...@gmail.com>.

My Cassandra ps info:

root     26791     1  0 07:14 ?        00:00:00 /usr/bin/jsvc -user
cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
/var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
-cp
/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
-Dlog4j.configuration=log4j-server.properties
-Dlog4j.defaultInitOverride=true
-XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof
-XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea
-javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
-XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
-Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
org.apache.cassandra.service.CassandraDaemon
103      26792 26791 99 07:14 ?        854015-22:02:22 /usr/bin/jsvc -user
cassandra -home /opt/java/64/jre1.6.0_32/bin/../ -pidfile
/var/run/cassandra.pid -errfile &1 -outfile /var/log/cassandra/output.log
-cp
/usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.6.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.3.jar:/usr/share/cassandra/lib/guava-13.0.1.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/cassandra/lib/jline-1.0.jar:/usr/share/cassandra/lib/jna.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.7.0.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/lz4-1.1.0.jar:/usr/share/cassandra/lib/metrics-core-2.0.3.jar:/usr/share/cassandra/lib/netty-3.5.9.Final.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.7.2.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.7.2.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/lib/snaptree-0.1.jar:/usr/share/cassandra/apache-cassandra-1.2.5.jar:/usr/share/cassandra/apache-cassandra-thrift-1.2.5.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/cassandra/stress.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar
-Dlog4j.configuration=log4j-server.properties
-Dlog4j.defaultInitOverride=true
-XX:HeapDumpPath=/var/lib/cassandra/java_1371626058.hprof
-XX:ErrorFile=/var/lib/cassandra/hs_err_1371626058.log -ea
-javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities
-XX:ThreadPriorityPolicy=42 -Xms4004M -Xmx4004M -Xmn800M
-XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB
-Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199
-Dcom.sun.management.jmxremote.ssl=false
-Dcom.sun.management.jmxremote.authenticate=false
org.apache.cassandra.service.CassandraDaemon

Is it normal to have two processes like this?

Re: Reduce Cassandra GC

Posted by "Takenori Sato(Cloudian)" <ts...@cloudian.com>.

Oops, sorry. I missed that.

As you see, if the ParNew GC have waited for such a CMS GC by promotion 
failure, it would make sense. But unfortunatelly, the time is 
different.  By 2 mins.

2013-06-17T08:11:22.300+0000: 2551.288: [GC 870971K->216494K(4018176K), 
145.1887460 secs]
...
INFO [ScheduledTasks:1] 2013-06-17 08:13:47,490 GCInspector.java (line 
122) GC for ParNew: 145189 ms for 1 collections, 225905072 used; max is 
4114612224

Just to make sure, can you post your Cassandra's ps info?

(2013/06/19 0:48), Joel Samuelsson wrote:
> Yes, like I said, the only relevant output from that file was:
> 2013-06-17T08:11:22.300+0000: 2551.288: [GC 
> 870971K->216494K(4018176K), 145.1887460 secs]
>
>
> 2013/6/18 Takenori Sato <tsato@cloudian.com <ma...@cloudian.com>>
>
>     GC logging is not in system.log. But in the following file.
>
>     JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log"
>
>
>     At least, no GC logs are shown in your post.
>
>
>     On Tue, Jun 18, 2013 at 5:05 PM, Joel Samuelsson
>     <samuelsson.joel@gmail.com <ma...@gmail.com>> wrote:
>
>         Can't find any promotion failure.
>
>         In system.log this is what I get:
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,490
>         GCInspector.java (line 122) GC for ParNew: 145189 ms for 1
>         collections, 225905072 used; max is 4114612224
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,490
>         StatusLogger.java (line 57) Pool Name                  
>          Active Pending   Blocked
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,491
>         StatusLogger.java (line 72) ReadStage                        
>         0 0         0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492
>         StatusLogger.java (line 72) RequestResponseStage            
>          0 0         0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492
>         StatusLogger.java (line 72) ReadRepairStage                  
>         0 0         0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492
>         StatusLogger.java (line 72) MutationStage                    
>         0 0         0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493
>         StatusLogger.java (line 72) ReplicateOnWriteStage            
>         0 0         0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493
>         StatusLogger.java (line 72) GossipStage                      
>         0 0         0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493
>         StatusLogger.java (line 72) AntiEntropyStage                
>          0 0         0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494
>         StatusLogger.java (line 72) MigrationStage                  
>          0 0         0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494
>         StatusLogger.java (line 72) StreamStage                      
>         0 0         0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494
>         StatusLogger.java (line 72) MemtablePostFlusher              
>         0 0         0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,495
>         StatusLogger.java (line 72) FlushWriter                      
>         0 0         0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,495
>         StatusLogger.java (line 72) MiscStage                        
>         0 0         0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499
>         StatusLogger.java (line 72) commitlog_archiver              
>          0 0         0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499
>         StatusLogger.java (line 72) InternalResponseStage            
>         0 0         0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499
>         StatusLogger.java (line 72) HintedHandoff                    
>         0 0         0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500
>         StatusLogger.java (line 77) CompactionManager                 0 0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500
>         StatusLogger.java (line 89) MessagingService              
>          n/a 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500
>         StatusLogger.java (line 99) Cache Type     Size    Capacity
>          KeysToSave  Provider
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,504
>         StatusLogger.java (line 100) KeyCache      12129   2184533    
>         all
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505
>         StatusLogger.java (line 106) RowCache      0       0          
>         all org.apache.cassandra.cache.SerializingCacheProvider
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505
>         StatusLogger.java (line 113) ColumnFamily              
>          Memtable ops,data
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505
>         StatusLogger.java (line 116) system.NodeIdInfo 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505
>         StatusLogger.java (line 116) system.IndexInfo  0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505
>         StatusLogger.java (line 116) system.LocationInfo 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506
>         StatusLogger.java (line 116) system.Versions 3,103
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506
>         StatusLogger.java (line 116) system.schema_keyspacees 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506
>         StatusLogger.java (line 116) system.Migrations 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506
>         StatusLogger.java (line 116) system.schema_columnfamilies 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506
>         StatusLogger.java (line 116) system.schema_columns 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507
>         StatusLogger.java (line 116) system.HintsColumnFamily  0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507
>         StatusLogger.java (line 116) system.Schema 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507
>         StatusLogger.java (line 116) Keyspace.cf01                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507
>         StatusLogger.java (line 116) Keyspace.cf02                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507
>         StatusLogger.java (line 116) Keyspace.cf03                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508
>         StatusLogger.java (line 116) Keyspace.cf04                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508
>         StatusLogger.java (line 116) Keyspace.cf05                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508
>         StatusLogger.java (line 116) Keyspace.cf06                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508
>         StatusLogger.java (line 116) Keyspace.cf07                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508
>         StatusLogger.java (line 116) Keyspace.cf08                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508
>         StatusLogger.java (line 116) Keyspace.cf09                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509
>         StatusLogger.java (line 116) Keyspace.cf10                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509
>         StatusLogger.java (line 116) Keyspace.cf11                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509
>         StatusLogger.java (line 116) Keyspace.cf12                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509
>         StatusLogger.java (line 116) Keyspace.cf13                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509
>         StatusLogger.java (line 116) Keyspace.cf14                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510
>         StatusLogger.java (line 116) Keyspace.cf15                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510
>         StatusLogger.java (line 116) Keyspace.cf16                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510
>         StatusLogger.java (line 116) Keyspace.cf17                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510
>         StatusLogger.java (line 116) Keyspace.cf18                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510
>         StatusLogger.java (line 116) Keyspace.cf19                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511
>         StatusLogger.java (line 116) Keyspace.cf20                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511
>         StatusLogger.java (line 116) Keyspace.cf21                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511
>         StatusLogger.java (line 116) Keyspace.cf22                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511
>         StatusLogger.java (line 116) Keyspace.cf23                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511
>         StatusLogger.java (line 116) Keyspace.cf24                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512
>         StatusLogger.java (line 116) Keyspace.cf25                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512
>         StatusLogger.java (line 116) Keyspace.cf26                 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512
>         StatusLogger.java (line 116) StressKeyspace.StressStandard   0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512
>         StatusLogger.java (line 116) testing_Keyspace.cf01            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512
>         StatusLogger.java (line 116) testing_Keyspace.cf02            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513
>         StatusLogger.java (line 116) testing_Keyspace.cf03            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513
>         StatusLogger.java (line 116) testing_Keyspace.cf04            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513
>         StatusLogger.java (line 116) testing_Keyspace.cf05            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513
>         StatusLogger.java (line 116) testing_Keyspace.cf06            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513
>         StatusLogger.java (line 116) testing_Keyspace.cf07            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514
>         StatusLogger.java (line 116) testing_Keyspace.cf08            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514
>         StatusLogger.java (line 116) testing_Keyspace.cf09            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514
>         StatusLogger.java (line 116) testing_Keyspace.cf10            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514
>         StatusLogger.java (line 116) testing_Keyspace.cf11            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514
>         StatusLogger.java (line 116) testing_Keyspace.cf12            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515
>         StatusLogger.java (line 116) testing_Keyspace.cf13            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515
>         StatusLogger.java (line 116) testing_Keyspace.cf14            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515
>         StatusLogger.java (line 116) testing_Keyspace.cf15            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515
>         StatusLogger.java (line 116) testing_Keyspace.cf16            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515
>         StatusLogger.java (line 116) testing_Keyspace.cf17            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516
>         StatusLogger.java (line 116) testing_Keyspace.cf18            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516
>         StatusLogger.java (line 116) testing_Keyspace.cf19            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516
>         StatusLogger.java (line 116) testing_Keyspace.cf20            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516
>         StatusLogger.java (line 116) testing_Keyspace.cf21            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517
>         StatusLogger.java (line 116) testing_Keyspace.cf22            
>             0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517
>         StatusLogger.java (line 116) OpsCenter.rollups7200 0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517
>         StatusLogger.java (line 116) OpsCenter.rollups86400  0,0
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517
>         StatusLogger.java (line 116) OpsCenter.rollups60            
>         13745,3109686 <tel:3109686>
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517
>         StatusLogger.java (line 116) OpsCenter.events 18,826
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,518
>         StatusLogger.java (line 116) OpsCenter.rollups300  2516,570931
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,519
>         StatusLogger.java (line 116) OpsCenter.pdps  9072,160850
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,519
>         StatusLogger.java (line 116) OpsCenter.events_timeline  3,86
>          INFO [ScheduledTasks:1] 2013-06-17 08:13:47,520
>         StatusLogger.java (line 116) OpsCenter.settings  0,0
>
>         And from gc-1371454124.log I get:
>         2013-06-17T08:11:22.300+0000: 2551.288: [GC
>         870971K->216494K(4018176K), 145.1887460 secs]
>
>
>         2013/6/18 Takenori Sato <tsato@cloudian.com
>         <ma...@cloudian.com>>
>
>             Find "promotion failure". Bingo if it happened at the time.
>
>             Otherwise, post the relevant portion of the log here.
>             Someone may find a hint.
>
>
>             On Mon, Jun 17, 2013 at 5:51 PM, Joel Samuelsson
>             <samuelsson.joel@gmail.com
>             <ma...@gmail.com>> wrote:
>
>                 Just got a very long GC again. What am I to look for
>                 in the logging I just enabled?
>
>
>                 2013/6/17 Joel Samuelsson <samuelsson.joel@gmail.com
>                 <ma...@gmail.com>>
>
>                     > If you are talking about 1.2.x then I also have
>                     memory problems on the idle cluster: java memory
>                     constantly slow grows up to limit, then spend long
>                     time for GC. I never seen such behaviour for
>                      1.0.x and 1.1.x, where on idle cluster java
>                     memory stay on the same value.
>
>                     No I am running Cassandra 1.1.8.
>
>                     > Can you paste you gc config?
>
>                     I believe the relevant configs are these:
>                     # GC tuning options
>                     JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>                     JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>                     JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>                     JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>                     JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
>                     JVM_OPTS="$JVM_OPTS
>                     -XX:CMSInitiatingOccupancyFraction=75"
>                     JVM_OPTS="$JVM_OPTS
>                     -XX:+UseCMSInitiatingOccupancyOnly"
>
>                     I haven't changed anything in the environment
>                     config up until now.
>
>                     > Also can you take a heap dump at 2 diff points
>                     so that we can compare it?
>
>                     I can't access the machine at all during the
>                     stop-the-world freezes. Was that what you wanted
>                     me to try?
>
>                     >Uncomment the followings in "cassandra-env.sh".
>                     Done. Will post results as soon as I get a new
>                     stop-the-world gc.
>
>                     > If you are unable to find a JIRA, file one
>
>                     Unless this turns out to be a problem on my end, I
>                     will.
>
>
>
>
>
>

Re: Reduce Cassandra GC

Posted by Mohit Anchlia <mo...@gmail.com>.

Is your young generation size set to 4GB? Can you paste the output of ps
-ef|grep cassandra ?
On Tue, Jun 18, 2013 at 8:48 AM, Joel Samuelsson
<sa...@gmail.com>wrote:

> Yes, like I said, the only relevant output from that file was:
>  2013-06-17T08:11:22.300+0000: 2551.288: [GC 870971K->216494K(4018176K),
> 145.1887460 secs]
>
>
> 2013/6/18 Takenori Sato <ts...@cloudian.com>
>
>> GC logging is not in system.log. But in the following file.
>>
>>
>> JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log"
>>
>>
>> At least, no GC logs are shown in your post.
>>
>>
>> On Tue, Jun 18, 2013 at 5:05 PM, Joel Samuelsson <
>> samuelsson.joel@gmail.com> wrote:
>>
>>>  Can't find any promotion failure.
>>>
>>> In system.log this is what I get:
>>>   INFO [ScheduledTasks:1] 2013-06-17 08:13:47,490 GCInspector.java
>>> (line 122) GC for ParNew: 145189 ms for 1 collections, 225905072 used; max
>>> is 4114612224
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,490 StatusLogger.java (line
>>> 57) Pool Name                    Active   Pending   Blocked
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,491 StatusLogger.java (line
>>> 72) ReadStage                         0         0         0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492 StatusLogger.java (line
>>> 72) RequestResponseStage              0         0         0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492 StatusLogger.java (line
>>> 72) ReadRepairStage                   0         0         0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492 StatusLogger.java (line
>>> 72) MutationStage                     0         0         0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493 StatusLogger.java (line
>>> 72) ReplicateOnWriteStage             0         0         0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493 StatusLogger.java (line
>>> 72) GossipStage                       0         0         0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493 StatusLogger.java (line
>>> 72) AntiEntropyStage                  0         0         0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494 StatusLogger.java (line
>>> 72) MigrationStage                    0         0         0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494 StatusLogger.java (line
>>> 72) StreamStage                       0         0         0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494 StatusLogger.java (line
>>> 72) MemtablePostFlusher               0         0         0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,495 StatusLogger.java (line
>>> 72) FlushWriter                       0         0         0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,495 StatusLogger.java (line
>>> 72) MiscStage                         0         0         0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499 StatusLogger.java (line
>>> 72) commitlog_archiver                0         0         0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499 StatusLogger.java (line
>>> 72) InternalResponseStage             0         0         0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499 StatusLogger.java (line
>>> 72) HintedHandoff                     0         0         0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500 StatusLogger.java (line
>>> 77) CompactionManager                 0         0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500 StatusLogger.java (line
>>> 89) MessagingService                n/a       0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500 StatusLogger.java (line
>>> 99) Cache Type     Size    Capacity    KeysToSave  Provider
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,504 StatusLogger.java (line
>>> 100) KeyCache      12129   2184533     all
>>>
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
>>> 106) RowCache      0       0           all
>>> org.apache.cassandra.cache.SerializingCacheProvider
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
>>> 113) ColumnFamily                Memtable ops,data
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
>>> 116) system.NodeIdInfo                         0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
>>> 116) system.IndexInfo                          0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
>>> 116) system.LocationInfo                       0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
>>> 116) system.Versions                         3,103
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
>>> 116) system.schema_keyspacees                   0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
>>> 116) system.Migrations                         0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
>>> 116) system.schema_columnfamilies                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
>>> 116) system.schema_columns                     0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
>>> 116) system.HintsColumnFamily                  0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
>>> 116) system.Schema                             0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
>>> 116) Keyspace.cf01                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
>>> 116) Keyspace.cf02                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
>>> 116) Keyspace.cf03                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
>>> 116) Keyspace.cf04                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
>>> 116) Keyspace.cf05                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
>>> 116) Keyspace.cf06                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
>>> 116) Keyspace.cf07                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
>>> 116) Keyspace.cf08                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
>>> 116) Keyspace.cf09                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
>>> 116) Keyspace.cf10                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
>>> 116) Keyspace.cf11                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
>>> 116) Keyspace.cf12                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
>>> 116) Keyspace.cf13                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
>>> 116) Keyspace.cf14                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
>>> 116) Keyspace.cf15                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
>>> 116) Keyspace.cf16                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
>>> 116) Keyspace.cf17                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
>>> 116) Keyspace.cf18                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
>>> 116) Keyspace.cf19                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
>>> 116) Keyspace.cf20                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
>>> 116) Keyspace.cf21                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
>>> 116) Keyspace.cf22                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
>>> 116) Keyspace.cf23                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
>>> 116) Keyspace.cf24                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
>>> 116) Keyspace.cf25                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
>>> 116) Keyspace.cf26                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
>>> 116) StressKeyspace.StressStandard                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
>>> 116) testing_Keyspace.cf01                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
>>> 116) testing_Keyspace.cf02                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
>>> 116) testing_Keyspace.cf03                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
>>> 116) testing_Keyspace.cf04                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
>>> 116) testing_Keyspace.cf05                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
>>> 116) testing_Keyspace.cf06                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
>>> 116) testing_Keyspace.cf07                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
>>> 116) testing_Keyspace.cf08                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
>>> 116) testing_Keyspace.cf09                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
>>> 116) testing_Keyspace.cf10                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
>>> 116) testing_Keyspace.cf11                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
>>> 116) testing_Keyspace.cf12                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
>>> 116) testing_Keyspace.cf13                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
>>> 116) testing_Keyspace.cf14                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
>>> 116) testing_Keyspace.cf15                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
>>> 116) testing_Keyspace.cf16                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
>>> 116) testing_Keyspace.cf17                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516 StatusLogger.java (line
>>> 116) testing_Keyspace.cf18                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516 StatusLogger.java (line
>>> 116) testing_Keyspace.cf19                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516 StatusLogger.java (line
>>> 116) testing_Keyspace.cf20                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516 StatusLogger.java (line
>>> 116) testing_Keyspace.cf21                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
>>> 116) testing_Keyspace.cf22                 0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
>>> 116) OpsCenter.rollups7200                     0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
>>> 116) OpsCenter.rollups86400                    0,0
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
>>> 116) OpsCenter.rollups60             13745,3109686
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
>>> 116) OpsCenter.events                       18,826
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,518 StatusLogger.java (line
>>> 116) OpsCenter.rollups300              2516,570931
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,519 StatusLogger.java (line
>>> 116) OpsCenter.pdps                    9072,160850
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,519 StatusLogger.java (line
>>> 116) OpsCenter.events_timeline                3,86
>>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,520 StatusLogger.java (line
>>> 116) OpsCenter.settings                        0,0
>>>
>>> And from gc-1371454124.log I get:
>>> 2013-06-17T08:11:22.300+0000: 2551.288: [GC 870971K->216494K(4018176K),
>>> 145.1887460 secs]
>>>
>>>
>>> 2013/6/18 Takenori Sato <ts...@cloudian.com>
>>>
>>>> Find "promotion failure". Bingo if it happened at the time.
>>>>
>>>> Otherwise, post the relevant portion of the log here. Someone may find
>>>> a hint.
>>>>
>>>>
>>>> On Mon, Jun 17, 2013 at 5:51 PM, Joel Samuelsson <
>>>> samuelsson.joel@gmail.com> wrote:
>>>>
>>>>> Just got a very long GC again. What am I to look for in the logging I
>>>>> just enabled?
>>>>>
>>>>>
>>>>> 2013/6/17 Joel Samuelsson <sa...@gmail.com>
>>>>>
>>>>>>  > If you are talking about 1.2.x then I also have memory problems
>>>>>> on the idle cluster: java memory constantly slow grows up to limit, then
>>>>>> spend long time for GC. I never seen such behaviour for  1.0.x and 1.1.x,
>>>>>> where on idle cluster java memory stay on the same value.
>>>>>>
>>>>>> No I am running Cassandra 1.1.8.
>>>>>>
>>>>>> > Can you paste you gc config?
>>>>>>
>>>>>> I believe the relevant configs are these:
>>>>>>  # GC tuning options
>>>>>> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>>>>>> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>>>>>> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>>>>>> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>>>>>> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
>>>>>> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
>>>>>> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>>>>>>
>>>>>> I haven't changed anything in the environment config up until now.
>>>>>>
>>>>>> > Also can you take a heap dump at 2 diff points so that we can
>>>>>> compare it?
>>>>>>
>>>>>> I can't access the machine at all during the stop-the-world freezes.
>>>>>> Was that what you wanted me to try?
>>>>>>
>>>>>> > Uncomment the followings in "cassandra-env.sh".
>>>>>> Done. Will post results as soon as I get a new stop-the-world gc.
>>>>>>
>>>>>> > If you are unable to find a JIRA, file one
>>>>>>
>>>>>> Unless this turns out to be a problem on my end, I will.
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Reduce Cassandra GC

Posted by Joel Samuelsson <sa...@gmail.com>.

Yes, like I said, the only relevant output from that file was:
2013-06-17T08:11:22.300+0000: 2551.288: [GC 870971K->216494K(4018176K),
145.1887460 secs]


2013/6/18 Takenori Sato <ts...@cloudian.com>

> GC logging is not in system.log. But in the following file.
>
> JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log"
>
>
> At least, no GC logs are shown in your post.
>
>
> On Tue, Jun 18, 2013 at 5:05 PM, Joel Samuelsson <
> samuelsson.joel@gmail.com> wrote:
>
>> Can't find any promotion failure.
>>
>> In system.log this is what I get:
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,490 GCInspector.java (line
>> 122) GC for ParNew: 145189 ms for 1 collections, 225905072 used; max is
>> 4114612224
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,490 StatusLogger.java (line
>> 57) Pool Name                    Active   Pending   Blocked
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,491 StatusLogger.java (line
>> 72) ReadStage                         0         0         0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492 StatusLogger.java (line
>> 72) RequestResponseStage              0         0         0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492 StatusLogger.java (line
>> 72) ReadRepairStage                   0         0         0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492 StatusLogger.java (line
>> 72) MutationStage                     0         0         0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493 StatusLogger.java (line
>> 72) ReplicateOnWriteStage             0         0         0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493 StatusLogger.java (line
>> 72) GossipStage                       0         0         0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493 StatusLogger.java (line
>> 72) AntiEntropyStage                  0         0         0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494 StatusLogger.java (line
>> 72) MigrationStage                    0         0         0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494 StatusLogger.java (line
>> 72) StreamStage                       0         0         0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494 StatusLogger.java (line
>> 72) MemtablePostFlusher               0         0         0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,495 StatusLogger.java (line
>> 72) FlushWriter                       0         0         0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,495 StatusLogger.java (line
>> 72) MiscStage                         0         0         0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499 StatusLogger.java (line
>> 72) commitlog_archiver                0         0         0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499 StatusLogger.java (line
>> 72) InternalResponseStage             0         0         0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499 StatusLogger.java (line
>> 72) HintedHandoff                     0         0         0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500 StatusLogger.java (line
>> 77) CompactionManager                 0         0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500 StatusLogger.java (line
>> 89) MessagingService                n/a       0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500 StatusLogger.java (line
>> 99) Cache Type     Size    Capacity    KeysToSave  Provider
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,504 StatusLogger.java (line
>> 100) KeyCache      12129   2184533     all
>>
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
>> 106) RowCache      0       0           all
>> org.apache.cassandra.cache.SerializingCacheProvider
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
>> 113) ColumnFamily                Memtable ops,data
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
>> 116) system.NodeIdInfo                         0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
>> 116) system.IndexInfo                          0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
>> 116) system.LocationInfo                       0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
>> 116) system.Versions                         3,103
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
>> 116) system.schema_keyspacees                   0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
>> 116) system.Migrations                         0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
>> 116) system.schema_columnfamilies                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
>> 116) system.schema_columns                     0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
>> 116) system.HintsColumnFamily                  0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
>> 116) system.Schema                             0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
>> 116) Keyspace.cf01                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
>> 116) Keyspace.cf02                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
>> 116) Keyspace.cf03                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
>> 116) Keyspace.cf04                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
>> 116) Keyspace.cf05                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
>> 116) Keyspace.cf06                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
>> 116) Keyspace.cf07                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
>> 116) Keyspace.cf08                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
>> 116) Keyspace.cf09                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
>> 116) Keyspace.cf10                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
>> 116) Keyspace.cf11                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
>> 116) Keyspace.cf12                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
>> 116) Keyspace.cf13                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
>> 116) Keyspace.cf14                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
>> 116) Keyspace.cf15                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
>> 116) Keyspace.cf16                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
>> 116) Keyspace.cf17                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
>> 116) Keyspace.cf18                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
>> 116) Keyspace.cf19                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
>> 116) Keyspace.cf20                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
>> 116) Keyspace.cf21                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
>> 116) Keyspace.cf22                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
>> 116) Keyspace.cf23                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
>> 116) Keyspace.cf24                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
>> 116) Keyspace.cf25                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
>> 116) Keyspace.cf26                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
>> 116) StressKeyspace.StressStandard                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
>> 116) testing_Keyspace.cf01                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
>> 116) testing_Keyspace.cf02                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
>> 116) testing_Keyspace.cf03                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
>> 116) testing_Keyspace.cf04                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
>> 116) testing_Keyspace.cf05                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
>> 116) testing_Keyspace.cf06                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
>> 116) testing_Keyspace.cf07                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
>> 116) testing_Keyspace.cf08                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
>> 116) testing_Keyspace.cf09                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
>> 116) testing_Keyspace.cf10                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
>> 116) testing_Keyspace.cf11                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
>> 116) testing_Keyspace.cf12                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
>> 116) testing_Keyspace.cf13                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
>> 116) testing_Keyspace.cf14                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
>> 116) testing_Keyspace.cf15                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
>> 116) testing_Keyspace.cf16                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
>> 116) testing_Keyspace.cf17                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516 StatusLogger.java (line
>> 116) testing_Keyspace.cf18                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516 StatusLogger.java (line
>> 116) testing_Keyspace.cf19                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516 StatusLogger.java (line
>> 116) testing_Keyspace.cf20                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516 StatusLogger.java (line
>> 116) testing_Keyspace.cf21                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
>> 116) testing_Keyspace.cf22                 0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
>> 116) OpsCenter.rollups7200                     0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
>> 116) OpsCenter.rollups86400                    0,0
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
>> 116) OpsCenter.rollups60             13745,3109686
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
>> 116) OpsCenter.events                       18,826
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,518 StatusLogger.java (line
>> 116) OpsCenter.rollups300              2516,570931
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,519 StatusLogger.java (line
>> 116) OpsCenter.pdps                    9072,160850
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,519 StatusLogger.java (line
>> 116) OpsCenter.events_timeline                3,86
>>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,520 StatusLogger.java (line
>> 116) OpsCenter.settings                        0,0
>>
>> And from gc-1371454124.log I get:
>> 2013-06-17T08:11:22.300+0000: 2551.288: [GC 870971K->216494K(4018176K),
>> 145.1887460 secs]
>>
>>
>> 2013/6/18 Takenori Sato <ts...@cloudian.com>
>>
>>> Find "promotion failure". Bingo if it happened at the time.
>>>
>>> Otherwise, post the relevant portion of the log here. Someone may find a
>>> hint.
>>>
>>>
>>> On Mon, Jun 17, 2013 at 5:51 PM, Joel Samuelsson <
>>> samuelsson.joel@gmail.com> wrote:
>>>
>>>> Just got a very long GC again. What am I to look for in the logging I
>>>> just enabled?
>>>>
>>>>
>>>> 2013/6/17 Joel Samuelsson <sa...@gmail.com>
>>>>
>>>>> > If you are talking about 1.2.x then I also have memory problems on
>>>>> the idle cluster: java memory constantly slow grows up to limit, then spend
>>>>> long time for GC. I never seen such behaviour for  1.0.x and 1.1.x, where
>>>>> on idle cluster java memory stay on the same value.
>>>>>
>>>>> No I am running Cassandra 1.1.8.
>>>>>
>>>>> > Can you paste you gc config?
>>>>>
>>>>> I believe the relevant configs are these:
>>>>> # GC tuning options
>>>>> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>>>>> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>>>>> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>>>>> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>>>>> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
>>>>> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
>>>>> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>>>>>
>>>>> I haven't changed anything in the environment config up until now.
>>>>>
>>>>> > Also can you take a heap dump at 2 diff points so that we can
>>>>> compare it?
>>>>>
>>>>> I can't access the machine at all during the stop-the-world freezes.
>>>>> Was that what you wanted me to try?
>>>>>
>>>>> > Uncomment the followings in "cassandra-env.sh".
>>>>> Done. Will post results as soon as I get a new stop-the-world gc.
>>>>>
>>>>> > If you are unable to find a JIRA, file one
>>>>>
>>>>> Unless this turns out to be a problem on my end, I will.
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Reduce Cassandra GC

Posted by Takenori Sato <ts...@cloudian.com>.

GC logging is not in system.log. But in the following file.

JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log"


At least, no GC logs are shown in your post.


On Tue, Jun 18, 2013 at 5:05 PM, Joel Samuelsson
<sa...@gmail.com>wrote:

> Can't find any promotion failure.
>
> In system.log this is what I get:
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,490 GCInspector.java (line
> 122) GC for ParNew: 145189 ms for 1 collections, 225905072 used; max is
> 4114612224
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,490 StatusLogger.java (line
> 57) Pool Name                    Active   Pending   Blocked
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,491 StatusLogger.java (line
> 72) ReadStage                         0         0         0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492 StatusLogger.java (line
> 72) RequestResponseStage              0         0         0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492 StatusLogger.java (line
> 72) ReadRepairStage                   0         0         0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492 StatusLogger.java (line
> 72) MutationStage                     0         0         0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493 StatusLogger.java (line
> 72) ReplicateOnWriteStage             0         0         0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493 StatusLogger.java (line
> 72) GossipStage                       0         0         0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493 StatusLogger.java (line
> 72) AntiEntropyStage                  0         0         0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494 StatusLogger.java (line
> 72) MigrationStage                    0         0         0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494 StatusLogger.java (line
> 72) StreamStage                       0         0         0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494 StatusLogger.java (line
> 72) MemtablePostFlusher               0         0         0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,495 StatusLogger.java (line
> 72) FlushWriter                       0         0         0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,495 StatusLogger.java (line
> 72) MiscStage                         0         0         0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499 StatusLogger.java (line
> 72) commitlog_archiver                0         0         0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499 StatusLogger.java (line
> 72) InternalResponseStage             0         0         0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499 StatusLogger.java (line
> 72) HintedHandoff                     0         0         0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500 StatusLogger.java (line
> 77) CompactionManager                 0         0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500 StatusLogger.java (line
> 89) MessagingService                n/a       0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500 StatusLogger.java (line
> 99) Cache Type     Size    Capacity    KeysToSave  Provider
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,504 StatusLogger.java (line
> 100) KeyCache      12129   2184533     all
>
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
> 106) RowCache      0       0           all
> org.apache.cassandra.cache.SerializingCacheProvider
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
> 113) ColumnFamily                Memtable ops,data
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
> 116) system.NodeIdInfo                         0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
> 116) system.IndexInfo                          0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
> 116) system.LocationInfo                       0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
> 116) system.Versions                         3,103
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
> 116) system.schema_keyspacees                   0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
> 116) system.Migrations                         0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
> 116) system.schema_columnfamilies                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
> 116) system.schema_columns                     0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
> 116) system.HintsColumnFamily                  0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
> 116) system.Schema                             0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
> 116) Keyspace.cf01                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
> 116) Keyspace.cf02                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
> 116) Keyspace.cf03                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
> 116) Keyspace.cf04                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
> 116) Keyspace.cf05                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
> 116) Keyspace.cf06                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
> 116) Keyspace.cf07                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
> 116) Keyspace.cf08                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
> 116) Keyspace.cf09                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
> 116) Keyspace.cf10                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
> 116) Keyspace.cf11                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
> 116) Keyspace.cf12                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
> 116) Keyspace.cf13                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
> 116) Keyspace.cf14                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
> 116) Keyspace.cf15                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
> 116) Keyspace.cf16                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
> 116) Keyspace.cf17                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
> 116) Keyspace.cf18                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
> 116) Keyspace.cf19                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
> 116) Keyspace.cf20                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
> 116) Keyspace.cf21                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
> 116) Keyspace.cf22                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
> 116) Keyspace.cf23                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
> 116) Keyspace.cf24                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
> 116) Keyspace.cf25                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
> 116) Keyspace.cf26                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
> 116) StressKeyspace.StressStandard                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
> 116) testing_Keyspace.cf01                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
> 116) testing_Keyspace.cf02                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
> 116) testing_Keyspace.cf03                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
> 116) testing_Keyspace.cf04                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
> 116) testing_Keyspace.cf05                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
> 116) testing_Keyspace.cf06                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
> 116) testing_Keyspace.cf07                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
> 116) testing_Keyspace.cf08                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
> 116) testing_Keyspace.cf09                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
> 116) testing_Keyspace.cf10                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
> 116) testing_Keyspace.cf11                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
> 116) testing_Keyspace.cf12                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
> 116) testing_Keyspace.cf13                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
> 116) testing_Keyspace.cf14                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
> 116) testing_Keyspace.cf15                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
> 116) testing_Keyspace.cf16                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
> 116) testing_Keyspace.cf17                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516 StatusLogger.java (line
> 116) testing_Keyspace.cf18                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516 StatusLogger.java (line
> 116) testing_Keyspace.cf19                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516 StatusLogger.java (line
> 116) testing_Keyspace.cf20                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516 StatusLogger.java (line
> 116) testing_Keyspace.cf21                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
> 116) testing_Keyspace.cf22                 0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
> 116) OpsCenter.rollups7200                     0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
> 116) OpsCenter.rollups86400                    0,0
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
> 116) OpsCenter.rollups60             13745,3109686
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
> 116) OpsCenter.events                       18,826
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,518 StatusLogger.java (line
> 116) OpsCenter.rollups300              2516,570931
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,519 StatusLogger.java (line
> 116) OpsCenter.pdps                    9072,160850
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,519 StatusLogger.java (line
> 116) OpsCenter.events_timeline                3,86
>  INFO [ScheduledTasks:1] 2013-06-17 08:13:47,520 StatusLogger.java (line
> 116) OpsCenter.settings                        0,0
>
> And from gc-1371454124.log I get:
> 2013-06-17T08:11:22.300+0000: 2551.288: [GC 870971K->216494K(4018176K),
> 145.1887460 secs]
>
>
> 2013/6/18 Takenori Sato <ts...@cloudian.com>
>
>> Find "promotion failure". Bingo if it happened at the time.
>>
>> Otherwise, post the relevant portion of the log here. Someone may find a
>> hint.
>>
>>
>> On Mon, Jun 17, 2013 at 5:51 PM, Joel Samuelsson <
>> samuelsson.joel@gmail.com> wrote:
>>
>>> Just got a very long GC again. What am I to look for in the logging I
>>> just enabled?
>>>
>>>
>>> 2013/6/17 Joel Samuelsson <sa...@gmail.com>
>>>
>>>> > If you are talking about 1.2.x then I also have memory problems on
>>>> the idle cluster: java memory constantly slow grows up to limit, then spend
>>>> long time for GC. I never seen such behaviour for  1.0.x and 1.1.x, where
>>>> on idle cluster java memory stay on the same value.
>>>>
>>>> No I am running Cassandra 1.1.8.
>>>>
>>>> > Can you paste you gc config?
>>>>
>>>> I believe the relevant configs are these:
>>>> # GC tuning options
>>>> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>>>> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>>>> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>>>> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>>>> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
>>>> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
>>>> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>>>>
>>>> I haven't changed anything in the environment config up until now.
>>>>
>>>> > Also can you take a heap dump at 2 diff points so that we can
>>>> compare it?
>>>>
>>>> I can't access the machine at all during the stop-the-world freezes.
>>>> Was that what you wanted me to try?
>>>>
>>>> > Uncomment the followings in "cassandra-env.sh".
>>>> Done. Will post results as soon as I get a new stop-the-world gc.
>>>>
>>>> > If you are unable to find a JIRA, file one
>>>>
>>>> Unless this turns out to be a problem on my end, I will.
>>>>
>>>
>>>
>>
>

Re: Reduce Cassandra GC

Posted by Joel Samuelsson <sa...@gmail.com>.

Can't find any promotion failure.

In system.log this is what I get:
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,490 GCInspector.java (line
122) GC for ParNew: 145189 ms for 1 collections, 225905072 used; max is
4114612224
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,490 StatusLogger.java (line
57) Pool Name                    Active   Pending   Blocked
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,491 StatusLogger.java (line
72) ReadStage                         0         0         0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492 StatusLogger.java (line
72) RequestResponseStage              0         0         0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492 StatusLogger.java (line
72) ReadRepairStage                   0         0         0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,492 StatusLogger.java (line
72) MutationStage                     0         0         0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493 StatusLogger.java (line
72) ReplicateOnWriteStage             0         0         0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493 StatusLogger.java (line
72) GossipStage                       0         0         0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,493 StatusLogger.java (line
72) AntiEntropyStage                  0         0         0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494 StatusLogger.java (line
72) MigrationStage                    0         0         0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494 StatusLogger.java (line
72) StreamStage                       0         0         0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,494 StatusLogger.java (line
72) MemtablePostFlusher               0         0         0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,495 StatusLogger.java (line
72) FlushWriter                       0         0         0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,495 StatusLogger.java (line
72) MiscStage                         0         0         0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499 StatusLogger.java (line
72) commitlog_archiver                0         0         0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499 StatusLogger.java (line
72) InternalResponseStage             0         0         0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,499 StatusLogger.java (line
72) HintedHandoff                     0         0         0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500 StatusLogger.java (line
77) CompactionManager                 0         0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500 StatusLogger.java (line
89) MessagingService                n/a       0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,500 StatusLogger.java (line
99) Cache Type     Size    Capacity    KeysToSave  Provider
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,504 StatusLogger.java (line
100) KeyCache      12129   2184533     all

 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
106) RowCache      0       0           all
org.apache.cassandra.cache.SerializingCacheProvider
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
113) ColumnFamily                Memtable ops,data
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
116) system.NodeIdInfo                         0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
116) system.IndexInfo                          0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,505 StatusLogger.java (line
116) system.LocationInfo                       0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
116) system.Versions                         3,103
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
116) system.schema_keyspacees                   0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
116) system.Migrations                         0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
116) system.schema_columnfamilies                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,506 StatusLogger.java (line
116) system.schema_columns                     0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
116) system.HintsColumnFamily                  0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
116) system.Schema                             0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
116) Keyspace.cf01                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
116) Keyspace.cf02                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,507 StatusLogger.java (line
116) Keyspace.cf03                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
116) Keyspace.cf04                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
116) Keyspace.cf05                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
116) Keyspace.cf06                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
116) Keyspace.cf07                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
116) Keyspace.cf08                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,508 StatusLogger.java (line
116) Keyspace.cf09                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
116) Keyspace.cf10                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
116) Keyspace.cf11                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
116) Keyspace.cf12                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
116) Keyspace.cf13                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,509 StatusLogger.java (line
116) Keyspace.cf14                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
116) Keyspace.cf15                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
116) Keyspace.cf16                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
116) Keyspace.cf17                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
116) Keyspace.cf18                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,510 StatusLogger.java (line
116) Keyspace.cf19                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
116) Keyspace.cf20                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
116) Keyspace.cf21                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
116) Keyspace.cf22                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
116) Keyspace.cf23                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,511 StatusLogger.java (line
116) Keyspace.cf24                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
116) Keyspace.cf25                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
116) Keyspace.cf26                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
116) StressKeyspace.StressStandard                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
116) testing_Keyspace.cf01                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,512 StatusLogger.java (line
116) testing_Keyspace.cf02                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
116) testing_Keyspace.cf03                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
116) testing_Keyspace.cf04                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
116) testing_Keyspace.cf05                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
116) testing_Keyspace.cf06                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,513 StatusLogger.java (line
116) testing_Keyspace.cf07                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
116) testing_Keyspace.cf08                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
116) testing_Keyspace.cf09                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
116) testing_Keyspace.cf10                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
116) testing_Keyspace.cf11                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,514 StatusLogger.java (line
116) testing_Keyspace.cf12                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
116) testing_Keyspace.cf13                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
116) testing_Keyspace.cf14                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
116) testing_Keyspace.cf15                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
116) testing_Keyspace.cf16                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,515 StatusLogger.java (line
116) testing_Keyspace.cf17                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516 StatusLogger.java (line
116) testing_Keyspace.cf18                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516 StatusLogger.java (line
116) testing_Keyspace.cf19                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516 StatusLogger.java (line
116) testing_Keyspace.cf20                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,516 StatusLogger.java (line
116) testing_Keyspace.cf21                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
116) testing_Keyspace.cf22                 0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
116) OpsCenter.rollups7200                     0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
116) OpsCenter.rollups86400                    0,0
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
116) OpsCenter.rollups60             13745,3109686
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,517 StatusLogger.java (line
116) OpsCenter.events                       18,826
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,518 StatusLogger.java (line
116) OpsCenter.rollups300              2516,570931
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,519 StatusLogger.java (line
116) OpsCenter.pdps                    9072,160850
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,519 StatusLogger.java (line
116) OpsCenter.events_timeline                3,86
 INFO [ScheduledTasks:1] 2013-06-17 08:13:47,520 StatusLogger.java (line
116) OpsCenter.settings                        0,0

And from gc-1371454124.log I get:
2013-06-17T08:11:22.300+0000: 2551.288: [GC 870971K->216494K(4018176K),
145.1887460 secs]


2013/6/18 Takenori Sato <ts...@cloudian.com>

> Find "promotion failure". Bingo if it happened at the time.
>
> Otherwise, post the relevant portion of the log here. Someone may find a
> hint.
>
>
> On Mon, Jun 17, 2013 at 5:51 PM, Joel Samuelsson <
> samuelsson.joel@gmail.com> wrote:
>
>> Just got a very long GC again. What am I to look for in the logging I
>> just enabled?
>>
>>
>> 2013/6/17 Joel Samuelsson <sa...@gmail.com>
>>
>>> > If you are talking about 1.2.x then I also have memory problems on the
>>> idle cluster: java memory constantly slow grows up to limit, then spend
>>> long time for GC. I never seen such behaviour for  1.0.x and 1.1.x, where
>>> on idle cluster java memory stay on the same value.
>>>
>>> No I am running Cassandra 1.1.8.
>>>
>>> > Can you paste you gc config?
>>>
>>> I believe the relevant configs are these:
>>> # GC tuning options
>>> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>>> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>>> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>>> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>>> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
>>> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
>>> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>>>
>>> I haven't changed anything in the environment config up until now.
>>>
>>> > Also can you take a heap dump at 2 diff points so that we can compare
>>> it?
>>>
>>> I can't access the machine at all during the stop-the-world freezes. Was
>>> that what you wanted me to try?
>>>
>>> > Uncomment the followings in "cassandra-env.sh".
>>> Done. Will post results as soon as I get a new stop-the-world gc.
>>>
>>> > If you are unable to find a JIRA, file one
>>>
>>> Unless this turns out to be a problem on my end, I will.
>>>
>>
>>
>

Re: Reduce Cassandra GC

Posted by Takenori Sato <ts...@cloudian.com>.

Find "promotion failure". Bingo if it happened at the time.

Otherwise, post the relevant portion of the log here. Someone may find a
hint.


On Mon, Jun 17, 2013 at 5:51 PM, Joel Samuelsson
<sa...@gmail.com>wrote:

> Just got a very long GC again. What am I to look for in the logging I just
> enabled?
>
>
> 2013/6/17 Joel Samuelsson <sa...@gmail.com>
>
>> > If you are talking about 1.2.x then I also have memory problems on the
>> idle cluster: java memory constantly slow grows up to limit, then spend
>> long time for GC. I never seen such behaviour for  1.0.x and 1.1.x, where
>> on idle cluster java memory stay on the same value.
>>
>> No I am running Cassandra 1.1.8.
>>
>> > Can you paste you gc config?
>>
>> I believe the relevant configs are these:
>> # GC tuning options
>> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
>> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
>> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
>> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
>> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
>> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
>> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>>
>> I haven't changed anything in the environment config up until now.
>>
>> > Also can you take a heap dump at 2 diff points so that we can compare
>> it?
>>
>> I can't access the machine at all during the stop-the-world freezes. Was
>> that what you wanted me to try?
>>
>> > Uncomment the followings in "cassandra-env.sh".
>> Done. Will post results as soon as I get a new stop-the-world gc.
>>
>> > If you are unable to find a JIRA, file one
>>
>> Unless this turns out to be a problem on my end, I will.
>>
>
>

Re: Reduce Cassandra GC

Posted by Joel Samuelsson <sa...@gmail.com>.

Just got a very long GC again. What am I to look for in the logging I just
enabled?


2013/6/17 Joel Samuelsson <sa...@gmail.com>

> > If you are talking about 1.2.x then I also have memory problems on the
> idle cluster: java memory constantly slow grows up to limit, then spend
> long time for GC. I never seen such behaviour for  1.0.x and 1.1.x, where
> on idle cluster java memory stay on the same value.
>
> No I am running Cassandra 1.1.8.
>
> > Can you paste you gc config?
>
> I believe the relevant configs are these:
> # GC tuning options
> JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
> JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
> JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
> JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
> JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
>
> I haven't changed anything in the environment config up until now.
>
> > Also can you take a heap dump at 2 diff points so that we can compare
> it?
>
> I can't access the machine at all during the stop-the-world freezes. Was
> that what you wanted me to try?
>
> > Uncomment the followings in "cassandra-env.sh".
> Done. Will post results as soon as I get a new stop-the-world gc.
>
> > If you are unable to find a JIRA, file one
>
> Unless this turns out to be a problem on my end, I will.
>

Re: Reduce Cassandra GC

Posted by Joel Samuelsson <sa...@gmail.com>.

> If you are talking about 1.2.x then I also have memory problems on the
idle cluster: java memory constantly slow grows up to limit, then spend
long time for GC. I never seen such behaviour for  1.0.x and 1.1.x, where
on idle cluster java memory stay on the same value.

No I am running Cassandra 1.1.8.

> Can you paste you gc config?

I believe the relevant configs are these:
# GC tuning options
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

I haven't changed anything in the environment config up until now.

> Also can you take a heap dump at 2 diff points so that we can compare it?

I can't access the machine at all during the stop-the-world freezes. Was
that what you wanted me to try?

> Uncomment the followings in "cassandra-env.sh".
Done. Will post results as soon as I get a new stop-the-world gc.

> If you are unable to find a JIRA, file one

Unless this turns out to be a problem on my end, I will.

Re: Reduce Cassandra GC

Posted by Takenori Sato <ts...@cloudian.com>.

> Also can you take a heap dump at 2 diff points so that we can compare it?

Also note that a promotion failure won't happen by a particular object, but
by a fragmentation in Old Generation space. So I am not sure if you can't
tell by a heap dump comparison.


On Sun, Jun 16, 2013 at 4:44 AM, Mohit Anchlia <mo...@gmail.com>wrote:

> Can you paste you gc config? Also can you take a heap dump at 2 diff
> points so that we can compare it?
>
> Quick thing to do would be to do a histo live at 2 points and compare
>
> Sent from my iPhone
>
> On Jun 15, 2013, at 6:57 AM, Takenori Sato <ts...@cloudian.com> wrote:
>
> > INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line
> 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is
> 1046937600
>
> This says GC for New Generation took so long. And this is usually
> unlikely.
>
> The only situation I am aware of is when a fairly large object is created,
> and which can not be promoted to Old Generation because it requires such a
> large *contiguous* memory space that is unavailable at the point in time.
> This is called promotion failure. So it has to wait until concurrent
> collector collects a large enough space. Thus you experience stop the
> world. But I think it is not stop the world, but only stop the new world.
>
> For example in case of Cassandra, a large number of
> in_memory_compaction_limit_in_mb can cause this. This is a limit when a
> compaction compacts(merges) rows of a key into the latest in memory. So
> this creates a large byte array up to the number.
>
> You can confirm this by enabling promotion failure GC logging in the
> future, and by checking compactions executed at that point in time.
>
>
>
> On Sat, Jun 15, 2013 at 10:01 AM, Robert Coli <rc...@eventbrite.com>wrote:
>
>> On Fri, Jun 7, 2013 at 12:42 PM, Igor <ig...@4friends.od.ua> wrote:
>> > If you are talking about 1.2.x then I also have memory problems on the
>> idle
>> > cluster: java memory constantly slow grows up to limit, then spend long
>> time
>> > for GC. I never seen such behaviour for 1.0.x and 1.1.x, where on idle
>> > cluster java memory stay on the same value.
>>
>> If you are not aware of a pre-existing JIRA, I strongly encourage you to :
>>
>> 1) Document your experience of this.
>> 2) Search issues.apache.org for anything that sounds similar.
>> 3) If you are unable to find a JIRA, file one.
>>
>> Thanks!
>>
>> =Rob
>>
>
>

Re: Reduce Cassandra GC

Posted by Takenori Sato <ts...@cloudian.com>.

Uncomment the followings in "cassandra-env.sh".

JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"

JVM_OPTS="$JVM_OPTS -XX:+PrintPromotionFailure"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log"

*> *Also can you take a heap dump at 2 diff points so that we can compare it?

No, I'm afraid. I ordinary use profiling tools, but am not aware of
anything that could respond during this event.



On Sun, Jun 16, 2013 at 4:44 AM, Mohit Anchlia <mo...@gmail.com>wrote:

> Can you paste you gc config? Also can you take a heap dump at 2 diff
> points so that we can compare it?
>
> Quick thing to do would be to do a histo live at 2 points and compare
>
> Sent from my iPhone
>
> On Jun 15, 2013, at 6:57 AM, Takenori Sato <ts...@cloudian.com> wrote:
>
> > INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line
> 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is
> 1046937600
>
> This says GC for New Generation took so long. And this is usually
> unlikely.
>
> The only situation I am aware of is when a fairly large object is created,
> and which can not be promoted to Old Generation because it requires such a
> large *contiguous* memory space that is unavailable at the point in time.
> This is called promotion failure. So it has to wait until concurrent
> collector collects a large enough space. Thus you experience stop the
> world. But I think it is not stop the world, but only stop the new world.
>
> For example in case of Cassandra, a large number of
> in_memory_compaction_limit_in_mb can cause this. This is a limit when a
> compaction compacts(merges) rows of a key into the latest in memory. So
> this creates a large byte array up to the number.
>
> You can confirm this by enabling promotion failure GC logging in the
> future, and by checking compactions executed at that point in time.
>
>
>
> On Sat, Jun 15, 2013 at 10:01 AM, Robert Coli <rc...@eventbrite.com>wrote:
>
>> On Fri, Jun 7, 2013 at 12:42 PM, Igor <ig...@4friends.od.ua> wrote:
>> > If you are talking about 1.2.x then I also have memory problems on the
>> idle
>> > cluster: java memory constantly slow grows up to limit, then spend long
>> time
>> > for GC. I never seen such behaviour for 1.0.x and 1.1.x, where on idle
>> > cluster java memory stay on the same value.
>>
>> If you are not aware of a pre-existing JIRA, I strongly encourage you to :
>>
>> 1) Document your experience of this.
>> 2) Search issues.apache.org for anything that sounds similar.
>> 3) If you are unable to find a JIRA, file one.
>>
>> Thanks!
>>
>> =Rob
>>
>
>

Re: Reduce Cassandra GC

Posted by Mohit Anchlia <mo...@gmail.com>.

Can you paste you gc config? Also can you take a heap dump at 2 diff points so that we can compare it?

Quick thing to do would be to do a histo live at 2 points and compare

Sent from my iPhone

On Jun 15, 2013, at 6:57 AM, Takenori Sato <ts...@cloudian.com> wrote:

> > INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600
> 
> This says GC for New Generation took so long. And this is usually unlikely. 
> 
> The only situation I am aware of is when a fairly large object is created, and which can not be promoted to Old Generation because it requires such a large *contiguous* memory space that is unavailable at the point in time. This is called promotion failure. So it has to wait until concurrent collector collects a large enough space. Thus you experience stop the world. But I think it is not stop the world, but only stop the new world.
> 
> For example in case of Cassandra, a large number of in_memory_compaction_limit_in_mb can cause this. This is a limit when a compaction compacts(merges) rows of a key into the latest in memory. So this creates a large byte array up to the number.
> 
> You can confirm this by enabling promotion failure GC logging in the future, and by checking compactions executed at that point in time.
> 
> 
> 
> On Sat, Jun 15, 2013 at 10:01 AM, Robert Coli <rc...@eventbrite.com> wrote:
>> On Fri, Jun 7, 2013 at 12:42 PM, Igor <ig...@4friends.od.ua> wrote:
>> > If you are talking about 1.2.x then I also have memory problems on the idle
>> > cluster: java memory constantly slow grows up to limit, then spend long time
>> > for GC. I never seen such behaviour for 1.0.x and 1.1.x, where on idle
>> > cluster java memory stay on the same value.
>> 
>> If you are not aware of a pre-existing JIRA, I strongly encourage you to :
>> 
>> 1) Document your experience of this.
>> 2) Search issues.apache.org for anything that sounds similar.
>> 3) If you are unable to find a JIRA, file one.
>> 
>> Thanks!
>> 
>> =Rob
>

Re: Reduce Cassandra GC

Posted by Takenori Sato <ts...@cloudian.com>.

> INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line
122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is
1046937600

This says GC for New Generation took so long. And this is usually unlikely.

The only situation I am aware of is when a fairly large object is created,
and which can not be promoted to Old Generation because it requires such a
large *contiguous* memory space that is unavailable at the point in time.
This is called promotion failure. So it has to wait until concurrent
collector collects a large enough space. Thus you experience stop the
world. But I think it is not stop the world, but only stop the new world.

For example in case of Cassandra, a large number of
in_memory_compaction_limit_in_mb can cause this. This is a limit when a
compaction compacts(merges) rows of a key into the latest in memory. So
this creates a large byte array up to the number.

You can confirm this by enabling promotion failure GC logging in the
future, and by checking compactions executed at that point in time.

On Sat, Jun 15, 2013 at 10:01 AM, Robert Coli <rc...@eventbrite.com> wrote:

> On Fri, Jun 7, 2013 at 12:42 PM, Igor <ig...@4friends.od.ua> wrote:
> > If you are talking about 1.2.x then I also have memory problems on the
> idle
> > cluster: java memory constantly slow grows up to limit, then spend long
> time
> > for GC. I never seen such behaviour for 1.0.x and 1.1.x, where on idle
> > cluster java memory stay on the same value.
>
> If you are not aware of a pre-existing JIRA, I strongly encourage you to :
>
> 1) Document your experience of this.
> 2) Search issues.apache.org for anything that sounds similar.
> 3) If you are unable to find a JIRA, file one.
>
> Thanks!
>
> =Rob
>

Re: Reduce Cassandra GC

Posted by Robert Coli <rc...@eventbrite.com>.

On Fri, Jun 7, 2013 at 12:42 PM, Igor <ig...@4friends.od.ua> wrote:
> If you are talking about 1.2.x then I also have memory problems on the idle
> cluster: java memory constantly slow grows up to limit, then spend long time
> for GC. I never seen such behaviour for 1.0.x and 1.1.x, where on idle
> cluster java memory stay on the same value.

If you are not aware of a pre-existing JIRA, I strongly encourage you to :

1) Document your experience of this.
2) Search issues.apache.org for anything that sounds similar.
3) If you are unable to find a JIRA, file one.

Thanks!

=Rob

Re: Reduce Cassandra GC

Posted by Igor <ig...@4friends.od.ua>.

If you are talking about 1.2.x then I also have memory problems on the 
idle cluster: java memory constantly slow grows up to limit, then spend 
long time for GC. I never seen such behaviour for 1.0.x and 1.1.x, where 
on idle cluster java memory stay on the same value.

On 06/07/2013 05:19 PM, Joel Samuelsson wrote:
> I keep having issues with GC. Besides the cluster mentioned above, we 
> also have a single node development cluster having the same issues. 
> This node has 12.33 GB data, a couple of million skinny rows and 
> basically no load. It has default memory settings but keep getting 
> very long stop-the-world GC pauses:
> INFO [ScheduledTasks:1] 2013-06-07 10:37:02,537 GCInspector.java (line 
> 122) GC for ParNew: 99342 ms for 1 collections, 1400754488 used; max 
> is 4114612224
> To try to rule out amount of memory, I set it to 16GB (we're on a 
> virtual environment), with 4GB of it for Cassandra heap but that 
> didn't help either, the incredibly long GC pauses keep coming.
> So I think something else is causing these issues, unless everyone is 
> having really long GC pauses (which I doubt). I came across this thread:
> http://www.mail-archive.com/user@cassandra.apache.org/msg24042.html
> suggesting # date -s “`date`” might help my issues. It didn't however 
> (unless I am supposed to replace that second date with the actual date?).
>
> Has anyone had similar issues?
>
>
> 2013/4/17 aaron morton <aaron@thelastpickle.com 
> <ma...@thelastpickle.com>>
>
>     > INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java
>     (line 122) GC for ParNew: 338798 ms for 1 collections, 592212416
>     used; max is 1046937600
>     This does not say that the heap is full.
>     ParNew is GC activity for the new heap, which is typically a
>     smaller part of the overall heap.
>
>     It sounds like you are running with defaults for the memory
>     config, which is generally a good idea. But 4GB total memory for a
>     node is on the small size.
>
>     Try some changes, edit the cassandra-env.sh file and change
>
>     MAX_HEAP_SIZE="2G"
>     HEAP_NEWSIZE="400M"
>
>     You may also want to try:
>
>     MAX_HEAP_SIZE="2G"
>     HEAP_NEWSIZE="800M"
>     JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
>     JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"
>
>     The size of the new heap generally depends on the number of cores
>     available, see the commends in the -env file.
>
>     An older discussion about memory use, not that in 1.2 the bloom
>     filters (and compression data) are off heap now.
>     http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html
>
>     Hope that helps.
>
>     -----------------
>     Aaron Morton
>     Freelance Cassandra Consultant
>     New Zealand
>
>     @aaronmorton
>     http://www.thelastpickle.com
>
>     On 17/04/2013, at 11:06 PM, Joel Samuelsson
>     <samuelsson.joel@gmail.com <ma...@gmail.com>> wrote:
>
>     > You're right, it's probably hard. I should have provided more data.
>     >
>     > I'm running Ubuntu 10.04 LTS with JNA installed. I believe this
>     line in the log indicates that JNA is working, please correct me
>     if I'm wrong:
>     > CLibrary.java (line 111) JNA mlockall successful
>     >
>     > Total amount of RAM is 4GB.
>     >
>     > My description of data size was very bad. Sorry about that. Data
>     set size is 12.3 GB per node, compressed.
>     >
>     > Heap size is 998.44MB according to nodetool info.
>     > Key cache is 49MB bytes according to nodetool info.
>     > Row cache size is 0 bytes acoording to nodetool info.
>     > Max new heap is 205MB kbytes according to Memory Pool "Par Eden
>     Space" max in jconsole.
>     > Memtable is left at default which should give it 333MB according
>     to documentation (uncertain where I can verify this).
>     >
>     > Our production cluster seems similar to your dev cluster so
>     possibly increasing the heap to 2GB might help our issues.
>     >
>     > I am still interested in getting rough estimates of how much
>     heap will be needed as data grows. Other than empirical studies
>     how would I go about getting such estimates?
>     >
>     >
>     > 2013/4/16 Viktor Jevdokimov <Viktor.Jevdokimov@adform.com
>     <ma...@adform.com>>
>     > How one could provide any help without any knowledge about your
>     cluster, node and environment settings?
>     >
>     >
>     >
>     > 40GB was calculated from 2 nodes with RF=2 (each has 100% data
>     range), 2.4-2.5M rows * 6 cols * 3kB as a minimum without
>     compression and any overhead (sstable, bloom filters and indexes).
>     >
>     >
>     >
>     > With ParNew GC time such as yours even if it is a swapping issue
>     I could say only that heap size is too small.
>     >
>     >
>     >
>     > Check Heap, New Heap sizes, memtable and cache sizes. Are you on
>     Linux? Is JNA installed and used? What is total amount of RAM?
>     >
>     >
>     >
>     > Just for a DEV environment we use 3 virtual machines with 4GB
>     RAM and use 2GB heap without any GC issue with amount of data from
>     0 to 16GB compressed on each node. Memtable space sized to 100MB,
>     New Heap 400MB.
>     >
>     >
>     >
>     > Best regards / Pagarbiai
>     > Viktor Jevdokimov
>     > Senior Developer
>     >
>     > Email: Viktor.Jevdokimov@adform.com
>     <ma...@adform.com>
>     > Phone: +370 5 212 3063 <tel:%2B370%205%20212%203063>, Fax +370 5
>     261 0453 <tel:%2B370%205%20261%200453>
>     > J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
>     > Follow us on Twitter: @adforminsider
>     > Take a ride with Adform's Rich Media Suite
>     > <signature-logo29.png>
>     > <signature-best-employer-logo4823.png>
>     >
>     > Disclaimer: The information contained in this message and
>     attachments is intended solely for the attention and use of the
>     named addressee and may be confidential. If you are not the
>     intended recipient, you are reminded that the information remains
>     the property of the sender. You must not use, disclose,
>     distribute, copy, print or rely on this e-mail. If you have
>     received this message in error, please contact the sender
>     immediately and irrevocably delete this message and any copies.
>     >
>     > From: Joel Samuelsson [mailto:samuelsson.joel@gmail.com
>     <ma...@gmail.com>]
>     > Sent: Tuesday, April 16, 2013 12:52
>     > To: user@cassandra.apache.org <ma...@cassandra.apache.org>
>     > Subject: Re: Reduce Cassandra GC
>     >
>     >
>     >
>     > How do you calculate the heap / data size ratio? Is this a
>     linear ratio?
>     >
>     >
>     >
>     > Each node has slightly more than 12 GB right now though.
>     >
>     >
>     >
>     > 2013/4/16 Viktor Jevdokimov <Viktor.Jevdokimov@adform.com
>     <ma...@adform.com>>
>     >
>     > For a >40GB of data 1GB of heap is too low.
>     >
>     >
>     >
>     > Best regards / Pagarbiai
>     >
>     > Viktor Jevdokimov
>     >
>     > Senior Developer
>     >
>     >
>     >
>     > Email: Viktor.Jevdokimov@adform.com
>     <ma...@adform.com>
>     >
>     > Phone: +370 5 212 3063 <tel:%2B370%205%20212%203063>, Fax +370 5
>     261 0453 <tel:%2B370%205%20261%200453>
>     >
>     > J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
>     >
>     > Follow us on Twitter: @adforminsider
>     >
>     > Take a ride with Adform's Rich Media Suite
>     >
>     > <image001.png>
>     >
>     > <image002.png>
>     >
>     >
>     > Disclaimer: The information contained in this message and
>     attachments is intended solely for the attention and use of the
>     named addressee and may be confidential. If you are not the
>     intended recipient, you are reminded that the information remains
>     the property of the sender. You must not use, disclose,
>     distribute, copy, print or rely on this e-mail. If you have
>     received this message in error, please contact the sender
>     immediately and irrevocably delete this message and any copies.
>     >
>     >
>     >
>     > From: Joel Samuelsson [mailto:samuelsson.joel@gmail.com
>     <ma...@gmail.com>]
>     > Sent: Tuesday, April 16, 2013 10:47
>     > To: user@cassandra.apache.org <ma...@cassandra.apache.org>
>     > Subject: Reduce Cassandra GC
>     >
>     >
>     >
>     > Hi,
>     >
>     >
>     >
>     > We have a small production cluster with two nodes. The load on
>     the nodes is very small, around 20 reads / sec and about the same
>     for writes. There are around 2.5 million keys in the cluster and a
>     RF of 2.
>     >
>     >
>     >
>     > About 2.4 million of the rows are skinny (6 columns) and around
>     3kb in size (each). Currently, scripts are running, accessing all
>     of the keys in timeorder to do some calculations.
>     >
>     >
>     >
>     > While running the scripts, the nodes go down and then come back
>     up 6-7 minutes later. This seems to be due to GC. I get lines like
>     this in the log:
>     >
>     > INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java
>     (line 122) GC for ParNew: 338798 ms for 1 collections, 592212416
>     used; max is 1046937600
>     >
>     >
>     >
>     > However, the heap is not full. The heap usage has a jagged
>     pattern going from 60% up to 70% during 5 minutes and then back
>     down to 60% the next 5 minutes and so on. I get no "Heap is X
>     full..." messages. Every once in a while at one of these peaks, I
>     get these stop-the-world GC for 6-7 minutes. Why does GC take up
>     so much time even though the heap isn't full?
>     >
>     >
>     >
>     > I am aware that my access patterns make key caching very
>     unlikely to be high. And indeed, my average key cache hit ratio
>     during the run of the scripts is around 0.5%. I tried disabling
>     key caching on the accessed column family (UPDATE COLUMN FAMILY cf
>     WITH caching=none;) through the cassandra-cli but I get the same
>     behaviour. Is the turning key cache off effective immediately?
>     >
>     >
>     >
>     > Stop-the-world GC is fine if it happens for a few seconds but
>     having them for several minutes doesn't work. Any other
>     suggestions to remove them?
>     >
>     >
>     >
>     > Best regards,
>     >
>     > Joel Samuelsson
>     >
>     >
>     >
>     >
>
>

Re: Reduce Cassandra GC

Posted by Joel Samuelsson <sa...@gmail.com>.

I keep having issues with GC. Besides the cluster mentioned above, we also
have a single node development cluster having the same issues. This node
has 12.33 GB data, a couple of million skinny rows and basically no load.
It has default memory settings but keep getting very long stop-the-world GC
pauses:
INFO [ScheduledTasks:1] 2013-06-07 10:37:02,537 GCInspector.java (line 122)
GC for ParNew: 99342 ms for 1 collections, 1400754488 used; max is
4114612224
To try to rule out amount of memory, I set it to 16GB (we're on a virtual
environment), with 4GB of it for Cassandra heap but that didn't help
either, the incredibly long GC pauses keep coming.
So I think something else is causing these issues, unless everyone is
having really long GC pauses (which I doubt). I came across this thread:
http://www.mail-archive.com/user@cassandra.apache.org/msg24042.html
suggesting # date -s “`date`” might help my issues. It didn't however
(unless I am supposed to replace that second date with the actual date?).

Has anyone had similar issues?


2013/4/17 aaron morton <aa...@thelastpickle.com>

> > INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line
> 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is
> 1046937600
> This does not say that the heap is full.
> ParNew is GC activity for the new heap, which is typically a smaller part
> of the overall heap.
>
> It sounds like you are running with defaults for the memory config, which
> is generally a good idea. But 4GB total memory for a node is on the small
> size.
>
> Try some changes, edit the cassandra-env.sh file and change
>
> MAX_HEAP_SIZE="2G"
> HEAP_NEWSIZE="400M"
>
> You may also want to try:
>
> MAX_HEAP_SIZE="2G"
> HEAP_NEWSIZE="800M"
> JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
> JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"
>
> The size of the new heap generally depends on the number of cores
> available, see the commends in the -env file.
>
> An older discussion about memory use, not that in 1.2 the bloom filters
> (and compression data) are off heap now.
> http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 17/04/2013, at 11:06 PM, Joel Samuelsson <sa...@gmail.com>
> wrote:
>
> > You're right, it's probably hard. I should have provided more data.
> >
> > I'm running Ubuntu 10.04 LTS with JNA installed. I believe this line in
> the log indicates that JNA is working, please correct me if I'm wrong:
> > CLibrary.java (line 111) JNA mlockall successful
> >
> > Total amount of RAM is 4GB.
> >
> > My description of data size was very bad. Sorry about that. Data set
> size is 12.3 GB per node, compressed.
> >
> > Heap size is 998.44MB according to nodetool info.
> > Key cache is 49MB bytes according to nodetool info.
> > Row cache size is 0 bytes acoording to nodetool info.
> > Max new heap is 205MB kbytes according to Memory Pool "Par Eden Space"
> max in jconsole.
> > Memtable is left at default which should give it 333MB according to
> documentation (uncertain where I can verify this).
> >
> > Our production cluster seems similar to your dev cluster so possibly
> increasing the heap to 2GB might help our issues.
> >
> > I am still interested in getting rough estimates of how much heap will
> be needed as data grows. Other than empirical studies how would I go about
> getting such estimates?
> >
> >
> > 2013/4/16 Viktor Jevdokimov <Vi...@adform.com>
> > How one could provide any help without any knowledge about your cluster,
> node and environment settings?
> >
> >
> >
> > 40GB was calculated from 2 nodes with RF=2 (each has 100% data range),
> 2.4-2.5M rows * 6 cols * 3kB as a minimum without compression and any
> overhead (sstable, bloom filters and indexes).
> >
> >
> >
> > With ParNew GC time such as yours even if it is a swapping issue I could
> say only that heap size is too small.
> >
> >
> >
> > Check Heap, New Heap sizes, memtable and cache sizes. Are you on Linux?
> Is JNA installed and used? What is total amount of RAM?
> >
> >
> >
> > Just for a DEV environment we use 3 virtual machines with 4GB RAM and
> use 2GB heap without any GC issue with amount of data from 0 to 16GB
> compressed on each node. Memtable space sized to 100MB, New Heap 400MB.
> >
> >
> >
> > Best regards / Pagarbiai
> > Viktor Jevdokimov
> > Senior Developer
> >
> > Email: Viktor.Jevdokimov@adform.com
> > Phone: +370 5 212 3063, Fax +370 5 261 0453
> > J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
> > Follow us on Twitter: @adforminsider
> > Take a ride with Adform's Rich Media Suite
> > <signature-logo29.png>
> > <signature-best-employer-logo4823.png>
> >
> > Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
> >
> > From: Joel Samuelsson [mailto:samuelsson.joel@gmail.com]
> > Sent: Tuesday, April 16, 2013 12:52
> > To: user@cassandra.apache.org
> > Subject: Re: Reduce Cassandra GC
> >
> >
> >
> > How do you calculate the heap / data size ratio? Is this a linear ratio?
> >
> >
> >
> > Each node has slightly more than 12 GB right now though.
> >
> >
> >
> > 2013/4/16 Viktor Jevdokimov <Vi...@adform.com>
> >
> > For a >40GB of data 1GB of heap is too low.
> >
> >
> >
> > Best regards / Pagarbiai
> >
> > Viktor Jevdokimov
> >
> > Senior Developer
> >
> >
> >
> > Email: Viktor.Jevdokimov@adform.com
> >
> > Phone: +370 5 212 3063, Fax +370 5 261 0453
> >
> > J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
> >
> > Follow us on Twitter: @adforminsider
> >
> > Take a ride with Adform's Rich Media Suite
> >
> > <image001.png>
> >
> > <image002.png>
> >
> >
> > Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
> >
> >
> >
> > From: Joel Samuelsson [mailto:samuelsson.joel@gmail.com]
> > Sent: Tuesday, April 16, 2013 10:47
> > To: user@cassandra.apache.org
> > Subject: Reduce Cassandra GC
> >
> >
> >
> > Hi,
> >
> >
> >
> > We have a small production cluster with two nodes. The load on the nodes
> is very small, around 20 reads / sec and about the same for writes. There
> are around 2.5 million keys in the cluster and a RF of 2.
> >
> >
> >
> > About 2.4 million of the rows are skinny (6 columns) and around 3kb in
> size (each). Currently, scripts are running, accessing all of the keys in
> timeorder to do some calculations.
> >
> >
> >
> > While running the scripts, the nodes go down and then come back up 6-7
> minutes later. This seems to be due to GC. I get lines like this in the log:
> >
> > INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line
> 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is
> 1046937600
> >
> >
> >
> > However, the heap is not full. The heap usage has a jagged pattern going
> from 60% up to 70% during 5 minutes and then back down to 60% the next 5
> minutes and so on. I get no "Heap is X full..." messages. Every once in a
> while at one of these peaks, I get these stop-the-world GC for 6-7 minutes.
> Why does GC take up so much time even though the heap isn't full?
> >
> >
> >
> > I am aware that my access patterns make key caching very unlikely to be
> high. And indeed, my average key cache hit ratio during the run of the
> scripts is around 0.5%. I tried disabling key caching on the accessed
> column family (UPDATE COLUMN FAMILY cf WITH caching=none;) through the
> cassandra-cli but I get the same behaviour. Is the turning key cache off
> effective immediately?
> >
> >
> >
> > Stop-the-world GC is fine if it happens for a few seconds but having
> them for several minutes doesn't work. Any other suggestions to remove them?
> >
> >
> >
> > Best regards,
> >
> > Joel Samuelsson
> >
> >
> >
> >
>
>

Re: Reduce Cassandra GC

Posted by aaron morton <aa...@thelastpickle.com>.

> INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600
This does not say that the heap is full. 
ParNew is GC activity for the new heap, which is typically a smaller part of the overall heap. 

It sounds like you are running with defaults for the memory config, which is generally a good idea. But 4GB total memory for a node is on the small size.

Try some changes, edit the cassandra-env.sh file and change

MAX_HEAP_SIZE="2G"
HEAP_NEWSIZE="400M"

You may also want to try:

MAX_HEAP_SIZE="2G"
HEAP_NEWSIZE="800M"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4" 
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"

The size of the new heap generally depends on the number of cores available, see the commends in the -env file. 

An older discussion about memory use, not that in 1.2 the bloom filters (and compression data) are off heap now.
http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html  

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/04/2013, at 11:06 PM, Joel Samuelsson <sa...@gmail.com> wrote:

> You're right, it's probably hard. I should have provided more data.
> 
> I'm running Ubuntu 10.04 LTS with JNA installed. I believe this line in the log indicates that JNA is working, please correct me if I'm wrong:
> CLibrary.java (line 111) JNA mlockall successful
> 
> Total amount of RAM is 4GB.
> 
> My description of data size was very bad. Sorry about that. Data set size is 12.3 GB per node, compressed.
> 
> Heap size is 998.44MB according to nodetool info. 
> Key cache is 49MB bytes according to nodetool info.
> Row cache size is 0 bytes acoording to nodetool info. 
> Max new heap is 205MB kbytes according to Memory Pool "Par Eden Space" max in jconsole.
> Memtable is left at default which should give it 333MB according to documentation (uncertain where I can verify this).
> 
> Our production cluster seems similar to your dev cluster so possibly increasing the heap to 2GB might help our issues.
> 
> I am still interested in getting rough estimates of how much heap will be needed as data grows. Other than empirical studies how would I go about getting such estimates?
> 
> 
> 2013/4/16 Viktor Jevdokimov <Vi...@adform.com>
> How one could provide any help without any knowledge about your cluster, node and environment settings?
> 
>  
> 
> 40GB was calculated from 2 nodes with RF=2 (each has 100% data range), 2.4-2.5M rows * 6 cols * 3kB as a minimum without compression and any overhead (sstable, bloom filters and indexes).
> 
>  
> 
> With ParNew GC time such as yours even if it is a swapping issue I could say only that heap size is too small.
> 
>  
> 
> Check Heap, New Heap sizes, memtable and cache sizes. Are you on Linux? Is JNA installed and used? What is total amount of RAM?
> 
>  
> 
> Just for a DEV environment we use 3 virtual machines with 4GB RAM and use 2GB heap without any GC issue with amount of data from 0 to 16GB compressed on each node. Memtable space sized to 100MB, New Heap 400MB.
> 
>  
> 
> Best regards / Pagarbiai
> Viktor Jevdokimov
> Senior Developer
> 
> Email: Viktor.Jevdokimov@adform.com
> Phone: +370 5 212 3063, Fax +370 5 261 0453
> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
> Follow us on Twitter: @adforminsider
> Take a ride with Adform's Rich Media Suite
> <signature-logo29.png>
> <signature-best-employer-logo4823.png> 
> 
> Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.
> 
> From: Joel Samuelsson [mailto:samuelsson.joel@gmail.com] 
> Sent: Tuesday, April 16, 2013 12:52
> To: user@cassandra.apache.org
> Subject: Re: Reduce Cassandra GC
> 
>  
> 
> How do you calculate the heap / data size ratio? Is this a linear ratio?
> 
>  
> 
> Each node has slightly more than 12 GB right now though.
> 
>  
> 
> 2013/4/16 Viktor Jevdokimov <Vi...@adform.com>
> 
> For a >40GB of data 1GB of heap is too low.
> 
>  
> 
> Best regards / Pagarbiai
> 
> Viktor Jevdokimov
> 
> Senior Developer
> 
>  
> 
> Email: Viktor.Jevdokimov@adform.com
> 
> Phone: +370 5 212 3063, Fax +370 5 261 0453
> 
> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
> 
> Follow us on Twitter: @adforminsider
> 
> Take a ride with Adform's Rich Media Suite
> 
> <image001.png>
> 
> <image002.png>
> 
> 
> Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.
> 
>  
> 
> From: Joel Samuelsson [mailto:samuelsson.joel@gmail.com] 
> Sent: Tuesday, April 16, 2013 10:47
> To: user@cassandra.apache.org
> Subject: Reduce Cassandra GC
> 
>  
> 
> Hi,
> 
>  
> 
> We have a small production cluster with two nodes. The load on the nodes is very small, around 20 reads / sec and about the same for writes. There are around 2.5 million keys in the cluster and a RF of 2.
> 
>  
> 
> About 2.4 million of the rows are skinny (6 columns) and around 3kb in size (each). Currently, scripts are running, accessing all of the keys in timeorder to do some calculations.
> 
>  
> 
> While running the scripts, the nodes go down and then come back up 6-7 minutes later. This seems to be due to GC. I get lines like this in the log:
> 
> INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600
> 
>  
> 
> However, the heap is not full. The heap usage has a jagged pattern going from 60% up to 70% during 5 minutes and then back down to 60% the next 5 minutes and so on. I get no "Heap is X full..." messages. Every once in a while at one of these peaks, I get these stop-the-world GC for 6-7 minutes. Why does GC take up so much time even though the heap isn't full?
> 
>  
> 
> I am aware that my access patterns make key caching very unlikely to be high. And indeed, my average key cache hit ratio during the run of the scripts is around 0.5%. I tried disabling key caching on the accessed column family (UPDATE COLUMN FAMILY cf WITH caching=none;) through the cassandra-cli but I get the same behaviour. Is the turning key cache off effective immediately?
> 
>  
> 
> Stop-the-world GC is fine if it happens for a few seconds but having them for several minutes doesn't work. Any other suggestions to remove them?
> 
>  
> 
> Best regards,
> 
> Joel Samuelsson
> 
>  
> 
>

Re: Reduce Cassandra GC

Posted by Joel Samuelsson <sa...@gmail.com>.

You're right, it's probably hard. I should have provided more data.

I'm running Ubuntu 10.04 LTS with JNA installed. I believe this line in the
log indicates that JNA is working, please correct me if I'm wrong:
CLibrary.java (line 111) JNA mlockall successful

Total amount of RAM is 4GB.

My description of data size was very bad. Sorry about that. Data set size
is 12.3 GB per node, compressed.

Heap size is 998.44MB according to nodetool info.
Key cache is 49MB bytes according to nodetool info.
Row cache size is 0 bytes acoording to nodetool info.
Max new heap is 205MB kbytes according to Memory Pool "Par Eden Space" max
in jconsole.
Memtable is left at default which should give it 333MB according to
documentation (uncertain where I can verify this).

Our production cluster seems similar to your dev cluster so possibly
increasing the heap to 2GB might help our issues.

I am still interested in getting rough estimates of how much heap will be
needed as data grows. Other than empirical studies how would I go about
getting such estimates?


2013/4/16 Viktor Jevdokimov <Vi...@adform.com>

>  How one could provide any help without any knowledge about your cluster,
> node and environment settings?****
>
> ** **
>
> 40GB was calculated from 2 nodes with RF=2 (each has 100% data range),
> 2.4-2.5M rows * 6 cols * 3kB as a minimum without compression and any
> overhead (sstable, bloom filters and indexes).****
>
> ** **
>
> With ParNew GC time such as yours even if it is a swapping issue I could
> say only that heap size is too small.****
>
> ** **
>
> Check Heap, New Heap sizes, memtable and cache sizes. Are you on Linux? Is
> JNA installed and used? What is total amount of RAM?****
>
> ** **
>
> Just for a DEV environment we use 3 virtual machines with 4GB RAM and use
> 2GB heap without any GC issue with amount of data from 0 to 16GB compressed
> on each node. Memtable space sized to 100MB, New Heap 400MB.****
>
> ** **
>    Best regards / Pagarbiai
> *Viktor Jevdokimov*
> Senior Developer
>
> Email: Viktor.Jevdokimov@adform.com
> Phone: +370 5 212 3063, Fax +370 5 261 0453
> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
> Follow us on Twitter: @adforminsider <http://twitter.com/#!/adforminsider>
> Take a ride with Adform's Rich Media Suite<http://vimeo.com/adform/richmedia>
>  [image: Adform News] <http://www.adform.com>
> [image: Adform awarded the Best Employer 2012]
> <http://www.adform.com/site/blog/adform/adform-takes-top-spot-in-best-employer-survey/>
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
>   *From:* Joel Samuelsson [mailto:samuelsson.joel@gmail.com]
> *Sent:* Tuesday, April 16, 2013 12:52
> *To:* user@cassandra.apache.org
> *Subject:* Re: Reduce Cassandra GC****
>
> ** **
>
> How do you calculate the heap / data size ratio? Is this a linear ratio?**
> **
>
> ** **
>
> Each node has slightly more than 12 GB right now though.****
>
> ** **
>
> 2013/4/16 Viktor Jevdokimov <Vi...@adform.com>****
>
> For a >40GB of data 1GB of heap is too low.****
>
>  ****
>
> Best regards / Pagarbiai****
>
> *Viktor Jevdokimov*****
>
> Senior Developer****
>
> ** **
>
> Email: Viktor.Jevdokimov@adform.com****
>
> Phone: +370 5 212 3063, Fax +370 5 261 0453****
>
> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania****
>
> Follow us on Twitter: @adforminsider <http://twitter.com/#!/adforminsider>
> ****
>
> Take a ride with Adform's Rich Media Suite<http://vimeo.com/adform/richmedia>
> ****
>
> [image: Adform News] <http://www.adform.com>****
>
> [image: Adform awarded the Best Employer 2012]<http://www.adform.com/site/blog/adform/adform-takes-top-spot-in-best-employer-survey/>
> ****
>
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies. ****
>
> ** **
>
> *From:* Joel Samuelsson [mailto:samuelsson.joel@gmail.com]
> *Sent:* Tuesday, April 16, 2013 10:47
> *To:* user@cassandra.apache.org
> *Subject:* Reduce Cassandra GC****
>
>  ****
>
> Hi,****
>
>  ****
>
> We have a small production cluster with two nodes. The load on the nodes
> is very small, around 20 reads / sec and about the same for writes. There
> are around 2.5 million keys in the cluster and a RF of 2.****
>
>  ****
>
> About 2.4 million of the rows are skinny (6 columns) and around 3kb in
> size (each). Currently, scripts are running, accessing all of the keys in
> timeorder to do some calculations.****
>
>  ****
>
> While running the scripts, the nodes go down and then come back up 6-7
> minutes later. This seems to be due to GC. I get lines like this in the log:
> ****
>
> INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line
> 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is
> 1046937600****
>
>  ****
>
> However, the heap is not full. The heap usage has a jagged pattern going
> from 60% up to 70% during 5 minutes and then back down to 60% the next 5
> minutes and so on. I get no "Heap is X full..." messages. Every once in a
> while at one of these peaks, I get these stop-the-world GC for 6-7
> minutes. Why does GC take up so much time even though the heap isn't full?
> ****
>
>  ****
>
> I am aware that my access patterns make key caching very unlikely to be
> high. And indeed, my average key cache hit ratio during the run of the
> scripts is around 0.5%. I tried disabling key caching on the accessed
> column family (UPDATE COLUMN FAMILY cf WITH caching=none;) through the
> cassandra-cli but I get the same behaviour. Is the turning key cache off
> effective immediately?****
>
>  ****
>
> Stop-the-world GC is fine if it happens for a few seconds but having them
> for several minutes doesn't work. Any other suggestions to remove them?***
> *
>
>  ****
>
> Best regards,****
>
> Joel Samuelsson****
>
> ** **
>

RE: Reduce Cassandra GC

Posted by Viktor Jevdokimov <Vi...@adform.com>.

How one could provide any help without any knowledge about your cluster, node and environment settings?

40GB was calculated from 2 nodes with RF=2 (each has 100% data range), 2.4-2.5M rows * 6 cols * 3kB as a minimum without compression and any overhead (sstable, bloom filters and indexes).

With ParNew GC time such as yours even if it is a swapping issue I could say only that heap size is too small.

Check Heap, New Heap sizes, memtable and cache sizes. Are you on Linux? Is JNA installed and used? What is total amount of RAM?

Just for a DEV environment we use 3 virtual machines with 4GB RAM and use 2GB heap without any GC issue with amount of data from 0 to 16GB compressed on each node. Memtable space sized to 100MB, New Heap 400MB.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: Viktor.Jevdokimov@adform.com<ma...@adform.com>
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider<http://twitter.com/#!/adforminsider>
Take a ride with Adform's Rich Media Suite<http://vimeo.com/adform/richmedia>

[Adform News] <http://www.adform.com>
[Adform awarded the Best Employer 2012] <http://www.adform.com/site/blog/adform/adform-takes-top-spot-in-best-employer-survey/>


Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.

From: Joel Samuelsson [mailto:samuelsson.joel@gmail.com]
Sent: Tuesday, April 16, 2013 12:52
To: user@cassandra.apache.org
Subject: Re: Reduce Cassandra GC

How do you calculate the heap / data size ratio? Is this a linear ratio?

Each node has slightly more than 12 GB right now though.

2013/4/16 Viktor Jevdokimov <Vi...@adform.com>>
For a >40GB of data 1GB of heap is too low.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: Viktor.Jevdokimov@adform.com<ma...@adform.com>
Phone: +370 5 212 3063<tel:%2B370%205%20212%203063>, Fax +370 5 261 0453<tel:%2B370%205%20261%200453>
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider<http://twitter.com/#!/adforminsider>
Take a ride with Adform's Rich Media Suite<http://vimeo.com/adform/richmedia>

[Adform News]<http://www.adform.com>
[Adform awarded the Best Employer 2012]<http://www.adform.com/site/blog/adform/adform-takes-top-spot-in-best-employer-survey/>


Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.

From: Joel Samuelsson [mailto:samuelsson.joel@gmail.com<ma...@gmail.com>]
Sent: Tuesday, April 16, 2013 10:47
To: user@cassandra.apache.org<ma...@cassandra.apache.org>
Subject: Reduce Cassandra GC

Hi,

We have a small production cluster with two nodes. The load on the nodes is very small, around 20 reads / sec and about the same for writes. There are around 2.5 million keys in the cluster and a RF of 2.

About 2.4 million of the rows are skinny (6 columns) and around 3kb in size (each). Currently, scripts are running, accessing all of the keys in timeorder to do some calculations.

While running the scripts, the nodes go down and then come back up 6-7 minutes later. This seems to be due to GC. I get lines like this in the log:
INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600

However, the heap is not full. The heap usage has a jagged pattern going from 60% up to 70% during 5 minutes and then back down to 60% the next 5 minutes and so on. I get no "Heap is X full..." messages. Every once in a while at one of these peaks, I get these stop-the-world GC for 6-7 minutes. Why does GC take up so much time even though the heap isn't full?

I am aware that my access patterns make key caching very unlikely to be high. And indeed, my average key cache hit ratio during the run of the scripts is around 0.5%. I tried disabling key caching on the accessed column family (UPDATE COLUMN FAMILY cf WITH caching=none;) through the cassandra-cli but I get the same behaviour. Is the turning key cache off effective immediately?

Stop-the-world GC is fine if it happens for a few seconds but having them for several minutes doesn't work. Any other suggestions to remove them?

Best regards,
Joel Samuelsson

Re: Reduce Cassandra GC

Posted by Alexis Rodríguez <ar...@inconcertcc.com>.

Joel,

You may want to take a look to the datastax docs [1] :

*The more memory a Cassandra node has, the better read performance. More
RAM allows for larger cache sizes and reduces disk I/O for reads. More RAM
also allows memory tables (memtables) to hold more recently written data.
Larger memtables lead to a fewer number of SSTables being flushed to disk
and fewer files to scan during a read. The ideal amount of RAM depends on
the anticipated size of your hot data.*
*
*
*For dedicated hardware, a minimum of than 8GB of RAM is needed. DataStax
recommends 16GB - 32GB.*
*Java heap space should be set to a maximum of 8GB or half of your total
RAM, whichever is lower. (A greater heap size has more intense garbage
collection periods.)*
*For a virtual environment use a minimum of 4GB, such as Amazon EC2 Large
instances. For production clusters with a healthy amount of traffic, 8GB is
more common.*

If you like you can test your configuration trying different heap sizes
with your data and check the cache hit rates [2]

[1] http://www.datastax.com/docs/1.0/cluster_architecture/cluster_planning
[2] http://www.datastax.com/dev/blog/maximizing-cache-benefit-with-cassandra


On Tue, Apr 16, 2013 at 9:18 AM, Joel Samuelsson
<sa...@gmail.com>wrote:

> Indeed!
> What I meant was:
> If 1GB heap is too low for >40GB data, how can I know what an appropiate
> heap is for various data sizes?
>
>
> 2013/4/16 Tomàs Núnez <to...@groupalia.com>
>
>> Hi!
>> Reading the documentation, heap size calculation is done without counting
>> data size.
>>
>> Extract from
>> http://www.datastax.com/docs/1.1/operations/tuning#heap-sizing
>>
>> System MemoryHeap Size Less than 2GB1/2 of system memory2GB to 4GB1GB Greater
>> than 4GB1/4 system memory, but not more than 8GB
>>
>>
>>
>> 2013/4/16 Joel Samuelsson <sa...@gmail.com>
>>
>>> How do you calculate the heap / data size ratio? Is this a linear ratio?
>>>
>>> Each node has slightly more than 12 GB right now though.
>>>
>>>
>>> 2013/4/16 Viktor Jevdokimov <Vi...@adform.com>
>>>
>>>>  For a >40GB of data 1GB of heap is too low.****
>>>>
>>>> ** **
>>>>    Best regards / Pagarbiai
>>>> *Viktor Jevdokimov*
>>>> Senior Developer
>>>>
>>>> Email: Viktor.Jevdokimov@adform.com
>>>> Phone: +370 5 212 3063, Fax +370 5 261 0453
>>>> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
>>>> Follow us on Twitter: @adforminsider<http://twitter.com/#!/adforminsider>
>>>> Take a ride with Adform's Rich Media Suite<http://vimeo.com/adform/richmedia>
>>>>  [image: Adform News] <http://www.adform.com>
>>>> [image: Adform awarded the Best Employer 2012]
>>>> <http://www.adform.com/site/blog/adform/adform-takes-top-spot-in-best-employer-survey/>
>>>>
>>>> Disclaimer: The information contained in this message and attachments
>>>> is intended solely for the attention and use of the named addressee and may
>>>> be confidential. If you are not the intended recipient, you are reminded
>>>> that the information remains the property of the sender. You must not use,
>>>> disclose, distribute, copy, print or rely on this e-mail. If you have
>>>> received this message in error, please contact the sender immediately and
>>>> irrevocably delete this message and any copies.
>>>>
>>>>   *From:* Joel Samuelsson [mailto:samuelsson.joel@gmail.com]
>>>> *Sent:* Tuesday, April 16, 2013 10:47
>>>> *To:* user@cassandra.apache.org
>>>> *Subject:* Reduce Cassandra GC****
>>>>
>>>> ** **
>>>>
>>>> Hi,****
>>>>
>>>> ** **
>>>>
>>>> We have a small production cluster with two nodes. The load on the
>>>> nodes is very small, around 20 reads / sec and about the same for writes.
>>>> There are around 2.5 million keys in the cluster and a RF of 2.****
>>>>
>>>> ** **
>>>>
>>>> About 2.4 million of the rows are skinny (6 columns) and around 3kb in
>>>> size (each). Currently, scripts are running, accessing all of the keys in
>>>> timeorder to do some calculations.****
>>>>
>>>> ** **
>>>>
>>>> While running the scripts, the nodes go down and then come back up 6-7
>>>> minutes later. This seems to be due to GC. I get lines like this in the log:
>>>> ****
>>>>
>>>> INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line
>>>> 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is
>>>> 1046937600****
>>>>
>>>> ** **
>>>>
>>>> However, the heap is not full. The heap usage has a jagged pattern
>>>> going from 60% up to 70% during 5 minutes and then back down to 60% the
>>>> next 5 minutes and so on. I get no "Heap is X full..." messages. Every once
>>>> in a while at one of these peaks, I get these stop-the-world GC for 6-7
>>>> minutes. Why does GC take up so much time even though the heap isn't full?
>>>> ****
>>>>
>>>> ** **
>>>>
>>>> I am aware that my access patterns make key caching very unlikely to be
>>>> high. And indeed, my average key cache hit ratio during the run of the
>>>> scripts is around 0.5%. I tried disabling key caching on the accessed
>>>> column family (UPDATE COLUMN FAMILY cf WITH caching=none;) through the
>>>> cassandra-cli but I get the same behaviour. Is the turning key cache off
>>>> effective immediately?****
>>>>
>>>> ** **
>>>>
>>>> Stop-the-world GC is fine if it happens for a few seconds but having
>>>> them for several minutes doesn't work. Any other suggestions to remove them?
>>>> ****
>>>>
>>>> ** **
>>>>
>>>> Best regards,****
>>>>
>>>> Joel Samuelsson****
>>>>
>>>
>>>
>>
>>
>> --
>> [image: Groupalia] <http://es.groupalia.com/>
>> www.groupalia.com <http://es.groupalia.com/> Tomàs Núñez IT-Sysprod Tel. +
>> 34 93 159 31 00  Fax. + 34 93 396 18 52 Llull, 95-97, 2º planta, 08005
>> BarcelonaSkype: tomas.nunez.groupalia tomas.nunez@groupalia.com<no...@groupalia.com> [image:
>> Twitter] Twitter <http://twitter.com/#%21/groupaliaes>    [image:
>> Twitter] Facebook <https://www.facebook.com/GroupaliaEspana>    [image:
>> Twitter] Linkedin <http://www.linkedin.com/company/groupalia>
>>
>
>

Re: Reduce Cassandra GC

Posted by Joel Samuelsson <sa...@gmail.com>.

Indeed!
What I meant was:
If 1GB heap is too low for >40GB data, how can I know what an appropiate
heap is for various data sizes?


2013/4/16 Tomàs Núnez <to...@groupalia.com>

> Hi!
> Reading the documentation, heap size calculation is done without counting
> data size.
>
> Extract from
> http://www.datastax.com/docs/1.1/operations/tuning#heap-sizing
>
> System MemoryHeap Size Less than 2GB1/2 of system memory2GB to 4GB1GB Greater
> than 4GB1/4 system memory, but not more than 8GB
>
>
>
> 2013/4/16 Joel Samuelsson <sa...@gmail.com>
>
>> How do you calculate the heap / data size ratio? Is this a linear ratio?
>>
>> Each node has slightly more than 12 GB right now though.
>>
>>
>> 2013/4/16 Viktor Jevdokimov <Vi...@adform.com>
>>
>>>  For a >40GB of data 1GB of heap is too low.****
>>>
>>> ** **
>>>    Best regards / Pagarbiai
>>> *Viktor Jevdokimov*
>>> Senior Developer
>>>
>>> Email: Viktor.Jevdokimov@adform.com
>>> Phone: +370 5 212 3063, Fax +370 5 261 0453
>>> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
>>> Follow us on Twitter: @adforminsider<http://twitter.com/#!/adforminsider>
>>> Take a ride with Adform's Rich Media Suite<http://vimeo.com/adform/richmedia>
>>>  [image: Adform News] <http://www.adform.com>
>>> [image: Adform awarded the Best Employer 2012]
>>> <http://www.adform.com/site/blog/adform/adform-takes-top-spot-in-best-employer-survey/>
>>>
>>> Disclaimer: The information contained in this message and attachments is
>>> intended solely for the attention and use of the named addressee and may be
>>> confidential. If you are not the intended recipient, you are reminded that
>>> the information remains the property of the sender. You must not use,
>>> disclose, distribute, copy, print or rely on this e-mail. If you have
>>> received this message in error, please contact the sender immediately and
>>> irrevocably delete this message and any copies.
>>>
>>>   *From:* Joel Samuelsson [mailto:samuelsson.joel@gmail.com]
>>> *Sent:* Tuesday, April 16, 2013 10:47
>>> *To:* user@cassandra.apache.org
>>> *Subject:* Reduce Cassandra GC****
>>>
>>> ** **
>>>
>>> Hi,****
>>>
>>> ** **
>>>
>>> We have a small production cluster with two nodes. The load on the nodes
>>> is very small, around 20 reads / sec and about the same for writes. There
>>> are around 2.5 million keys in the cluster and a RF of 2.****
>>>
>>> ** **
>>>
>>> About 2.4 million of the rows are skinny (6 columns) and around 3kb in
>>> size (each). Currently, scripts are running, accessing all of the keys in
>>> timeorder to do some calculations.****
>>>
>>> ** **
>>>
>>> While running the scripts, the nodes go down and then come back up 6-7
>>> minutes later. This seems to be due to GC. I get lines like this in the log:
>>> ****
>>>
>>> INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line
>>> 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is
>>> 1046937600****
>>>
>>> ** **
>>>
>>> However, the heap is not full. The heap usage has a jagged pattern going
>>> from 60% up to 70% during 5 minutes and then back down to 60% the next 5
>>> minutes and so on. I get no "Heap is X full..." messages. Every once in a
>>> while at one of these peaks, I get these stop-the-world GC for 6-7
>>> minutes. Why does GC take up so much time even though the heap isn't full?
>>> ****
>>>
>>> ** **
>>>
>>> I am aware that my access patterns make key caching very unlikely to be
>>> high. And indeed, my average key cache hit ratio during the run of the
>>> scripts is around 0.5%. I tried disabling key caching on the accessed
>>> column family (UPDATE COLUMN FAMILY cf WITH caching=none;) through the
>>> cassandra-cli but I get the same behaviour. Is the turning key cache off
>>> effective immediately?****
>>>
>>> ** **
>>>
>>> Stop-the-world GC is fine if it happens for a few seconds but having
>>> them for several minutes doesn't work. Any other suggestions to remove them?
>>> ****
>>>
>>> ** **
>>>
>>> Best regards,****
>>>
>>> Joel Samuelsson****
>>>
>>
>>
>
>
> --
> [image: Groupalia] <http://es.groupalia.com/>
> www.groupalia.com <http://es.groupalia.com/> Tomàs Núñez IT-Sysprod Tel. +
> 34 93 159 31 00 Fax. + 34 93 396 18 52 Llull, 95-97, 2º planta, 08005
> BarcelonaSkype: tomas.nunez.groupalia tomas.nunez@groupalia.com<no...@groupalia.com> [image:
> Twitter] Twitter <http://twitter.com/#%21/groupaliaes>    [image: Twitter]
>  Facebook <https://www.facebook.com/GroupaliaEspana>    [image: Twitter]
>  Linkedin <http://www.linkedin.com/company/groupalia>
>

Re: Reduce Cassandra GC

Posted by Tomàs Núnez <to...@groupalia.com>.

Hi!
Reading the documentation, heap size calculation is done without counting
data size.

Extract from http://www.datastax.com/docs/1.1/operations/tuning#heap-sizing

System MemoryHeap SizeLess than 2GB1/2 of system memory2GB to 4GB1GBGreater
than 4GB1/4 system memory, but not more than 8GB



2013/4/16 Joel Samuelsson <sa...@gmail.com>

> How do you calculate the heap / data size ratio? Is this a linear ratio?
>
> Each node has slightly more than 12 GB right now though.
>
>
> 2013/4/16 Viktor Jevdokimov <Vi...@adform.com>
>
>>  For a >40GB of data 1GB of heap is too low.****
>>
>> ** **
>>    Best regards / Pagarbiai
>> *Viktor Jevdokimov*
>> Senior Developer
>>
>> Email: Viktor.Jevdokimov@adform.com
>> Phone: +370 5 212 3063, Fax +370 5 261 0453
>> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
>> Follow us on Twitter: @adforminsider<http://twitter.com/#!/adforminsider>
>> Take a ride with Adform's Rich Media Suite<http://vimeo.com/adform/richmedia>
>>  [image: Adform News] <http://www.adform.com>
>> [image: Adform awarded the Best Employer 2012]
>> <http://www.adform.com/site/blog/adform/adform-takes-top-spot-in-best-employer-survey/>
>>
>> Disclaimer: The information contained in this message and attachments is
>> intended solely for the attention and use of the named addressee and may be
>> confidential. If you are not the intended recipient, you are reminded that
>> the information remains the property of the sender. You must not use,
>> disclose, distribute, copy, print or rely on this e-mail. If you have
>> received this message in error, please contact the sender immediately and
>> irrevocably delete this message and any copies.
>>
>>   *From:* Joel Samuelsson [mailto:samuelsson.joel@gmail.com]
>> *Sent:* Tuesday, April 16, 2013 10:47
>> *To:* user@cassandra.apache.org
>> *Subject:* Reduce Cassandra GC****
>>
>> ** **
>>
>> Hi,****
>>
>> ** **
>>
>> We have a small production cluster with two nodes. The load on the nodes
>> is very small, around 20 reads / sec and about the same for writes. There
>> are around 2.5 million keys in the cluster and a RF of 2.****
>>
>> ** **
>>
>> About 2.4 million of the rows are skinny (6 columns) and around 3kb in
>> size (each). Currently, scripts are running, accessing all of the keys in
>> timeorder to do some calculations.****
>>
>> ** **
>>
>> While running the scripts, the nodes go down and then come back up 6-7
>> minutes later. This seems to be due to GC. I get lines like this in the log:
>> ****
>>
>> INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line
>> 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is
>> 1046937600****
>>
>> ** **
>>
>> However, the heap is not full. The heap usage has a jagged pattern going
>> from 60% up to 70% during 5 minutes and then back down to 60% the next 5
>> minutes and so on. I get no "Heap is X full..." messages. Every once in a
>> while at one of these peaks, I get these stop-the-world GC for 6-7
>> minutes. Why does GC take up so much time even though the heap isn't full?
>> ****
>>
>> ** **
>>
>> I am aware that my access patterns make key caching very unlikely to be
>> high. And indeed, my average key cache hit ratio during the run of the
>> scripts is around 0.5%. I tried disabling key caching on the accessed
>> column family (UPDATE COLUMN FAMILY cf WITH caching=none;) through the
>> cassandra-cli but I get the same behaviour. Is the turning key cache off
>> effective immediately?****
>>
>> ** **
>>
>> Stop-the-world GC is fine if it happens for a few seconds but having them
>> for several minutes doesn't work. Any other suggestions to remove them?**
>> **
>>
>> ** **
>>
>> Best regards,****
>>
>> Joel Samuelsson****
>>
>
>


-- 
[image: Groupalia] <http://es.groupalia.com/>
www.groupalia.com <http://es.groupalia.com/>Tomàs NúñezIT-SysprodTel. + 34
93 159 31 00 Fax. + 34 93 396 18 52Llull, 95-97, 2º planta, 08005
BarcelonaSkype:
tomas.nunez.groupaliatomas.nunez@groupalia.com<no...@groupalia.com>[image:
Twitter] Twitter <http://twitter.com/#%21/groupaliaes>    [image: Twitter]
 Facebook <https://www.facebook.com/GroupaliaEspana>    [image: Twitter]
 Linkedin <http://www.linkedin.com/company/groupalia>

Re: Reduce Cassandra GC

Posted by Joel Samuelsson <sa...@gmail.com>.

How do you calculate the heap / data size ratio? Is this a linear ratio?

Each node has slightly more than 12 GB right now though.


2013/4/16 Viktor Jevdokimov <Vi...@adform.com>

>  For a >40GB of data 1GB of heap is too low.****
>
> ** **
>    Best regards / Pagarbiai
> *Viktor Jevdokimov*
> Senior Developer
>
> Email: Viktor.Jevdokimov@adform.com
> Phone: +370 5 212 3063, Fax +370 5 261 0453
> J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
> Follow us on Twitter: @adforminsider <http://twitter.com/#!/adforminsider>
> Take a ride with Adform's Rich Media Suite<http://vimeo.com/adform/richmedia>
>  [image: Adform News] <http://www.adform.com>
> [image: Adform awarded the Best Employer 2012]
> <http://www.adform.com/site/blog/adform/adform-takes-top-spot-in-best-employer-survey/>
>
> Disclaimer: The information contained in this message and attachments is
> intended solely for the attention and use of the named addressee and may be
> confidential. If you are not the intended recipient, you are reminded that
> the information remains the property of the sender. You must not use,
> disclose, distribute, copy, print or rely on this e-mail. If you have
> received this message in error, please contact the sender immediately and
> irrevocably delete this message and any copies.
>
>   *From:* Joel Samuelsson [mailto:samuelsson.joel@gmail.com]
> *Sent:* Tuesday, April 16, 2013 10:47
> *To:* user@cassandra.apache.org
> *Subject:* Reduce Cassandra GC****
>
> ** **
>
> Hi,****
>
> ** **
>
> We have a small production cluster with two nodes. The load on the nodes
> is very small, around 20 reads / sec and about the same for writes. There
> are around 2.5 million keys in the cluster and a RF of 2.****
>
> ** **
>
> About 2.4 million of the rows are skinny (6 columns) and around 3kb in
> size (each). Currently, scripts are running, accessing all of the keys in
> timeorder to do some calculations.****
>
> ** **
>
> While running the scripts, the nodes go down and then come back up 6-7
> minutes later. This seems to be due to GC. I get lines like this in the log:
> ****
>
> INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line
> 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is
> 1046937600****
>
> ** **
>
> However, the heap is not full. The heap usage has a jagged pattern going
> from 60% up to 70% during 5 minutes and then back down to 60% the next 5
> minutes and so on. I get no "Heap is X full..." messages. Every once in a
> while at one of these peaks, I get these stop-the-world GC for 6-7
> minutes. Why does GC take up so much time even though the heap isn't full?
> ****
>
> ** **
>
> I am aware that my access patterns make key caching very unlikely to be
> high. And indeed, my average key cache hit ratio during the run of the
> scripts is around 0.5%. I tried disabling key caching on the accessed
> column family (UPDATE COLUMN FAMILY cf WITH caching=none;) through the
> cassandra-cli but I get the same behaviour. Is the turning key cache off
> effective immediately?****
>
> ** **
>
> Stop-the-world GC is fine if it happens for a few seconds but having them
> for several minutes doesn't work. Any other suggestions to remove them?***
> *
>
> ** **
>
> Best regards,****
>
> Joel Samuelsson****
>

RE: Reduce Cassandra GC

Posted by Viktor Jevdokimov <Vi...@adform.com>.

For a >40GB of data 1GB of heap is too low.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: Viktor.Jevdokimov@adform.com<ma...@adform.com>
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider<http://twitter.com/#!/adforminsider>
Take a ride with Adform's Rich Media Suite<http://vimeo.com/adform/richmedia>

[Adform News] <http://www.adform.com>
[Adform awarded the Best Employer 2012] <http://www.adform.com/site/blog/adform/adform-takes-top-spot-in-best-employer-survey/>


Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.

From: Joel Samuelsson [mailto:samuelsson.joel@gmail.com]
Sent: Tuesday, April 16, 2013 10:47
To: user@cassandra.apache.org
Subject: Reduce Cassandra GC

Hi,

We have a small production cluster with two nodes. The load on the nodes is very small, around 20 reads / sec and about the same for writes. There are around 2.5 million keys in the cluster and a RF of 2.

About 2.4 million of the rows are skinny (6 columns) and around 3kb in size (each). Currently, scripts are running, accessing all of the keys in timeorder to do some calculations.

While running the scripts, the nodes go down and then come back up 6-7 minutes later. This seems to be due to GC. I get lines like this in the log:
INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600

However, the heap is not full. The heap usage has a jagged pattern going from 60% up to 70% during 5 minutes and then back down to 60% the next 5 minutes and so on. I get no "Heap is X full..." messages. Every once in a while at one of these peaks, I get these stop-the-world GC for 6-7 minutes. Why does GC take up so much time even though the heap isn't full?

I am aware that my access patterns make key caching very unlikely to be high. And indeed, my average key cache hit ratio during the run of the scripts is around 0.5%. I tried disabling key caching on the accessed column family (UPDATE COLUMN FAMILY cf WITH caching=none;) through the cassandra-cli but I get the same behaviour. Is the turning key cache off effective immediately?

Stop-the-world GC is fine if it happens for a few seconds but having them for several minutes doesn't work. Any other suggestions to remove them?

Best regards,
Joel Samuelsson