You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by André Cruz <an...@co.sapo.pt> on 2013/09/23 16:51:38 UTC

Frequent Full GC that take > 30s

Hello.

I'm having problems with frequent Full GCs that take a long time, and cause a burst of timeouts for the client application. But first, here is my configuration:

Cassandra: 1.1.5
Key Cache        : size 209715168 (bytes), capacity 209715168 (bytes), 1331992571 hits, 1831790912 requests, 0.854 recent hit rate, 14400 save period in seconds
Row Cache        : size 104588313 (bytes), capacity 104857600 (bytes), 585590776 hits, 646992444 requests, 0.880 recent hit rate, 0 save period in seconds
flush_largest_memtables_at: 0.8
memtable_total_space_in_mb: 2048

JVM: 
java version "1.6.0_35"
Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode)

JVM Options:
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:ParallelCMSThreads=4
-Xms16G
-Xmx16G
-Xmn800M
-Xss196k

6 nodes with 32GB RAM, Intel(R) Xeon(R) CPU E5-2609, 300GB data per node.


Current status:
I had to raise the heap to 16GB due to constant very high heap usage. I've since lowered the bf ratio, but still haven't got to lowering the heap size because I'm still not sure a 8GB heap can handle the load per node.

Things I plan to do:
- upgrade to Cassandra 1.2. Supposedly it makes better use of the heap.
- add more nodes. Still waiting for the hardware.
- test again with lower heap sizes.
- test JVM 7 (G1?). I've read somewhere that JVM 6 is recommended for Cassandra but Cassandra 2.0 requires JVM 7 so...

Besides these, I would also like to double-check my current GC parameters. I'm thinking of increasing the NewSize to 1200M, so that less garbage ends up in the old generation. But here are the GC logs, maybe someone has a better idea?

2013-09-20T07:32:59.090+0100: 141675.121: [GC 141675.121: [ParNew (2: promotion failure size = 9564)  (promotion failed)
Desired survivor size 41943040 bytes, new threshold 1 (max 1)
- age   1:     569496 bytes,     569496 total
: 85346K->67732K(737280K), 0.1206690 secs] 14838953K->14882152K(16695296K), 0.1210110 secs] [Times: user=0.58 sys=0.01, real=0.12 secs] 
Heap after GC invocations=190322 (full 281):
 par new generation   total 737280K, used 67732K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   3% used [0x00000003fae00000, 0x00000003fc212690, 0x0000000422e00000)
  from space 81920K,  57% used [0x0000000427e00000, 0x000000042ac12988, 0x000000042ce00000)
  to   space 81920K,  79% used [0x0000000422e00000, 0x0000000426d46538, 0x0000000427e00000)
 concurrent mark-sweep generation total 15958016K, used 14814420K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26411K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
}
GC locker: Trying a full collection because scavenge failed
{Heap before GC invocations=190322 (full 281):
 par new generation   total 737280K, used 67732K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   3% used [0x00000003fae00000, 0x00000003fc212690, 0x0000000422e00000)
  from space 81920K,  57% used [0x0000000427e00000, 0x000000042ac12988, 0x000000042ce00000)
  to   space 81920K,  79% used [0x0000000422e00000, 0x0000000426d46538, 0x0000000427e00000)
 concurrent mark-sweep generation total 15958016K, used 14814420K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26411K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
2013-09-20T07:32:59.212+0100: 141675.242: [Full GC 141675.242: [CMS2013-09-20T07:33:06.691+0100: 141682.721: [CMS-concurrent-sweep: 19.719/29.300 secs] [Times: user=146.19 sys=3.74, real=29.31 secs] 
 (concurrent mode failure): 14814420K->7653329K(15958016K), 35.8052520 secs] 14882152K->7653329K(16695296K), [CMS Perm : 26411K->26355K(44188K)], 35.8055760 secs] [Times: user=35.74 sys=0.04, real=35.80 secs] 
Heap after GC invocations=190323 (full 282):
 par new generation   total 737280K, used 0K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae00000, 0x0000000422e00000)
  from space 81920K,   0% used [0x0000000427e00000, 0x0000000427e00000, 0x000000042ce00000)
  to   space 81920K,   0% used [0x0000000422e00000, 0x0000000422e00000, 0x0000000427e00000)
 concurrent mark-sweep generation total 15958016K, used 7653329K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26355K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
}
Total time for which application threads were stopped: 35.9305810 seconds
Total time for which application threads were stopped: 0.0030240 seconds


...


2013-09-20T09:58:41.765+0100: 150417.795: [GC 150417.796: [ParNew
Desired survivor size 41943040 bytes, new threshold 1 (max 1)
- age   1:   83887848 bytes,   83887848 total
: 674422K->81920K(737280K), 0.0680000 secs] 15824090K->15233726K(16695296K), 0.0685120 secs] [Times: user=0.54 sys=0.00, real=0.07 secs] 
Heap after GC invocations=214529 (full 323):
 par new generation   total 737280K, used 81920K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae00000, 0x0000000422e00000)
  from space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, 0x000000042ce00000)
  to   space 81920K,   0% used [0x0000000422e00000, 0x0000000422e00000, 0x0000000427e00000)
 concurrent mark-sweep generation total 15958016K, used 15151806K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26424K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
}
Total time for which application threads were stopped: 0.0721100 seconds
{Heap before GC invocations=214529 (full 323):
 par new generation   total 737280K, used 82214K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae49998, 0x0000000422e00000)
  from space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, 0x000000042ce00000)
  to   space 81920K,   0% used [0x0000000422e00000, 0x0000000422e00000, 0x0000000427e00000)
 concurrent mark-sweep generation total 15958016K, used 15151806K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26424K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
2013-09-20T09:58:41.837+0100: 150417.868: [GC 150417.868: [ParNew (0: promotion failure size = 1677)  (1: promotion failure size = 2584)  (2: promotion failure size = 2584)  (3: promotion failure size = 1723)  (4: promotion failure size = 1723)  (7: promotion failure size
 = 4766)  (promotion failed)
Desired survivor size 41943040 bytes, new threshold 1 (max 1)
- age   1:      10688 bytes,      10688 total
: 82214K->62920K(737280K), 0.4677070 secs] 15234020K->15298061K(16695296K), 0.4681720 secs] [Times: user=1.15 sys=0.01, real=0.46 secs] 
Heap after GC invocations=214530 (full 323):
 par new generation   total 737280K, used 62920K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae49998, 0x0000000422e00000)
  from space 81920K,  76% used [0x0000000422e00000, 0x0000000426b28940, 0x0000000427e00000)
  to   space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, 0x000000042ce00000)
 concurrent mark-sweep generation total 15958016K, used 15235140K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26424K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
}
GC locker: Trying a full collection because scavenge failed
{Heap before GC invocations=214530 (full 323):
 par new generation   total 737280K, used 62920K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae49998, 0x0000000422e00000)
  from space 81920K,  76% used [0x0000000422e00000, 0x0000000426b28940, 0x0000000427e00000)
  to   space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, 0x000000042ce00000)
 concurrent mark-sweep generation total 15958016K, used 15235140K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26424K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
2013-09-20T09:58:42.306+0100: 150418.336: [Full GC 150418.336: [CMS2013-09-20T09:58:46.720+0100: 150422.751: [CMS-concurrent-preclean: 7.057/10.443 secs] [Times: user=45.24 sys=0.32, real=10.44 secs] 
 (concurrent mode failure): 15235140K->5325067K(15958016K), 26.3214360 secs] 15298061K->5325067K(16695296K), [CMS Perm : 26424K->26343K(44188K)], 26.3218650 secs] [Times: user=26.29 sys=0.02, real=26.33 secs] 
Heap after GC invocations=214531 (full 324):
 par new generation   total 737280K, used 0K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae00000, 0x0000000422e00000)
  from space 81920K,   0% used [0x0000000422e00000, 0x0000000422e00000, 0x0000000427e00000)
  to   space 81920K,   0% used [0x0000000427e00000, 0x0000000427e00000, 0x000000042ce00000)
 concurrent mark-sweep generation total 15958016K, used 5325067K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26343K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
}
Total time for which application threads were stopped: 26.7940000 seconds
Total time for which application threads were stopped: 0.0408350 seconds
Total time for which application threads were stopped: 0.0264510 seconds



Thanks for the help,
André Cruz

Re: Frequent Full GC that take > 30s

Posted by André Cruz <an...@co.sapo.pt>.
On Sep 24, 2013, at 5:18 AM, Mohit Anchlia <mo...@gmail.com> wrote:

> Your ParNew size is way too small. Generally 4GB ParNew (-Xmn) works out best for 16GB heap

I was afraid that a 4GB ParNew would cause Young GCs to take too long. I'm going to test higher ParNew values.

Thanks,
André

Re: 答复: Frequent Full GC that take > 30s

Posted by Mohit Anchlia <mo...@gmail.com>.
Your ParNew size is way too small. Generally 4GB ParNew (-Xmn) works out
best for 16GB heap

On Mon, Sep 23, 2013 at 9:05 PM, � <xi...@xiaomi.com> wrote:

> it looks to me that "MaxTenuringThreshold" is too small, do you have any
> chance to try with a bigger one, like 4 or 8 or sth else?
>
> ________________________________________
> ������: Andr�� Cruz [andre.cruz@co.sapo.pt]
> ����ʱ��: 2013��9��23�� 22:51
> �ռ���: user@cassandra.apache.org
> ����: Frequent Full GC that take > 30s
>
> Hello.
>
> I'm having problems with frequent Full GCs that take a long time, and
> cause a burst of timeouts for the client application. But first, here is my
> configuration:
>
> Cassandra: 1.1.5
> Key Cache        : size 209715168 (bytes), capacity 209715168 (bytes),
> 1331992571 hits, 1831790912 requests, 0.854 recent hit rate, 14400 save
> period in seconds
> Row Cache        : size 104588313 (bytes), capacity 104857600 (bytes),
> 585590776 hits, 646992444 requests, 0.880 recent hit rate, 0 save period in
> seconds
> flush_largest_memtables_at: 0.8
> memtable_total_space_in_mb: 2048
>
> JVM:
> java version "1.6.0_35"
> Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
> Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode)
>
> JVM Options:
> -XX:+UseParNewGC
> -XX:+UseConcMarkSweepGC
> -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=8
> -XX:MaxTenuringThreshold=1
> -XX:CMSInitiatingOccupancyFraction=75
> -XX:+UseCMSInitiatingOccupancyOnly
> -XX:ParallelCMSThreads=4
> -Xms16G
> -Xmx16G
> -Xmn800M
> -Xss196k
>
> 6 nodes with 32GB RAM, Intel(R) Xeon(R) CPU E5-2609, 300GB data per node.
>
>
> Current status:
> I had to raise the heap to 16GB due to constant very high heap usage. I've
> since lowered the bf ratio, but still haven't got to lowering the heap size
> because I'm still not sure a 8GB heap can handle the load per node.
>
> Things I plan to do:
> - upgrade to Cassandra 1.2. Supposedly it makes better use of the heap.
> - add more nodes. Still waiting for the hardware.
> - test again with lower heap sizes.
> - test JVM 7 (G1?). I've read somewhere that JVM 6 is recommended for
> Cassandra but Cassandra 2.0 requires JVM 7 so...
>
> Besides these, I would also like to double-check my current GC parameters.
> I'm thinking of increasing the NewSize to 1200M, so that less garbage ends
> up in the old generation. But here are the GC logs, maybe someone has a
> better idea?
>
> 2013-09-20T07:32:59.090+0100: 141675.121: [GC 141675.121: [ParNew (2:
> promotion failure size = 9564)  (promotion failed)
> Desired survivor size 41943040 bytes, new threshold 1 (max 1)
> - age   1:     569496 bytes,     569496 total
> : 85346K->67732K(737280K), 0.1206690 secs]
> 14838953K->14882152K(16695296K), 0.1210110 secs] [Times: user=0.58
> sys=0.01, real=0.12 secs]
> Heap after GC invocations=190322 (full 281):
>  par new generation   total 737280K, used 67732K [0x00000003fae00000,
> 0x000000042ce00000, 0x000000042ce00000)
>   eden space 655360K,   3% used [0x00000003fae00000, 0x00000003fc212690,
> 0x0000000422e00000)
>   from space 81920K,  57% used [0x0000000427e00000, 0x000000042ac12988,
> 0x000000042ce00000)
>   to   space 81920K,  79% used [0x0000000422e00000, 0x0000000426d46538,
> 0x0000000427e00000)
>  concurrent mark-sweep generation total 15958016K, used 14814420K
> [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 44188K, used 26411K
> [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
> }
> GC locker: Trying a full collection because scavenge failed
> {Heap before GC invocations=190322 (full 281):
>  par new generation   total 737280K, used 67732K [0x00000003fae00000,
> 0x000000042ce00000, 0x000000042ce00000)
>   eden space 655360K,   3% used [0x00000003fae00000, 0x00000003fc212690,
> 0x0000000422e00000)
>   from space 81920K,  57% used [0x0000000427e00000, 0x000000042ac12988,
> 0x000000042ce00000)
>   to   space 81920K,  79% used [0x0000000422e00000, 0x0000000426d46538,
> 0x0000000427e00000)
>  concurrent mark-sweep generation total 15958016K, used 14814420K
> [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 44188K, used 26411K
> [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
> 2013-09-20T07:32:59.212+0100: 141675.242: [Full GC 141675.242:
> [CMS2013-09-20T07:33:06.691+0100: 141682.721: [CMS-concurrent-sweep:
> 19.719/29.300 secs] [Times: user=146.19 sys=3.74, real=29.31 secs]
>  (concurrent mode failure): 14814420K->7653329K(15958016K), 35.8052520
> secs] 14882152K->7653329K(16695296K), [CMS Perm : 26411K->26355K(44188K)],
> 35.8055760 secs] [Times: user=35.74 sys=0.04, real=35.80 secs]
> Heap after GC invocations=190323 (full 282):
>  par new generation   total 737280K, used 0K [0x00000003fae00000,
> 0x000000042ce00000, 0x000000042ce00000)
>   eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae00000,
> 0x0000000422e00000)
>   from space 81920K,   0% used [0x0000000427e00000, 0x0000000427e00000,
> 0x000000042ce00000)
>   to   space 81920K,   0% used [0x0000000422e00000, 0x0000000422e00000,
> 0x0000000427e00000)
>  concurrent mark-sweep generation total 15958016K, used 7653329K
> [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 44188K, used 26355K
> [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
> }
> Total time for which application threads were stopped: 35.9305810 seconds
> Total time for which application threads were stopped: 0.0030240 seconds
>
>
> ...
>
>
> 2013-09-20T09:58:41.765+0100: 150417.795: [GC 150417.796: [ParNew
> Desired survivor size 41943040 bytes, new threshold 1 (max 1)
> - age   1:   83887848 bytes,   83887848 total
> : 674422K->81920K(737280K), 0.0680000 secs]
> 15824090K->15233726K(16695296K), 0.0685120 secs] [Times: user=0.54
> sys=0.00, real=0.07 secs]
> Heap after GC invocations=214529 (full 323):
>  par new generation   total 737280K, used 81920K [0x00000003fae00000,
> 0x000000042ce00000, 0x000000042ce00000)
>   eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae00000,
> 0x0000000422e00000)
>   from space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000,
> 0x000000042ce00000)
>   to   space 81920K,   0% used [0x0000000422e00000, 0x0000000422e00000,
> 0x0000000427e00000)
>  concurrent mark-sweep generation total 15958016K, used 15151806K
> [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 44188K, used 26424K
> [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
> }
> Total time for which application threads were stopped: 0.0721100 seconds
> {Heap before GC invocations=214529 (full 323):
>  par new generation   total 737280K, used 82214K [0x00000003fae00000,
> 0x000000042ce00000, 0x000000042ce00000)
>   eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae49998,
> 0x0000000422e00000)
>   from space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000,
> 0x000000042ce00000)
>   to   space 81920K,   0% used [0x0000000422e00000, 0x0000000422e00000,
> 0x0000000427e00000)
>  concurrent mark-sweep generation total 15958016K, used 15151806K
> [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 44188K, used 26424K
> [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
> 2013-09-20T09:58:41.837+0100: 150417.868: [GC 150417.868: [ParNew (0:
> promotion failure size = 1677)  (1: promotion failure size = 2584)  (2:
> promotion failure size = 2584)  (3: promotion failure size = 1723)  (4:
> promotion failure size = 1723)  (7: promotion failure size
>  = 4766)  (promotion failed)
> Desired survivor size 41943040 bytes, new threshold 1 (max 1)
> - age   1:      10688 bytes,      10688 total
> : 82214K->62920K(737280K), 0.4677070 secs]
> 15234020K->15298061K(16695296K), 0.4681720 secs] [Times: user=1.15
> sys=0.01, real=0.46 secs]
> Heap after GC invocations=214530 (full 323):
>  par new generation   total 737280K, used 62920K [0x00000003fae00000,
> 0x000000042ce00000, 0x000000042ce00000)
>   eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae49998,
> 0x0000000422e00000)
>   from space 81920K,  76% used [0x0000000422e00000, 0x0000000426b28940,
> 0x0000000427e00000)
>   to   space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000,
> 0x000000042ce00000)
>  concurrent mark-sweep generation total 15958016K, used 15235140K
> [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 44188K, used 26424K
> [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
> }
> GC locker: Trying a full collection because scavenge failed
> {Heap before GC invocations=214530 (full 323):
>  par new generation   total 737280K, used 62920K [0x00000003fae00000,
> 0x000000042ce00000, 0x000000042ce00000)
>   eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae49998,
> 0x0000000422e00000)
>   from space 81920K,  76% used [0x0000000422e00000, 0x0000000426b28940,
> 0x0000000427e00000)
>   to   space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000,
> 0x000000042ce00000)
>  concurrent mark-sweep generation total 15958016K, used 15235140K
> [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 44188K, used 26424K
> [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
> 2013-09-20T09:58:42.306+0100: 150418.336: [Full GC 150418.336:
> [CMS2013-09-20T09:58:46.720+0100: 150422.751: [CMS-concurrent-preclean:
> 7.057/10.443 secs] [Times: user=45.24 sys=0.32, real=10.44 secs]
>  (concurrent mode failure): 15235140K->5325067K(15958016K), 26.3214360
> secs] 15298061K->5325067K(16695296K), [CMS Perm : 26424K->26343K(44188K)],
> 26.3218650 secs] [Times: user=26.29 sys=0.02, real=26.33 secs]
> Heap after GC invocations=214531 (full 324):
>  par new generation   total 737280K, used 0K [0x00000003fae00000,
> 0x000000042ce00000, 0x000000042ce00000)
>   eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae00000,
> 0x0000000422e00000)
>   from space 81920K,   0% used [0x0000000422e00000, 0x0000000422e00000,
> 0x0000000427e00000)
>   to   space 81920K,   0% used [0x0000000427e00000, 0x0000000427e00000,
> 0x000000042ce00000)
>  concurrent mark-sweep generation total 15958016K, used 5325067K
> [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
>  concurrent-mark-sweep perm gen total 44188K, used 26343K
> [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
> }
> Total time for which application threads were stopped: 26.7940000 seconds
> Total time for which application threads were stopped: 0.0408350 seconds
> Total time for which application threads were stopped: 0.0264510 seconds
>
>
>
> Thanks for the help,
> Andr�� Cruz
>

Re: Frequent Full GC that take > 30s

Posted by André Cruz <an...@co.sapo.pt>.
On Sep 24, 2013, at 5:05 AM, 谢良 <xi...@xiaomi.com> wrote:

> it looks to me that "MaxTenuringThreshold" is too small, do you have any chance to try with a bigger one, like 4 or 8 or sth else?

MaxTenuringThreshold=1 seems a bit odd, yes. But it is the Cassandra default, maybe there is a reason for this? Perhaps under normal usage all objects that survive 1 tenure should be moved to the old generation?

Anyway, I will also try higher values.

Thanks,
André


答复: Frequent Full GC that take > 30s

Posted by 谢良 <xi...@xiaomi.com>.
it looks to me that "MaxTenuringThreshold" is too small, do you have any chance to try with a bigger one, like 4 or 8 or sth else?

________________________________________
发件人: André Cruz [andre.cruz@co.sapo.pt]
发送时间: 2013年9月23日 22:51
收件人: user@cassandra.apache.org
主题: Frequent Full GC that take > 30s

Hello.

I'm having problems with frequent Full GCs that take a long time, and cause a burst of timeouts for the client application. But first, here is my configuration:

Cassandra: 1.1.5
Key Cache        : size 209715168 (bytes), capacity 209715168 (bytes), 1331992571 hits, 1831790912 requests, 0.854 recent hit rate, 14400 save period in seconds
Row Cache        : size 104588313 (bytes), capacity 104857600 (bytes), 585590776 hits, 646992444 requests, 0.880 recent hit rate, 0 save period in seconds
flush_largest_memtables_at: 0.8
memtable_total_space_in_mb: 2048

JVM:
java version "1.6.0_35"
Java(TM) SE Runtime Environment (build 1.6.0_35-b10)
Java HotSpot(TM) 64-Bit Server VM (build 20.10-b01, mixed mode)

JVM Options:
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:SurvivorRatio=8
-XX:MaxTenuringThreshold=1
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:ParallelCMSThreads=4
-Xms16G
-Xmx16G
-Xmn800M
-Xss196k

6 nodes with 32GB RAM, Intel(R) Xeon(R) CPU E5-2609, 300GB data per node.


Current status:
I had to raise the heap to 16GB due to constant very high heap usage. I've since lowered the bf ratio, but still haven't got to lowering the heap size because I'm still not sure a 8GB heap can handle the load per node.

Things I plan to do:
- upgrade to Cassandra 1.2. Supposedly it makes better use of the heap.
- add more nodes. Still waiting for the hardware.
- test again with lower heap sizes.
- test JVM 7 (G1?). I've read somewhere that JVM 6 is recommended for Cassandra but Cassandra 2.0 requires JVM 7 so...

Besides these, I would also like to double-check my current GC parameters. I'm thinking of increasing the NewSize to 1200M, so that less garbage ends up in the old generation. But here are the GC logs, maybe someone has a better idea?

2013-09-20T07:32:59.090+0100: 141675.121: [GC 141675.121: [ParNew (2: promotion failure size = 9564)  (promotion failed)
Desired survivor size 41943040 bytes, new threshold 1 (max 1)
- age   1:     569496 bytes,     569496 total
: 85346K->67732K(737280K), 0.1206690 secs] 14838953K->14882152K(16695296K), 0.1210110 secs] [Times: user=0.58 sys=0.01, real=0.12 secs]
Heap after GC invocations=190322 (full 281):
 par new generation   total 737280K, used 67732K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   3% used [0x00000003fae00000, 0x00000003fc212690, 0x0000000422e00000)
  from space 81920K,  57% used [0x0000000427e00000, 0x000000042ac12988, 0x000000042ce00000)
  to   space 81920K,  79% used [0x0000000422e00000, 0x0000000426d46538, 0x0000000427e00000)
 concurrent mark-sweep generation total 15958016K, used 14814420K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26411K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
}
GC locker: Trying a full collection because scavenge failed
{Heap before GC invocations=190322 (full 281):
 par new generation   total 737280K, used 67732K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   3% used [0x00000003fae00000, 0x00000003fc212690, 0x0000000422e00000)
  from space 81920K,  57% used [0x0000000427e00000, 0x000000042ac12988, 0x000000042ce00000)
  to   space 81920K,  79% used [0x0000000422e00000, 0x0000000426d46538, 0x0000000427e00000)
 concurrent mark-sweep generation total 15958016K, used 14814420K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26411K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
2013-09-20T07:32:59.212+0100: 141675.242: [Full GC 141675.242: [CMS2013-09-20T07:33:06.691+0100: 141682.721: [CMS-concurrent-sweep: 19.719/29.300 secs] [Times: user=146.19 sys=3.74, real=29.31 secs]
 (concurrent mode failure): 14814420K->7653329K(15958016K), 35.8052520 secs] 14882152K->7653329K(16695296K), [CMS Perm : 26411K->26355K(44188K)], 35.8055760 secs] [Times: user=35.74 sys=0.04, real=35.80 secs]
Heap after GC invocations=190323 (full 282):
 par new generation   total 737280K, used 0K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae00000, 0x0000000422e00000)
  from space 81920K,   0% used [0x0000000427e00000, 0x0000000427e00000, 0x000000042ce00000)
  to   space 81920K,   0% used [0x0000000422e00000, 0x0000000422e00000, 0x0000000427e00000)
 concurrent mark-sweep generation total 15958016K, used 7653329K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26355K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
}
Total time for which application threads were stopped: 35.9305810 seconds
Total time for which application threads were stopped: 0.0030240 seconds


...


2013-09-20T09:58:41.765+0100: 150417.795: [GC 150417.796: [ParNew
Desired survivor size 41943040 bytes, new threshold 1 (max 1)
- age   1:   83887848 bytes,   83887848 total
: 674422K->81920K(737280K), 0.0680000 secs] 15824090K->15233726K(16695296K), 0.0685120 secs] [Times: user=0.54 sys=0.00, real=0.07 secs]
Heap after GC invocations=214529 (full 323):
 par new generation   total 737280K, used 81920K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae00000, 0x0000000422e00000)
  from space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, 0x000000042ce00000)
  to   space 81920K,   0% used [0x0000000422e00000, 0x0000000422e00000, 0x0000000427e00000)
 concurrent mark-sweep generation total 15958016K, used 15151806K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26424K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
}
Total time for which application threads were stopped: 0.0721100 seconds
{Heap before GC invocations=214529 (full 323):
 par new generation   total 737280K, used 82214K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae49998, 0x0000000422e00000)
  from space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, 0x000000042ce00000)
  to   space 81920K,   0% used [0x0000000422e00000, 0x0000000422e00000, 0x0000000427e00000)
 concurrent mark-sweep generation total 15958016K, used 15151806K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26424K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
2013-09-20T09:58:41.837+0100: 150417.868: [GC 150417.868: [ParNew (0: promotion failure size = 1677)  (1: promotion failure size = 2584)  (2: promotion failure size = 2584)  (3: promotion failure size = 1723)  (4: promotion failure size = 1723)  (7: promotion failure size
 = 4766)  (promotion failed)
Desired survivor size 41943040 bytes, new threshold 1 (max 1)
- age   1:      10688 bytes,      10688 total
: 82214K->62920K(737280K), 0.4677070 secs] 15234020K->15298061K(16695296K), 0.4681720 secs] [Times: user=1.15 sys=0.01, real=0.46 secs]
Heap after GC invocations=214530 (full 323):
 par new generation   total 737280K, used 62920K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae49998, 0x0000000422e00000)
  from space 81920K,  76% used [0x0000000422e00000, 0x0000000426b28940, 0x0000000427e00000)
  to   space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, 0x000000042ce00000)
 concurrent mark-sweep generation total 15958016K, used 15235140K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26424K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
}
GC locker: Trying a full collection because scavenge failed
{Heap before GC invocations=214530 (full 323):
 par new generation   total 737280K, used 62920K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae49998, 0x0000000422e00000)
  from space 81920K,  76% used [0x0000000422e00000, 0x0000000426b28940, 0x0000000427e00000)
  to   space 81920K, 100% used [0x0000000427e00000, 0x000000042ce00000, 0x000000042ce00000)
 concurrent mark-sweep generation total 15958016K, used 15235140K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26424K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
2013-09-20T09:58:42.306+0100: 150418.336: [Full GC 150418.336: [CMS2013-09-20T09:58:46.720+0100: 150422.751: [CMS-concurrent-preclean: 7.057/10.443 secs] [Times: user=45.24 sys=0.32, real=10.44 secs]
 (concurrent mode failure): 15235140K->5325067K(15958016K), 26.3214360 secs] 15298061K->5325067K(16695296K), [CMS Perm : 26424K->26343K(44188K)], 26.3218650 secs] [Times: user=26.29 sys=0.02, real=26.33 secs]
Heap after GC invocations=214531 (full 324):
 par new generation   total 737280K, used 0K [0x00000003fae00000, 0x000000042ce00000, 0x000000042ce00000)
  eden space 655360K,   0% used [0x00000003fae00000, 0x00000003fae00000, 0x0000000422e00000)
  from space 81920K,   0% used [0x0000000422e00000, 0x0000000422e00000, 0x0000000427e00000)
  to   space 81920K,   0% used [0x0000000427e00000, 0x0000000427e00000, 0x000000042ce00000)
 concurrent mark-sweep generation total 15958016K, used 5325067K [0x000000042ce00000, 0x00000007fae00000, 0x00000007fae00000)
 concurrent-mark-sweep perm gen total 44188K, used 26343K [0x00000007fae00000, 0x00000007fd927000, 0x0000000800000000)
}
Total time for which application threads were stopped: 26.7940000 seconds
Total time for which application threads were stopped: 0.0408350 seconds
Total time for which application threads were stopped: 0.0264510 seconds



Thanks for the help,
André Cruz