Posted to user@cassandra.apache.org by Ruchir Jha <ru...@gmail.com> on 2014/04/16 15:14:31 UTC
GC histogram analysis
Hi,
I am trying to investigate ParNew promotion failures happening routinely in
production. As part of this exercise, I enabled
-XX:+PrintClassHistogramBeforeFullGC and saw the following output. As you can see
there are a ton of Columns, ExpiringColumns and DeletedColumns before GC
ran and these numbers go down significantly right after GC. Why are there
so many expiring and deleted columns?
Before GC:

 num     #instances         #bytes  class name
----------------------------------------------
   1:     113539896     5449915008  java.nio.HeapByteBuffer
   2:      15979061     2681431488  [B
   3:      36364545     1745498160  edu.stanford.ppl.concurrent.SnapTreeMap$Node
   4:      23583282      754665024  org.apache.cassandra.db.Column
   5:       8745428      209890272  java.util.concurrent.ConcurrentSkipListMap$Node
   6:       5062619      202504760  org.apache.cassandra.db.ExpiringColumn
   7:         45261      198998216  [I
   8:       1801535      172947360  edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch
   9:       1473677      169570040  [J
  10:       4713304      113119296  java.lang.Double
  11:       3246729      103895328  org.apache.cassandra.db.DeletedColumn
After GC:

 num     #instances         #bytes  class name
----------------------------------------------
   1:      11807204     1505962728  [B
   2:      12525536      601225728  java.nio.HeapByteBuffer
   3:       8839073      424275504  edu.stanford.ppl.concurrent.SnapTreeMap$Node
   4:       8194496      262223872  org.apache.cassandra.db.Column
cache.KeyCacheKey
  17:        432119       17284760  org.apache.cassandra.db.ExpiringColumn
  21:        351096       11235072  org.apache.cassandra.db.DeletedColumn
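One quick way to see which classes a collection actually reclaims is to diff the two histograms class-by-class. A minimal sketch, assuming the before/after output has been saved to files; the sample figures below are made up for illustration, not taken from the thread (real input would come from `jmap -histo <pid>` runs):

```shell
# Hypothetical sample data in jmap -histo layout: rank, #instances, #bytes, class.
cat > before.txt <<'EOF'
   1:       1000      64000  org.apache.cassandra.db.Column
   2:        500      16000  org.apache.cassandra.db.DeletedColumn
EOF
cat > after.txt <<'EOF'
   1:        100       6400  org.apache.cassandra.db.Column
   2:         50       1600  org.apache.cassandra.db.DeletedColumn
EOF
# First pass (NR==FNR) records before-GC instance counts per class name;
# second pass prints how many instances of each class the GC reclaimed.
awk 'NR==FNR { before[$4] = $2; next }
     $4 in before { print $4, before[$4] - $2 }' before.txt after.txt
```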
Re: GC histogram analysis
Posted by Chris Lohfink <cl...@blackbirdit.com>.
You can take a heap dump and find out what holds references to those objects, and from that work out which column family they come from. Do you have a lot of tombstones, data that gets overwritten a lot, or a very heavy read load? Maybe wide rows that you're querying across, or queries that use filtering? With large slices, reads may have to deserialize a lot of columns that immediately fall out of scope.
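The dump and histogram can both be captured with the stock JDK tooling; a sketch, where `<pid>` is the Cassandra process id and the file name is a placeholder:

```shell
# Capture a heap dump for offline reference analysis (e.g. in Eclipse MAT).
# The :live option triggers a full GC first, so only reachable objects land
# in the dump.
jmap -dump:live,format=b,file=cassandra-heap.hprof <pid>

# Or just a quick class histogram, without writing a full dump:
jmap -histo:live <pid> | head -20
```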
You may just have to tune the GC for your use case a little if the ParNew promotion failures are a problem. I doubt the byte buffers actually need to be promoted to the old gen, since they are probably short-lived objects from sstable reads or large batch writes. I would try increasing the young gen size and upping the tenuring threshold, but the right values are pretty hardware-dependent.
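Concretely, that tuning might look something like the options below (for Cassandra these usually go in cassandra-env.sh). The sizes are made-up starting points for illustration, not recommendations:

```shell
# Illustrative sketch of the tuning described above; values are hypothetical.
JVM_OPTS="$JVM_OPTS -Xmn2G"                         # larger young gen
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=8"     # let objects age longer before promotion
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"            # larger survivor spaces relative to eden
JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution" # log object ages to verify the effect
```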
Chris
On Apr 16, 2014, at 8:40 AM, Ruchir Jha <ru...@gmail.com> wrote:
> No we don't.
>
> Sent from my iPhone
>
> On Apr 16, 2014, at 9:21 AM, Mark Reddy <ma...@boxever.com> wrote:
>
>> Do you delete and/or set TTLs on your data?
Re: GC histogram analysis
Posted by Ruchir Jha <ru...@gmail.com>.
No we don't.
Sent from my iPhone
> On Apr 16, 2014, at 9:21 AM, Mark Reddy <ma...@boxever.com> wrote:
>
> Do you delete and/or set TTLs on your data?
Re: GC histogram analysis
Posted by Mark Reddy <ma...@boxever.com>.
Do you delete and/or set TTLs on your data?