
Posted to user@cassandra.apache.org by Ruchir Jha <ru...@gmail.com> on 2014/04/16 15:14:31 UTC

GC histogram analysis

Hi,

I am trying to investigate ParNew promotion failures that happen routinely in
production. As part of this exercise, I enabled
-XX:+PrintClassHistogramBeforeFullGC and saw the following output. As you can
see, there are a ton of Columns, ExpiringColumns and DeletedColumns before the
GC ran, and these numbers go down significantly right after the GC. Why are
there so many expiring and deleted columns?



Before GC:

 num     #instances         #bytes  class name
----------------------------------------------
   1:     113539896     5449915008  java.nio.HeapByteBuffer
   2:      15979061     2681431488  [B
   3:      36364545     1745498160  edu.stanford.ppl.concurrent.SnapTreeMap$Node
   4:      23583282      754665024  org.apache.cassandra.db.Column
   5:       8745428      209890272  java.util.concurrent.ConcurrentSkipListMap$Node
   6:       5062619      202504760  org.apache.cassandra.db.ExpiringColumn
   7:         45261      198998216  [I
   8:       1801535      172947360  edu.stanford.ppl.concurrent.CopyOnWriteManager$COWEpoch
   9:       1473677      169570040  [J
  10:       4713304      113119296  java.lang.Double
  11:       3246729      103895328  org.apache.cassandra.db.DeletedColumn

After GC:

 num     #instances         #bytes  class name
----------------------------------------------
   1:      11807204     1505962728  [B
   2:      12525536      601225728  java.nio.HeapByteBuffer
   3:       8839073      424275504  edu.stanford.ppl.concurrent.SnapTreeMap$Node
   4:       8194496      262223872  org.apache.cassandra.db.Column
cache.KeyCacheKey
  17:        432119       17284760  org.apache.cassandra.db.ExpiringColumn
  21:        351096       11235072  org.apache.cassandra.db.DeletedColumn
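
To put numbers on "go down significantly", a small script can compare the two histograms. This is a minimal sketch and not part of the original post: the names parse_histogram and instance_drop are made up for illustration, the rows are copied from the output above, and rows that were wrapped or truncated in the mail are simply skipped.

```python
import re

# One histogram row: rank, instance count, byte count, class name.
ROW = re.compile(r"^\s*(\d+):\s+(\d+)\s+(\d+)\s+(\S+)\s*$")


def parse_histogram(text):
    """Return {class_name: (instances, bytes)} for each well-formed row."""
    rows = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows[m.group(4)] = (int(m.group(2)), int(m.group(3)))
    return rows


def instance_drop(before, after):
    """Percentage of each class's instances that the collection removed."""
    return {
        name: 100.0 * (count - after[name][0]) / count
        for name, (count, _) in before.items()
        if name in after and count > 0
    }


# Rows copied from the histograms in this post (only the classes of interest).
BEFORE = """
   1:     113539896     5449915008  java.nio.HeapByteBuffer
   4:      23583282      754665024  org.apache.cassandra.db.Column
   6:       5062619      202504760  org.apache.cassandra.db.ExpiringColumn
  11:       3246729      103895328  org.apache.cassandra.db.DeletedColumn
"""
AFTER = """
   2:      12525536      601225728  java.nio.HeapByteBuffer
   4:       8194496      262223872  org.apache.cassandra.db.Column
  17:        432119       17284760  org.apache.cassandra.db.ExpiringColumn
  21:        351096       11235072  org.apache.cassandra.db.DeletedColumn
"""

if __name__ == "__main__":
    drops = instance_drop(parse_histogram(BEFORE), parse_histogram(AFTER))
    # Print the classes with the largest relative drop first.
    for name, pct in sorted(drops.items(), key=lambda kv: -kv[1]):
        print(f"{pct:5.1f}%  {name}")
```

On these four classes the drop is roughly 89% for HeapByteBuffer, 65% for Column, 91% for ExpiringColumn and 89% for DeletedColumn, which suggests most of these objects were already unreachable when the full GC ran.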

Re: GC histogram analysis

Posted by Chris Lohfink <cl...@blackbirdit.com>.

You can take a heap dump and find out what holds references to them; that should also tell you which column family they come from. Do you have a lot of tombstones, data that is overwritten frequently, or a very heavy read load? Maybe wide rows that you're querying across, or queries that use filtering? With large slices, reads may have to read and deserialize a lot of columns that quickly go out of scope unused.

You may just have to tune the GC for your use case a little if the ParNew promotion failures are a problem. I doubt the byte buffers actually need to be promoted to old gen, since they are probably from sstables or large batch writes or something similar and are short-lived. I would try increasing the young gen size and raising the tenuring threshold, but the right values can be pretty dependent on hardware.
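
For the tuning suggestion above, it may help to see how the young generation is carved up. This is a rough sketch, assuming the standard HotSpot layout in which -XX:SurvivorRatio=N makes eden N times the size of one survivor space, so the young generation set by -Xmn is split into N + 2 equal parts. The function name young_gen_layout and the sizes are hypothetical, not values from this thread.

```python
def young_gen_layout(young_mb, survivor_ratio):
    """Split a young generation into eden and one of the two survivor spaces.

    Assumes eden is survivor_ratio times the size of one survivor space,
    with two survivor spaces, i.e. survivor_ratio + 2 equal parts in total.
    """
    survivor_mb = young_mb / (survivor_ratio + 2)
    eden_mb = survivor_mb * survivor_ratio
    return eden_mb, survivor_mb


if __name__ == "__main__":
    # Hypothetical young gen sizes for illustration only.
    for young_mb in (800, 1600):
        eden, survivor = young_gen_layout(young_mb, survivor_ratio=8)
        print(f"-Xmn{young_mb}M -XX:SurvivorRatio=8: "
              f"eden={eden:.0f}MB, each survivor={survivor:.0f}MB")
```

Larger survivor spaces together with a higher -XX:MaxTenuringThreshold give short-lived objects more young collections in which to die before they are promoted. Whether that helps here would need to be checked against the GC logs.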

Chris

On Apr 16, 2014, at 8:40 AM, Ruchir Jha <ru...@gmail.com> wrote:

> No we don't. 
> 
> On Apr 16, 2014, at 9:21 AM, Mark Reddy <ma...@boxever.com> wrote:
> 
>> Do you delete and/or set TTLs on your data?
>> 
>> 

Re: GC histogram analysis

Posted by Ruchir Jha <ru...@gmail.com>.

No we don't. 


> On Apr 16, 2014, at 9:21 AM, Mark Reddy <ma...@boxever.com> wrote:
> 
> Do you delete and/or set TTLs on your data?
> 
> 

Re: GC histogram analysis

Posted by Mark Reddy <ma...@boxever.com>.

Do you delete and/or set TTLs on your data?

