Posted to user@cassandra.apache.org by ruslan usifov <ru...@gmail.com> on 2011/03/23 16:18:02 UTC

ParNew (promotion failed)

Hello

Sometimes I see the following message in the GC log:

2011-03-23T14:40:56.049+0300: 14897.104: [GC 14897.104: [ParNew (promotion
failed)
Desired survivor size 41943040 bytes, new threshold 2 (max 2)
- age   1:    5573024 bytes,    5573024 total
- age   2:    5064608 bytes,   10637632 total
: 672577K->670749K(737280K), 0.1837950 secs]14897.288: [CMS:
1602487K->779310K(2326528K), 4.7525580 secs] 2270940K->779310K(3063808K), [
CMS Perm : 20073K->19913K(33420K)], 4.9365810 secs] [Times: user=5.06
sys=0.00, real=4.93 secs]
Total time for which application threads were stopped: 4.9378750 seconds


How can I minimize their frequency, or disable them entirely?

My current workload consists of many small objects (about 200 bytes each), and
my memtables total about 300 MB (16 CFs). My heap is 3 GB.
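
(For context, GC log output like the above is produced by logging options roughly
like the following in cassandra-env.sh; this is an illustrative sketch, not
necessarily the exact flags in use:)

    # sketch of the GC logging flags behind output like the log above
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintTenuringDistribution"      # the "Desired survivor size ... age" lines
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"  # the "Total time ... stopped" line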

Re: ParNew (promotion failed)

Posted by ruslan usifov <ru...@gmail.com>.
Also, after all these messages I see the following in stdout.log:

[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor3]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor2]
[Unloading class sun.reflect.GeneratedSerializationConstructorAccessor1]
[Unloading class sun.reflect.GeneratedConstructorAccessor3]


As described here:
http://anshuiitk.blogspot.com/2010/11/excessive-full-garbage-collection.html
this could be a Perm size problem. But the line "CMS Perm : 20073K->19913K(33420K)"
shows only about 20 MB used out of a 33 MB capacity, so it doesn't seem to point
to that, does it?
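
(If the permanent generation really were filling up, the usual knobs would be
along these lines; a hypothetical sketch only, since the numbers above show Perm
far from its capacity:)

    # hypothetical: only relevant if CMS Perm were actually approaching its limit
    JVM_OPTS="$JVM_OPTS -XX:MaxPermSize=128M"
    JVM_OPTS="$JVM_OPTS -XX:+CMSClassUnloadingEnabled"  # let CMS unload dead generated classes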

Re: ParNew (promotion failed)

Posted by ruslan usifov <ru...@gmail.com>.
2011/3/24 Erik Onnen <eo...@gmail.com>

> It's been about 7 months now but at the time G1 would regularly
> segfault for me under load on Linux x64. I'd advise extra precautions
> in testing and make sure you test with representative load.
>

Which Java version do you use?

Re: ParNew (promotion failed)

Posted by Erik Onnen <eo...@gmail.com>.
It's been about 7 months now but at the time G1 would regularly
segfault for me under load on Linux x64. I'd advise extra precautions
in testing and make sure you test with representative load.

Re: ParNew (promotion failed)

Posted by Narendra Sharma <na...@gmail.com>.
I haven't used G1. I remember someone shared his experience with G1 in detail.
The bottom line is that you need to test it for your deployment and, based on
the tests and results, conclude whether it will work for you. I believe G1 will
do well for a small heap.
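
If you do want to experiment with it, on the Java 6 JVMs of that time G1 was
enabled with something like the following in cassandra-env.sh (a sketch; the
existing ParNew/CMS flags would be removed at the same time):

    # experimental on Java 6; UnlockExperimentalVMOptions is not needed on Java 7
    JVM_OPTS="$JVM_OPTS -XX:+UnlockExperimentalVMOptions"
    JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"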

-Naren


On Wed, Mar 23, 2011 at 1:47 PM, ruslan usifov <ru...@gmail.com> wrote:

>
>
> 2011/3/23 Narendra Sharma <na...@gmail.com>
>
>> I understand that. The overhead could be as high as 10x the memtable data
>> size. So overall the overhead for the 16 CFs collectively in your case could be
>> 300 MB * 10 = 3 GB.
>>
>>
> And what about the G1 GC? It is supposed to prevent memory fragmentation, but
> a post in this thread said it is not as good as described. What do you think
> about it?
>

Re: ParNew (promotion failed)

Posted by ruslan usifov <ru...@gmail.com>.
2011/3/23 Narendra Sharma <na...@gmail.com>

> I understand that. The overhead could be as high as 10x the memtable data
> size. So overall the overhead for the 16 CFs collectively in your case could be
> 300 MB * 10 = 3 GB.
>
>
And what about the G1 GC? It is supposed to prevent memory fragmentation, but
a post in this thread said it is not as good as described. What do you think
about it?

Re: ParNew (promotion failed)

Posted by Narendra Sharma <na...@gmail.com>.
I understand that. The overhead could be as high as 10x the memtable data
size. So overall the overhead for the 16 CFs collectively in your case could be
300 MB * 10 = 3 GB.

Thanks,
Naren

On Wed, Mar 23, 2011 at 11:18 AM, ruslan usifov <ru...@gmail.com> wrote:

>
>
> 2011/3/23 Narendra Sharma <na...@gmail.com>
>
>> I think it is due to fragmentation in the old gen, because of which objects
>> from the survivor area cannot be promoted into the old gen. 300 MB of memtable
>> data looks high for a 3 GB heap. I have learned that the in-memory overhead of
>> a memtable can be as high as 10x its data size. So either increase the heap or
>> reduce the memtable thresholds further so that the old gen gets freed up
>> faster. With 16 CFs, I would do both, i.e. increase the heap to, say, 4 GB and
>> reduce the memtable thresholds further.
>>
>>
> I think you misunderstood me: 300 MB is the sum of the thresholds across all
> 16 CFs, so each memtable_threshold is about 18 MB. Or is it still necessary to
> reduce memtable_threshold?

Re: ParNew (promotion failed)

Posted by ruslan usifov <ru...@gmail.com>.
2011/3/23 Narendra Sharma <na...@gmail.com>

> I think it is due to fragmentation in the old gen, because of which objects from
> the survivor area cannot be promoted into the old gen. 300 MB of memtable data
> looks high for a 3 GB heap. I have learned that the in-memory overhead of a
> memtable can be as high as 10x its data size. So either increase the heap or
> reduce the memtable thresholds further so that the old gen gets freed up faster.
> With 16 CFs, I would do both, i.e. increase the heap to, say, 4 GB and reduce the
> memtable thresholds further.
>
>
I think you misunderstood me: 300 MB is the sum of the thresholds across all
16 CFs, so each memtable_threshold is about 18 MB. Or is it still necessary to
reduce memtable_threshold?

Re: ParNew (promotion failed)

Posted by Narendra Sharma <na...@gmail.com>.
I think it is due to fragmentation in the old gen, because of which objects from
the survivor area cannot be promoted into the old gen. 300 MB of memtable data
looks high for a 3 GB heap. I have learned that the in-memory overhead of a
memtable can be as high as 10x its data size. So either increase the heap or
reduce the memtable thresholds further so that the old gen gets freed up faster.
With 16 CFs, I would do both, i.e. increase the heap to, say, 4 GB and reduce the
memtable thresholds further.
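
Concretely, that would be something along these lines in cassandra-env.sh (a
sketch only; pick values that fit your hardware), plus lowering the per-CF
memtable thresholds in your schema:

    # sketch: larger heap; HEAP_NEWSIZE is passed to the JVM as -Xmn (young gen size)
    MAX_HEAP_SIZE="4G"
    HEAP_NEWSIZE="400M"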

-Naren

On Wed, Mar 23, 2011 at 8:18 AM, ruslan usifov <ru...@gmail.com> wrote:

> Hello
>
> Sometimes I see the following message in the GC log:
>
> 2011-03-23T14:40:56.049+0300: 14897.104: [GC 14897.104: [ParNew (promotion
> failed)
> Desired survivor size 41943040 bytes, new threshold 2 (max 2)
> - age   1:    5573024 bytes,    5573024 total
> - age   2:    5064608 bytes,   10637632 total
> : 672577K->670749K(737280K), 0.1837950 secs]14897.288: [CMS:
> 1602487K->779310K(2326528K), 4.7525580 secs] 2270940K->779310K(3063808K), [
> CMS Perm : 20073K->19913K(33420K)], 4.9365810 secs] [Times: user=5.06
> sys=0.00, real=4.93 secs]
> Total time for which application threads were stopped: 4.9378750 seconds
>
>
> How can I minimize their frequency, or disable them entirely?
>
> My current workload consists of many small objects (about 200 bytes each), and
> my memtables total about 300 MB (16 CFs). My heap is 3 GB.
>

Re: ParNew (promotion failed)

Posted by Peter Schuller <pe...@infidyne.com>.
>> So to resolve this I must tune the young generation size (HEAP_NEWSIZE, i.e.
>> -Xmn) and tune the in_memory_compaction_limit_in_mb config parameter?
>
> More likely adjust the initial occupancy trigger and/or the heap size.
> Probably just the latter. This is assuming you're on 0.7 with mostly
> default JVM options. See cassandra-env.sh.

Well, in_memory_compaction_limit_in_mb can be relevant. But this
should only matter if you have large rows in your column family.
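
(For reference, it is a cassandra.yaml setting, roughly like this; 64 MB is, if
I recall correctly, the default:)

    # rows larger than this many MB fall back to the slower two-pass compaction path
    in_memory_compaction_limit_in_mb: 64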

-- 
/ Peter Schuller

Re: ParNew (promotion failed)

Posted by Peter Schuller <pe...@infidyne.com>.
> But he's talking about "promotion failed" which is about heap
> fragmentation, not "concurrent mode failure" which would indicate CMS
> too late.  So increasing young generation size + tenuring threshold is
> probably the way to go (especially in a read-heavy workload;
> increasing tenuring will just mean copying data in memtables around
> between survivor spaces for a write-heavy load).

Thanks for the catch. You're right.

For interested parties:

This caused me to look into when 'promotion failed' and 'concurrent
mode failure' are actually reported. With some background here (from
2006, so potentially out of date):

  http://blogs.sun.com/jonthecollector/entry/when_the_sum_of_the

I looked at a semi-recent openjdk7 (so it may have changed since 1.6).
"concurrent mode failure" seems to be logged in two cases; one is
CMSCollector::do_mark_sweep_work(). The other is
CMSCollector::acquire_control_and_collect().

The former is called by the latter if it is determined that compaction
should happen, which seems to boil down to whether the
incremental collection is "believed" to fail (my source navigation fu
failed me and I'm for some reason unable to find the implementation of
collection_attempt_is_safe() that applies...). The other concurrent
mode failure is if acquire_control_and_collect() determines that one
is already in progress.

That seems consistent with the blog entry.

"promotion failed" seems reported when an actual
next_gen->par_promote() call fails for a specific object.

So, my reading is that while 'promotion failed' can indeed be an
indicator of promotion failure due to fragmentation alone (if a
promotion were to fail in spite of there being plenty of free space
left), it can also have a cause overlapping with concurrent mode
failure in case a young-gen collection was attempted under the belief
that there would be enough space - only to then fail.

However, given the reported numbers (CMS:
1341669K->1142937K(2428928K)) it does seem clear that finding
contiguous free space is indeed the problem.

Running with -XX:PrintFLSStatistics=1 may yield interesting results,
but of course won't actually help.
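
(That is, something like this added to the GC options in cassandra-env.sh; it
prints CMS free-list space statistics, including the largest free block, around
collections:)

    JVM_OPTS="$JVM_OPTS -XX:PrintFLSStatistics=1"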

-- 
/ Peter Schuller

Re: ParNew (promotion failed)

Posted by Jonathan Ellis <jb...@gmail.com>.
On Sat, Mar 26, 2011 at 5:21 PM, Peter Schuller
<pe...@infidyne.com> wrote:
>> So to resolve this I must tune the young generation size (HEAP_NEWSIZE, i.e.
>> -Xmn) and tune the in_memory_compaction_limit_in_mb config parameter?
>
> More likely adjust the initial occupancy trigger and/or the heap size.
> Probably just the latter. This is assuming you're on 0.7 with mostly
> default JVM options. See cassandra-env.sh.
>
> In fact, you may even be helped by *decreasing* the young generation
> size if you're running a version whose cassandra-env specifies -Xmn. I'm
> not entirely sure, because I don't know offhand exactly what the occupancy
> trigger is based on, but if the young gen is large and the workload is such
> that young-gen GCs promote a high percentage of its data, I suspect that can
> lead to CMS triggering too late. (So this paragraph is speculation.)

But he's talking about "promotion failed" which is about heap
fragmentation, not "concurrent mode failure" which would indicate CMS
too late.  So increasing young generation size + tenuring threshold is
probably the way to go (especially in a read-heavy workload;
increasing tenuring will just mean copying data in memtables around
between survivor spaces for a write-heavy load).
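
In cassandra-env.sh terms that would look roughly like this (a sketch with
illustrative values, not a recommendation):

    # sketch: larger young generation plus a higher tenuring threshold
    HEAP_NEWSIZE="800M"                               # becomes -Xmn
    JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
    JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"   # the original log shows a max of 2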

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: ParNew (promotion failed)

Posted by Peter Schuller <pe...@infidyne.com>.
> So to resolve this I must tune the young generation size (HEAP_NEWSIZE, i.e.
> -Xmn) and tune the in_memory_compaction_limit_in_mb config parameter?

More likely adjust the initial occupancy trigger and/or the heap size.
Probably just the latter. This is assuming you're on 0.7 with mostly
default JVM options. See cassandra-env.sh.

In fact, you may even be helped by *decreasing* the young generation
size if you're running a version whose cassandra-env specifies -Xmn. I'm
not entirely sure, because I don't know offhand exactly what the occupancy
trigger is based on, but if the young gen is large and the workload is such
that young-gen GCs promote a high percentage of its data, I suspect that can
lead to CMS triggering too late. (So this paragraph is speculation.)
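
(The occupancy trigger here means the CMS initiating occupancy settings, which
in cassandra-env.sh look roughly like this; the shipped default is, as far as I
remember, 75, and lowering it makes CMS start earlier:)

    JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=65"
    JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"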

-- 
/ Peter Schuller

Re: ParNew (promotion failed)

Posted by ruslan usifov <ru...@gmail.com>.
2011/3/23 ruslan usifov <ru...@gmail.com>

> Hello
>
> Sometimes I see the following message in the GC log:
>
> 2011-03-23T14:40:56.049+0300: 14897.104: [GC 14897.104: [ParNew (promotion
> failed)
> Desired survivor size 41943040 bytes, new threshold 2 (max 2)
> - age   1:    5573024 bytes,    5573024 total
> - age   2:    5064608 bytes,   10637632 total
> : 672577K->670749K(737280K), 0.1837950 secs]14897.288: [CMS:
> 1602487K->779310K(2326528K), 4.7525580 secs] 2270940K->779310K(3063808K), [
> CMS Perm : 20073K->19913K(33420K)], 4.9365810 secs] [Times: user=5.06
> sys=0.00, real=4.93 secs]
> Total time for which application threads were stopped: 4.9378750 seconds
>

After investigating, I found that this happens when memtable flushes and
compactions occur; at that moment the young part of the heap overflows and a
full GC happens.

So to resolve this I must tune the young generation size (HEAP_NEWSIZE, i.e.
-Xmn) and tune the in_memory_compaction_limit_in_mb config parameter?

Also, if memtables are flushed due to "memtable_flush_after", would separating
the memtable flushes in time help?