You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by Yang <te...@gmail.com> on 2011/09/25 08:14:04 UTC

CMS GC initial-mark taking 6 seconds , bad?

I see the following in my GC log

1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)]
13749939K(49807360K), 6.0696680 secs] [Times: user=6.10 sys=0.00,
real=6.07 secs]

so there is a stop-the-world period of 6 seconds. does this sound bad
? or 6 seconds is OK  and we should expect the built-in
fault-tolerance of Cassandra handle this?

Thanks
Yang

Re: CMS GC initial-mark taking 6 seconds , bad?

Posted by Yang <te...@gmail.com>.

Thanks Peter and Aaron.


right now I have too much logging so the CMS logging is flushed
(somehow it does not appear in the system.log, only on stdout ), I'll
keep an eye on the correlation with ParNew as I get more logging

Yang

On Sun, Sep 25, 2011 at 3:59 AM, Peter Schuller
<pe...@infidyne.com> wrote:
>> I see the following in my GC log
>>
>> 1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)]
>> 13749939K(49807360K), 6.0696680 secs] [Times: user=6.10 sys=0.00,
>> real=6.07 secs]
>>
>> so there is a stop-the-world period of 6 seconds. does this sound bad
>> ? or 6 seconds is OK  and we should expect the built-in
>> fault-tolerance of Cassandra handle this?
>
> initial-mark pauses are stop-the-world, so a 6 second initial-mark
> would have paused the node for those 6 seconds.
>
> The initial mark is essentially marking roots for old-gen; that should
> include thread stacks and such, but will also include younger
> generations. You might read [1] which talks a bit about it; a
> recommendation there is to make sure that initial marks happen right
> after a young-gen collection, and they advise increasing heap size
> sufficiently to allow an ininitial mark to trigger (I suppose by
> heuristics) after the young gen collection, prior to the CMS trigger.
> It makes sense, especially given that initial-mark is single-threaded,
> to try do to that (and leave the young-gen smaller, collected by the
> parallel collector). However I'm not entirely clear on what VM options
> are required for this. I had a brief look at the code but it wasn't
> obvious at cursory glance under what circumstances an initial mark is
> triggered right after young-gen vs. not. In your case you clearly have
> enough heap.
>
> Can you correlate with ParNew collections and see if the initial mark
> pauses seem to happen immediately after a ParNew, or somewhere in
> between, in the cases where they take this long?
>
> Also, as a mitigationg: What's your young generation size? One way to
> mitigate the problem, if it is indeed the young gen marking that is
> taking time, is to decrease the size of the young generation to leave
> less work for initial marking. Normally the young gen is sized based
> on expected pause times given parallel ParNew ollections, but if a
> non-parallel initial-mark is having to do marking of the same contents
> the pause time could be higher (hence the discussion above).
>
> Also, is each initial mark this long, or is that something that
> happens once in a while?
>
> As for Cassandra dealing with it: It is definitely not a good thing to
> have 6 second pauses. Even with all other nodes up, it takes time for
> the dynamic snitch to realize what's going on and you will tend to see
> a subset of requests to the cluster get 'stuck' in circumstances like
> that. Also, if you're e.g. doing QUORUM at RF=3, if a node is down for
> legitimate reasons, another node having a 6 second pause will by
> necessity cause high latency for requests during that period.
>
> [1] http://answerpot.com/showthread.php?1558705-CMS+initial+mark+pauses
>
>
> --
> / Peter Schuller (@scode on twitter)
>

Re: CMS GC initial-mark taking 6 seconds , bad?

Posted by Peter Schuller <pe...@infidyne.com>.

> I see the following in my GC log
>
> 1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)]
> 13749939K(49807360K), 6.0696680 secs] [Times: user=6.10 sys=0.00,
> real=6.07 secs]
>
> so there is a stop-the-world period of 6 seconds. does this sound bad
> ? or 6 seconds is OK  and we should expect the built-in
> fault-tolerance of Cassandra handle this?

initial-mark pauses are stop-the-world, so a 6 second initial-mark
would have paused the node for those 6 seconds.

The initial mark is essentially marking roots for old-gen; that should
include thread stacks and such, but will also include younger
generations. You might read [1] which talks a bit about it; a
recommendation there is to make sure that initial marks happen right
after a young-gen collection, and they advise increasing heap size
sufficiently to allow an ininitial mark to trigger (I suppose by
heuristics) after the young gen collection, prior to the CMS trigger.
It makes sense, especially given that initial-mark is single-threaded,
to try do to that (and leave the young-gen smaller, collected by the
parallel collector). However I'm not entirely clear on what VM options
are required for this. I had a brief look at the code but it wasn't
obvious at cursory glance under what circumstances an initial mark is
triggered right after young-gen vs. not. In your case you clearly have
enough heap.

Can you correlate with ParNew collections and see if the initial mark
pauses seem to happen immediately after a ParNew, or somewhere in
between, in the cases where they take this long?

Also, as a mitigationg: What's your young generation size? One way to
mitigate the problem, if it is indeed the young gen marking that is
taking time, is to decrease the size of the young generation to leave
less work for initial marking. Normally the young gen is sized based
on expected pause times given parallel ParNew ollections, but if a
non-parallel initial-mark is having to do marking of the same contents
the pause time could be higher (hence the discussion above).

Also, is each initial mark this long, or is that something that
happens once in a while?

As for Cassandra dealing with it: It is definitely not a good thing to
have 6 second pauses. Even with all other nodes up, it takes time for
the dynamic snitch to realize what's going on and you will tend to see
a subset of requests to the cluster get 'stuck' in circumstances like
that. Also, if you're e.g. doing QUORUM at RF=3, if a node is down for
legitimate reasons, another node having a 6 second pause will by
necessity cause high latency for requests during that period.

[1] http://answerpot.com/showthread.php?1558705-CMS+initial+mark+pauses


-- 
/ Peter Schuller (@scode on twitter)

Re: CMS GC initial-mark taking 6 seconds , bad?

Posted by Chris Burroughs <ch...@gmail.com>.

On 10/20/2011 09:38 AM, Maxim Potekhin wrote:
> I happen to have 48GB on each machines I use in the cluster. Can I
> assume that I can't really use all of this memory productively? Do you
> have any suggestion related to that? Can I run more than one instance on
> Cassandra on the same box (using different ports) to take advantage of
> this memory, assuming the disk has enough bandwidth?

You are likely to not have good luck with a JVM heap that large.  But
you can:
 - Leave all that memory to the OS page cache.
 - mmap index files
 - use an off heap cache

All of those are productive uses.

Re: CMS GC initial-mark taking 6 seconds , bad?

Posted by Todd Burruss <bb...@expedia.com>.

Are you using cassandra's caching?  If you are then you will need to play
around with the RAM setting to find a sweet spot.  A low hit rate on the
cache (which is counter productive anyway) will cause more GC.  A high hit
rate, less GC.

If you are not caching, no need to use a large heap as the OS will do a
fine job of caching the data files for you.  And in fact I would run with
a smaller'ish heap to give the OS more room - how much depends on a
variety of things (how many column families, how often repair, write/read
ratio) - so testing is the best course of action

On 10/20/11 6:38 AM, "Maxim Potekhin" <po...@bnl.gov> wrote:

>Hello Aaron,
>
>I happen to have 48GB on each machines I use in the cluster. Can I
>assume that I can't really use all of this memory productively? Do you
>have any suggestion related to that? Can I run more than one instance on
>Cassandra on the same box (using different ports) to take advantage of
>this memory, assuming the disk has enough bandwidth?
>
>Thanks,
>Maxim
>
>On 9/25/2011 11:37 AM, aaron morton wrote:
>> It does seem long and will be felt by your application.
>>
>> Are you running a 47GB heap ? Most peeps seem to think 8 to 12 is about
>>the viable maximum.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 25/09/2011, at 7:14 PM, Yang wrote:
>>
>>> I see the following in my GC log
>>>
>>> 1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)]
>>> 13749939K(49807360K), 6.0696680 secs] [Times: user=6.10 sys=0.00,
>>> real=6.07 secs]
>>>
>>> so there is a stop-the-world period of 6 seconds. does this sound bad
>>> ? or 6 seconds is OK  and we should expect the built-in
>>> fault-tolerance of Cassandra handle this?
>>>
>>> Thanks
>>> Yang
>

Re: CMS GC initial-mark taking 6 seconds , bad?

Posted by Maxim Potekhin <po...@bnl.gov>.

Hello Aaron,

I happen to have 48GB on each machines I use in the cluster. Can I 
assume that I can't really use all of this memory productively? Do you 
have any suggestion related to that? Can I run more than one instance on 
Cassandra on the same box (using different ports) to take advantage of 
this memory, assuming the disk has enough bandwidth?

Thanks,
Maxim

On 9/25/2011 11:37 AM, aaron morton wrote:
> It does seem long and will be felt by your application.
>
> Are you running a 47GB heap ? Most peeps seem to think 8 to 12 is about the viable maximum.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 25/09/2011, at 7:14 PM, Yang wrote:
>
>> I see the following in my GC log
>>
>> 1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)]
>> 13749939K(49807360K), 6.0696680 secs] [Times: user=6.10 sys=0.00,
>> real=6.07 secs]
>>
>> so there is a stop-the-world period of 6 seconds. does this sound bad
>> ? or 6 seconds is OK  and we should expect the built-in
>> fault-tolerance of Cassandra handle this?
>>
>> Thanks
>> Yang

Re: CMS GC initial-mark taking 6 seconds , bad?

Posted by aaron morton <aa...@thelastpickle.com>.

It does seem long and will be felt by your application. 

Are you running a 47GB heap ? Most peeps seem to think 8 to 12 is about the viable maximum. 

Cheers
 
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 25/09/2011, at 7:14 PM, Yang wrote:

> I see the following in my GC log
> 
> 1910.513: [GC [1 CMS-initial-mark: 2598619K(26214400K)]
> 13749939K(49807360K), 6.0696680 secs] [Times: user=6.10 sys=0.00,
> real=6.07 secs]
> 
> so there is a stop-the-world period of 6 seconds. does this sound bad
> ? or 6 seconds is OK  and we should expect the built-in
> fault-tolerance of Cassandra handle this?
> 
> Thanks
> Yang