Posted to user@hbase.apache.org by Sean Sechrist <ss...@gmail.com> on 2010/11/24 16:01:45 UTC

Garbage collection issues

Hey guys,

I just want to get an idea about how everyone avoids these long GC pauses
that cause regionservers to die.

What kind of java heap and garbage collection settings do you use?

What do you do to make sure that the HBase vm never uses swap? I have heard
turning off swap altogether can be dangerous, so right now we have the
setting vm.swappiness=0. How do you tell if it's using swap? On Ganglia, we
see the "CPU wio" metric at around 4.5% before one of our crashes. Is that
high?

To try to avoid using too much memory, is reducing the memstore upper/lower
limits or the block cache size a good idea? Should we just tune down HBase's
total heap to try to avoid swap?

In terms of our specific problem:

We seem to keep running into garbage collection pauses that cause the
regionservers to die. We have a mix of random read jobs, as well as a few
full-scan jobs (~1.5 billion rows, 800-900GB of data, 1500 regions), and we
are always inserting data. We would rather sacrifice a little speed for
stability, if that means anything. We have 7 nodes (RS + DN + TT) with 12GB
max heap given to HBase, and 24GB memory total.

We were using the following garbage collection options:
-XX:+UseConcMarkSweepGC -XX:NewSize=64m -XX:MaxNewSize=64m
-XX:CMSInitiatingOccupancyFraction=75
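
(For completeness, here is roughly how that looks in our conf/hbase-env.sh; the
heap-size line, the GC-logging flags, and the log path are illustrative rather
than an exact copy of our file.)

export HBASE_HEAPSIZE=12000   # in MB, i.e. the 12GB max heap mentioned above
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:NewSize=64m -XX:MaxNewSize=64m \
    -XX:CMSInitiatingOccupancyFraction=75 \
    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
    -Xloggc:/var/log/hbase/gc-hbase.log"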

After looking at http://wiki.apache.org/hadoop/PerformanceTuning, we are
trying to lower NewSize/MaxNewSize to 6m as well as reducing
CMSInitiatingOccupancyFraction to 50.

We see messages like this in our GC logs:

2010-11-23T14:56:01.383-0500: 61297.449: [GC 61297.449: [ParNew (promotion
failed): 57425K->57880K(59008K), 0.1880950 secs]61297.637:
[CMS2010-11-23T14:56:06.336-0500: 61302.402: [CMS-concurrent-mark:
8.844/17.169 secs] [Times: user=75.16 sys=1.34, real=17.17 secs]
 (concurrent mode failure): 10126729K->5760080K(13246464K), 91.2530340 secs]
10181961K->5760080K(13305472K), [CMS Perm : 20252K->20241K(33868K)],
91.4413320 secs] [Times: user=24.47 sys=1.07, real=91.44 secs]

There are a lot of questions there, but I definitely appreciate any advice or
input anybody else has. Thanks so much!

-Sean

Re: Garbage collection issues

Posted by Alex Baranau <al...@gmail.com>.
Just wanted to add this link to Todd's explanation:
http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html (Java
SE 6 HotSpot[tm] Virtual Machine Garbage Collection Tuning).
It gives a more detailed description (to some extent, of course, on such a
deep topic) of what Todd mentioned.

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase


Re: Garbage collection issues

Posted by Lars George <la...@gmail.com>.
Hi Friso,

Great to know! Todd was the last one to try to crash G1 and the recent iteration seemed much more stable. 

Lars


Re: Garbage collection issues

Posted by Todd Lipcon <to...@cloudera.com>.
On Mon, Nov 29, 2010 at 6:33 AM, Sean Sechrist <ss...@gmail.com> wrote:

> Just an update, in case anyone's interested in our performance numbers:
>
> With the 512MB newSize, our minor GC pauses are generally less than 0.05s,
> although a fair number get up around 0.15s. We still occasionally see
> promotion failures causing full pauses of over a minute, but we have a
> script running to automatically restart our regionservers if that happens.
> Things seem to be going OK right now.
>
> On a related note: If a region server encounters the GC pause of death,
> will all of the writes in its memstore at the time be lost (without using
> WAL)? I think it would be.
>

Yep, they would be - that's why the WAL is important.

One thing I've been thinking about is a way to have an HBase-orchestrated
constant rolling System.gc(). If we can detect heap fragmentation before it
causes a long pause, we can shed regions gracefully, do a System.gc(), and
then pick them up again. A little tricky, but it should solve these issues
once and for all, especially on big clusters where a constant rolling restart
isn't a big deal compared to total capacity.

-Todd


-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Garbage collection issues

Posted by Sean Sechrist <ss...@gmail.com>.
Just an update, in case anyone's interested in our performance numbers:

With the 512MB newSize, our minor GC pauses are generally less than 0.05s,
although a fair number get up around 0.15s. We still occasionally see
promotion failures causing full pauses of over a minute, but we have a
script running to automatically restart our regionservers if that happens.
Things seem to be going OK right now.
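
(The restart script itself is nothing fancy; below is a rough sketch of the
kind of check we run from cron. The jps-based test and the use of $HBASE_HOME
are illustrative rather than a copy of the real script.)

#!/bin/sh
# If the regionserver JVM is no longer running, start a new one.
if ! jps | grep -q HRegionServer; then
    "$HBASE_HOME"/bin/hbase-daemon.sh start regionserver
fi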

On a related note: If a region server encounters the GC pause of death, will
all of the writes in its memstore at the time be lost (without using WAL)? I
think it would be.

-Sean


Re: Garbage collection issues

Posted by Friso van Vollenhoven <fv...@xebia.com>.
On a slightly related note, we've been running G1 with default settings on a 16GB heap for some weeks now. It's never given us trouble, so I didn't do any real analysis on the GC times, just some eyeballing.

I looked at the longer GCs (everything longer than 1 second: grep -C 5 -i real=[1-9] gc-hbase.log), which gives a list of full GCs all around 10s. The minor pauses all appear to be around 0.2s. I can pastebin a GC log if anyone is interested in the G1 behavior.
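
A rough variation on that, if you just want the worst wall-clock pause times
out of the same log (this assumes GNU grep for the -o flag):

grep -o 'real=[0-9.]*' gc-hbase.log | cut -d= -f2 | sort -n | tail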



Friso



Re: Garbage collection issues

Posted by Ryan Rawson <ry...@gmail.com>.
I'd love to hear the kinds of minor pauses you get... left to its own
devices, 1.6.0_14 or so wants to grow the new gen to 1GB if your
-Xmx is large enough, and at that size you are looking at 800ms minor
pauses!

It's a tough subject.

-ryan


Re: Garbage collection issues

Posted by Sean Sechrist <ss...@gmail.com>.
Interesting. The settings we tried earlier today slowed jobs significantly,
but no failures (yet). We're going to try the 512MB newSize and 60%
CMSInitiatingOccupancyFraction. One-second pauses here and there would be OK
for us... we just want to avoid the long pauses right now. We'll also do
what we can to avoid swapping; the Ganglia metrics are on there.

Thanks,
Sean


Re: Garbage collection issues

Posted by Todd Lipcon <to...@cloudera.com>.
On Wed, Nov 24, 2010 at 7:01 AM, Sean Sechrist <ss...@gmail.com> wrote:

> We were using the following garbage collection options:
> -XX:+UseConcMarkSweepGC -XX:NewSize=64m -XX:MaxNewSize=64m
> -XX:CMSInitiatingOccupancyFraction=75
>
> After looking at http://wiki.apache.org/hadoop/PerformanceTuning, we are
> trying to lower NewSize/MaxNewSize to 6m as well as reducing
> CMSInitiatingOccupancyFraction to 50.
>

Rather than reducing the new size, you should consider increasing it if
you're OK with higher latency but fewer long GC pauses.

GC is a complicated subject, but here are a few rules of thumb:

- A larger young generation means that the young GC pauses, which are
stop-the-world, will take longer. In my experience it's somewhere around 1
second per GB of new size. So, if you're OK with periodic 1 second pauses, a
large (1GB) new size should be fine.
- A larger young generation also means that less data will get tenured to
the old generation. This means that the old generation will have to collect
less often and also that it will become less fragmented.
- In HBase, the long (45+ second) pauses generally happen when promotion
fails due to heap fragmentation in the old generation. The JVM then falls
back to a stop-the-world compacting collection, which takes a long time.

So, in general, a large young gen will reduce the frequency of super-long
pauses, but will increase the frequency of shorter pauses.

It sounds like you may be OK with longer young gen pauses, so maybe consider
new size at 512M with your 12G total heap?

I also wouldn't tune CMSInitiatingOccupancyFraction below 60% - that will
cause CMS to be running constantly, which isn't that efficient.
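
Concretely, something along these lines (just a sketch; the 70% fraction is
one value in the 60-75 range discussed above, and UseParNewGC and
UseCMSInitiatingOccupancyOnly are common companions to CMS rather than part of
your current settings):

-Xmx12g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
-XX:NewSize=512m -XX:MaxNewSize=512m \
-XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSInitiatingOccupancyOnly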

-Todd



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Garbage collection issues

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Setting swappiness to 0 is one thing, but does the machine swap at all? If
so, then it's definitely a problem, and the fact that the real (wall-clock)
time was about 4x the user CPU time on that big GC pause strongly indicates
swapping. Set up Ganglia and watch your swap. The typical error is
setting too many tasks per node (or heaps that are too big), so that
when all the reducers and the mappers run at the same time, they blow
out your memory. Giving less memory to HBase could also help; 8GB is
usually the upper bound, since the bigger the heap, the longer it takes
to do a full GC.
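
A quick way to check a node directly (plain Linux commands, nothing
HBase-specific):

free -m                      # non-zero "used" in the Swap row means something was swapped out
vmstat 5                     # non-zero si/so columns mean it is actively swapping right now
grep -i swap /proc/meminfo   # SwapTotal vs SwapFree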

If you want to slow your jobs down, tune down the scanner caching on those
full-scan jobs, and be sure to set cache blocks to false on the Scan to
reduce block cache churn:
http://hbase.apache.org/docs/r0.89.20100924/apidocs/org/apache/hadoop/hbase/client/Scan.html#setCacheBlocks(boolean)
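
For example, when setting up a table-input MR job it would look roughly like
this (the table name and the pass-through mapper are placeholders, and the
caching value is just an example):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class FullScanJobSetup {

  // Placeholder pass-through mapper; substitute your real mapper class.
  static class MyMapper extends TableMapper<ImmutableBytesWritable, Result> {
  }

  static void configureFullScan(Job job) throws IOException {
    Scan scan = new Scan();
    scan.setCaching(100);        // rows fetched per RPC; tune this down to slow the scan
    scan.setCacheBlocks(false);  // don't churn the block cache with one-time scan reads
    TableMapReduceUtil.initTableMapperJob("mytable", scan, MyMapper.class,
        ImmutableBytesWritable.class, Result.class, job);
  }
}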

J-D
