Posted to user@hbase.apache.org by Shuai Lin <li...@gmail.com> on 2015/01/06 12:32:44 UTC

Region Server OutOfMemory Error

Hi all,

We have an HBase cluster of 5 region servers, each hosting 60+
regions.

Under heavy load, the region servers crash with an OOME now and then:

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 16820"...

We have the max heap size set to 22GB (-Xmx22528m) for each RS, and use
G1GC (-XX:+UseG1GC). To debug the problem we have turned on the JVM GC
log. The last few lines of the GC log before each crash always look like
this:

2015-01-06T11:10:19.087+0000: 5035.720: [Full GC 7122M->5837M(21G), 0.8867660 secs]
   [Eden: 1024.0K(7278.0M)->0.0B(8139.0M) Survivors: 68.0M->0.0B Heap: 7122.7M(22.0G)->5837.2M(22.0G)]
 [Times: user=1.42 sys=0.00, real=0.89 secs]
2015-01-06T11:10:19.976+0000: 5036.608: [Full GC 5837M->5836M(21G), 0.6378260 secs]
   [Eden: 0.0B(8139.0M)->0.0B(8139.0M) Survivors: 0.0B->0.0B Heap: 5837.2M(22.0G)->5836.5M(22.0G)]
 [Times: user=0.93 sys=0.00, real=0.63 secs]

From the last line I see that the heap occupies only 5837MB while the capacity is
22GB, so how can the OOM happen? Or is my interpretation of the GC log
wrong?
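
One way to double-check that reading is to pull the numbers out of the "Heap:" field mechanically. The following is a minimal Python sketch, not a full G1 log parser: the regex and the names (HEAP_FIELD, parse_heap_field) are made up for this illustration, and it only covers the record layout shown above, with the mail-wrapped lines joined back together.

```python
import re

# Matches the "Heap: used(capacity)->used(capacity)" field of a G1 GC record.
# This pattern is an assumption based only on the two records in this post.
HEAP_FIELD = re.compile(
    r"Heap: ([\d.]+)([BKMG])\(([\d.]+)([BKMG])\)"
    r"->([\d.]+)([BKMG])\(([\d.]+)([BKMG])\)"
)

# Conversion factors from the unit suffixes used in the log to megabytes.
_MB_PER_UNIT = {"B": 1.0 / (1024 * 1024), "K": 1.0 / 1024, "M": 1.0, "G": 1024.0}


def to_mb(value, unit):
    """Convert a number with a B/K/M/G suffix into megabytes."""
    return float(value) * _MB_PER_UNIT[unit]


def parse_heap_field(record):
    """Return heap usage before/after a collection, or None if no Heap field."""
    m = HEAP_FIELD.search(record)
    if m is None:
        return None
    before = to_mb(m.group(1), m.group(2))
    after = to_mb(m.group(5), m.group(6))
    capacity = to_mb(m.group(7), m.group(8))
    return {
        "before_mb": before,
        "after_mb": after,
        "capacity_mb": capacity,
        "freed_mb": before - after,
        "occupancy_after": after / capacity,
    }


# The two Full GC records quoted above, each joined onto one line.
RECORDS = [
    "[Eden: 1024.0K(7278.0M)->0.0B(8139.0M) Survivors: 68.0M->0.0B "
    "Heap: 7122.7M(22.0G)->5837.2M(22.0G)]",
    "[Eden: 0.0B(8139.0M)->0.0B(8139.0M) Survivors: 0.0B->0.0B "
    "Heap: 5837.2M(22.0G)->5836.5M(22.0G)]",
]

for record in RECORDS:
    stats = parse_heap_field(record)
    print("%.1f MB freed, %.0f%% of heap in use after Full GC"
          % (stats["freed_mb"], 100 * stats["occupancy_after"]))
```

For the two records above this works out to roughly 26% of the 22GB heap in use after each Full GC, with the second collection freeing less than 1MB. So the log agrees with the 5837MB reading; on its own it does not show which allocation request failed.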

I have read some articles but only picked up the basic concepts of G1GC. I've tried
tools like GCViewer, but none of them gives me a useful explanation of the details of
the GC log.


Regards,
Shuai

Re: Region Server OutOfMemory Error

Posted by Shuai Lin <li...@gmail.com>.
Forgot to mention: we are using HBase 0.94.15.

Re: Region Server OutOfMemory Error

Posted by Shuai Lin <li...@gmail.com>.
Yeah, I've read a bunch of articles on Java GC, including the famous one
you mentioned. We don't pass any specific GC params to the JVM, except the
"-XX:+UseG1GC" flag.

On Wed, Jan 7, 2015 at 2:57 AM, Stack <st...@duboce.net> wrote:

> What exact message do you get when the OOME happens? What GC params do you
> pass the JVM (None?)? You've seen this blog on hbase up on G1?
>
> https://software.intel.com/en-us/blogs/2014/06/18/part-1-tuning-java-garbage-collection-for-hbase
> St.Ack

Re: Region Server OutOfMemory Error

Posted by Stack <st...@duboce.net>.
On Tue, Jan 6, 2015 at 3:32 AM, Shuai Lin <li...@gmail.com> wrote:

> Hi all,
>
> We have an HBase cluster of 5 region servers, each hosting 60+
> regions.
>
> Under heavy load, the region servers crash with an OOME now and then:
>
>
What exact message do you get when the OOME happens? What GC params do you
pass to the JVM (none?)? Have you seen this blog post on running HBase on G1?
https://software.intel.com/en-us/blogs/2014/06/18/part-1-tuning-java-garbage-collection-for-hbase
St.Ack






RE: Region Server OutOfMemory Error

Posted by "Rendon, Carlos (KBB - Irvine)" <Ca...@kbb.com>.
We increased heap by 50%.

After re-reading your message I'm not sure it's the same issue, even though I ran into OOME crashes with the same message as yours.
My crashes were preceded by very long garbage collection times, and the JVM GC logs had "to space exhausted" messages. That doesn't seem to match your description.

Did you check whether you configured the heap for more memory than is actually available on your machine?
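
A rough way to run that check on a Linux box is to compare the -Xmx value with the installed RAM. The Python sketch below is only an illustration: the helper names are invented for the example, physical_memory_bytes() assumes a Linux/POSIX host (it relies on os.sysconf), and the 20% reserve for the OS and for JVM memory outside the heap is an arbitrary assumption, not a recommendation.

```python
import os
import re


def xmx_bytes(flag):
    """Parse a JVM flag such as '-Xmx22528m' into a byte count."""
    m = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", flag)
    if m is None:
        raise ValueError("not an -Xmx flag: %r" % flag)
    multiplier = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    return int(m.group(1)) * multiplier[m.group(2).lower()]


def physical_memory_bytes():
    """Installed RAM in bytes (assumes a Linux/POSIX host)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def heap_fits(heap_bytes, ram_bytes, reserve_fraction=0.2):
    """True if the heap fits in RAM after setting aside reserve_fraction.

    reserve_fraction is a made-up safety margin for the OS and for JVM
    memory that lives outside the Java heap.
    """
    return heap_bytes <= ram_bytes * (1.0 - reserve_fraction)


if __name__ == "__main__":
    heap = xmx_bytes("-Xmx22528m")  # the setting quoted in this thread
    ram = physical_memory_bytes()
    print("heap %.1f GiB, RAM %.1f GiB, fits: %s"
          % (heap / 1024.0 ** 3, ram / 1024.0 ** 3, heap_fits(heap, ram)))
```

A "fits" answer here is necessary rather than sufficient: other processes on the same machine also need memory, and this check does not look at them.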

-Carlos

-----Original Message-----
From: Shuai Lin [mailto:linshuai2012@gmail.com] 
Sent: Thursday, January 08, 2015 5:28 PM
To: user@hbase.apache.org
Subject: Re: Region Server OutOfMemory Error

Hi Rendon,

Thanks for sharing! I'd like to know how much heap you gave each RS (before and after you fixed the problem). Did increasing the heap size work well for you?

Regards,
Shuai


Re: Region Server OutOfMemory Error

Posted by Shuai Lin <li...@gmail.com>.
Hi Rendon,

Thanks for sharing! I'd like to know how much heap you gave each RS
(before and after you fixed the problem). Did increasing the heap size work
well for you?

Regards,
Shuai

On Fri, Jan 9, 2015 at 1:36 AM, Rendon, Carlos (KBB - Irvine) <
Carlos.Rendon@kbb.com> wrote:

> I recently ran into this exact same issue on G1GC. In my case I had the
> luxury of giving HBase more heap space.
> If that is an option for you, you might try it out and see if it helps.
>
> -Carlos

RE: Region Server OutOfMemory Error

Posted by "Rendon, Carlos (KBB - Irvine)" <Ca...@kbb.com>.
I recently ran into this exact same issue on G1GC. In my case I had the luxury of giving HBase more heap space. 
If that is an option for you, you might try it out and see if it helps.

-Carlos


Re: RE: Region Server OutOfMemory Error

Posted by Shuai Lin <li...@gmail.com>.
Yeah, I know a heap dump would work, but I'm a little worried about dumping
22GB of data on a production server, since it could take quite a while and
slow down the recovery.
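
Before enabling the dump, it may help to at least confirm that the target directory has room for it. The sketch below is a hedged Python illustration: room_for_heap_dump and its safety_factor are hypothetical names and values, and it assumes the worst case of a dump about as large as the maximum heap (going by the GC log, the real dump would probably be closer to the live data than to 22GB). HotSpot writes the dump to the working directory unless -XX:HeapDumpPath points elsewhere.

```python
import shutil


def room_for_heap_dump(dump_dir, max_heap_bytes, safety_factor=1.2):
    """True if dump_dir has enough free space for a worst-case heap dump.

    The worst case is assumed to be max_heap_bytes; safety_factor is an
    arbitrary margin chosen for this example.
    """
    free_bytes = shutil.disk_usage(dump_dir).free
    return free_bytes >= max_heap_bytes * safety_factor


if __name__ == "__main__":
    max_heap = 22528 * 1024 * 1024  # -Xmx22528m from this thread
    # "." stands in for whatever directory the dump would be written to.
    print("enough space for a dump:", room_for_heap_dump(".", max_heap))
```

This only addresses disk space. It says nothing about how long writing the dump takes, which is the other half of the concern.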


On Wed, Jan 7, 2015 at 10:51 AM, 谢良 <xi...@xiaomi.com> wrote:

> Could you retry with " -XX:+HeapDumpOnOutOfMemoryError" ?
> the heap dump will make the thing clear

RE: Region Server OutOfMemory Error

Posted by 谢良 <xi...@xiaomi.com>.
Could you retry with "-XX:+HeapDumpOnOutOfMemoryError"?
The heap dump should make things clear.

Re: Region Server OutOfMemory Error

Posted by Shuai Lin <li...@gmail.com>.
Hi Nick,

This is the output of the command "java -version":

$ java -version
java version "1.7.0_60"
Java(TM) SE Runtime Environment (build 1.7.0_60-b19)
Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)

We have been using G1GC for quite a long time.



On Wed, Jan 7, 2015 at 1:19 PM, Nick Dimiduk <nd...@gmail.com> wrote:

> I don't think there's many folks using G1 in production for HBase yet. Out
> of curiosity, what JVM and version are you using? I heard G1 got much
> better somewhere after 1.7u60, though I don't have personal experience with
> it.

Re: Region Server OutOfMemory Error

Posted by Nick Dimiduk <nd...@gmail.com>.
I don't think there are many folks using G1 in production for HBase yet. Out
of curiosity, what JVM and version are you using? I heard G1 got much
better somewhere after 1.7u60, though I don't have personal experience with
it.

On Tuesday, January 6, 2015, Shuai Lin <li...@gmail.com> wrote:

> Cool, how can I get a graph like that?

Re: Region Server OutOfMemory Error

Posted by Shuai Lin <li...@gmail.com>.
Cool, will take a look. Thanks!

On Thu, Jan 8, 2015 at 3:26 AM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:

> Hi,
>
> You can get graphs like this (+ alerts, anomaly detection, events, etc.)
> from SPM: http://sematext.com/spm
>
> HBase 0.98 metrics coming later this month.
>
> Otis

Re: Region Server OutOfMemory Error

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

You can get graphs like this (+ alerts, anomaly detection, events, etc.)
from SPM: http://sematext.com/spm

HBase 0.98 metrics coming later this month.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Tue, Jan 6, 2015 at 11:42 PM, Shuai Lin <li...@gmail.com> wrote:

> Cool, how can I get a graph like that?

Re: Region Server OutOfMemory Error

Posted by Shuai Lin <li...@gmail.com>.
Cool, how can I get a graph like that?

On Wed, Jan 7, 2015 at 4:06 AM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:

> Hi,
>
> The first thing I'd want to know is which memory pool is getting filled.
> There are several in the JVM.
> Here's an example: https://apps.sematext.com/spm-reports/s/kZgBWLsJRd
> (this
> one is actually from an HBase cluster).  If you see any of the lines at
> 100% that's potential trouble.  If it stays at 100% it's trouble (i.e. OOM
> about to happen).  If it's constantly close to 100% that's OOM waiting to
> happen and you should check your GC and CPU graphs and see how much time
> the JVM is spending on GC.
>
> Once you know which pool is problematic you'll be better informed and may
> be able to increase the size of just that pool.
>
> Otis

Re: Region Server OutOfMemory Error

Posted by Otis Gospodnetic <ot...@gmail.com>.
Hi,

The first thing I'd want to know is which memory pool is getting filled.
There are several in the JVM.
Here's an example: https://apps.sematext.com/spm-reports/s/kZgBWLsJRd (this
one is actually from an HBase cluster).  If you see any of the lines at
100% that's potential trouble.  If it stays at 100% it's trouble (i.e. OOM
about to happen).  If it's constantly close to 100% that's OOM waiting to
happen and you should check your GC and CPU graphs and see how much time
the JVM is spending on GC.

Once you know which pool is problematic you'll be better informed and may
be able to increase the size of just that pool.
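
The rule of thumb above can be written down as a small classifier over utilization samples for one pool. This is only a sketch in Python: classify_pool, the 0.95 "close to 100%" threshold, and the sample numbers are all made up for illustration and are not measurements from any cluster. The pool names are the ones a HotSpot JVM running G1 normally reports.

```python
def classify_pool(utilization, near_full=0.95):
    """Classify one memory pool from utilization samples (fractions 0.0-1.0).

    near_full is an assumed threshold for "constantly close to 100%".
    """
    if not utilization:
        raise ValueError("need at least one sample")
    if all(u >= 1.0 for u in utilization):
        return "trouble"                # stays at 100%
    if all(u >= near_full for u in utilization):
        return "OOM waiting to happen"  # constantly close to 100%
    if any(u >= 1.0 for u in utilization):
        return "potential trouble"      # touched 100% at some point
    return "ok"


# Hypothetical samples, invented for this example only.
SAMPLES = {
    "G1 Eden Space": [0.40, 1.00, 0.10, 0.85],
    "G1 Survivor Space": [0.20, 0.35, 0.30, 0.25],
    "G1 Old Gen": [0.96, 0.97, 0.99, 0.98],
}

for pool, series in SAMPLES.items():
    print("%-18s %s" % (pool, classify_pool(series)))
```

Whatever the classifier says, the next step from the advice above still applies: check the GC and CPU graphs to see how much time the JVM spends collecting.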

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

