Posted to user@hbase.apache.org by 谢良 <xi...@xiaomi.com> on 2015/01/07 03:51:01 UTC

Re: Region Server OutOfMemory Error

Could you retry with "-XX:+HeapDumpOnOutOfMemoryError"?
The heap dump should make the cause clear.
________________________________________
From: Shuai Lin <li...@gmail.com>
Sent: January 6, 2015 19:32
To: user@hbase.apache.org
Subject: Region Server OutOfMemory Error

Hi all,

We have an HBase cluster of 5 region servers, each hosting 60+
regions.

But under heavy load the region servers crash with an OOME now and then:

#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill -9 %p"
#   Executing /bin/sh -c "kill -9 16820"...

We have the max heap size set to 22GB (-Xmx22528m) for each RS, and use
G1GC (-XX:+UseG1GC). To debug the problem we have turned on the JVM GC
log. The last few lines of the GC log before each crash always look like
this:

2015-01-06T11:10:19.087+0000: 5035.720: [Full GC 7122M->5837M(21G), 0.8867660 secs]
   [Eden: 1024.0K(7278.0M)->0.0B(8139.0M) Survivors: 68.0M->0.0B Heap: 7122.7M(22.0G)->5837.2M(22.0G)]
 [Times: user=1.42 sys=0.00, real=0.89 secs]
2015-01-06T11:10:19.976+0000: 5036.608: [Full GC 5837M->5836M(21G), 0.6378260 secs]
   [Eden: 0.0B(8139.0M)->0.0B(8139.0M) Survivors: 0.0B->0.0B Heap: 5837.2M(22.0G)->5836.5M(22.0G)]
 [Times: user=0.93 sys=0.00, real=0.63 secs]

From the last line I see the heap only occupies 5837MB, while the capacity
is 22GB, so how can the OOM happen? Or is my interpretation of the GC log
wrong?
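
For reference, the occupancy arithmetic for the last Full GC entry quoted
above can be checked with a small script. This is only a sketch: the regular
expression and the helper names (to_mb, heap_after_gc) are illustrative
assumptions, not part of any GC tool, and it only handles the
"Heap: used(capacity)->used(capacity)" summary that appears in these entries.

```python
import re

# Multipliers to convert the log's size suffixes to megabytes.
_UNIT_MB = {"B": 1 / (1024 * 1024), "K": 1 / 1024, "M": 1.0, "G": 1024.0}

# Matches e.g. "Heap: 5837.2M(22.0G)->5836.5M(22.0G)".
# \s* tolerates the line wrapping introduced by the mail client.
_HEAP_RE = re.compile(
    r"Heap:\s*([\d.]+)([BKMG])\(([\d.]+)([BKMG])\)"
    r"\s*->\s*([\d.]+)([BKMG])\(([\d.]+)([BKMG])\)"
)


def to_mb(value, unit):
    """Convert a size such as ("22.0", "G") to megabytes."""
    return float(value) * _UNIT_MB[unit]


def heap_after_gc(entry):
    """Return (used_mb, capacity_mb, used_fraction) after the collection."""
    m = _HEAP_RE.search(entry)
    if m is None:
        raise ValueError("no 'Heap:' summary found in log entry")
    used = to_mb(m.group(5), m.group(6))
    capacity = to_mb(m.group(7), m.group(8))
    return used, capacity, used / capacity


# The last Full GC entry from the log excerpt, copied as-is.
entry = (
    "2015-01-06T11:10:19.976+0000: 5036.608: "
    "[Full GC 5837M->5836M(21G), 0.6378260 secs]\n"
    "   [Eden: 0.0B(8139.0M)->0.0B(8139.0M) Survivors: 0.0B->0.0B Heap:\n"
    "5837.2M(22.0G)->5836.5M(22.0G)]"
)

used_mb, capacity_mb, fraction = heap_after_gc(entry)
print(f"{used_mb:.1f} MB of {capacity_mb:.0f} MB in use ({fraction:.1%})")
```

This only confirms the reading of the numbers (roughly a quarter of the
22GB heap in use after the collection); it does not by itself explain why
the OOM was raised.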

I have read some articles but only got the basic concepts of G1GC. I've
tried tools like GCViewer, but none of them gives me a useful explanation
of the details of the GC log.


Regards,
Shuai

Re: Re: Region Server OutOfMemory Error

Posted by Shuai Lin <li...@gmail.com>.

Yeah, I know a heap dump would work, but I'm a little worried about dumping
22GB of data on a production server, since it could take quite a while and
make the recovery slower.

