Posted to user@hbase.apache.org by John <jo...@gmail.com> on 2013/11/07 16:51:49 UTC

RegionServer crash without any errors (compaction?)

Hi,

I have a cluster with 7 RegionServers. Some of them crash from time to
time without any error message in the HBase log. When I look at the log
around the time of a crash, I find this:

2013-11-07 15:29:02,511 INFO org.apache.hadoop.hbase.regionserver.Store: Starting compaction of 2 file(s) in 1 of P_SO,<http://xmlns.com/foaf/0.1/homepage>,1383188177383.59d0259c87c07dc666a5600ba4d6c916. i$
2013-11-07 15:29:10,471 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Delete Family Bloom filter type for hdfs://pc08.pool.ifis.uni-luebeck.de:8020/hbase/P_SO/59d0259c87c07dc666a5600ba4d6c916/.tmp/f$
2013-11-07 15:31:05,944 INFO org.apache.hadoop.hbase.util.VersionInfo: HBase 0.94.6-cdh4.4.0
.... restart

At this time 2 of the 7 RegionServers crashed, and both logged this
compaction message before they crashed. I don't know exactly what compaction
is, but it seems to be related to the crash. What can I do to avoid these
restarts/crashes?

best regards
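
When a process dies without logging an error, the gap between the last line before the crash and the first line after the restart is often the only trace in the log. The following is a minimal sketch, not part of the original post, that scans log lines for such gaps. It assumes every log entry starts with a timestamp in the layout shown in the excerpt above; the names `parse_timestamp` and `find_gaps` and the 60-second default threshold are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta

# Timestamp layout taken from the log excerpt, e.g. "2013-11-07 15:29:02,511".
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = 23  # number of characters occupied by the timestamp prefix


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None for continuation lines."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def find_gaps(lines, threshold=timedelta(seconds=60)):
    """Yield (previous_ts, current_ts) pairs that are further apart than threshold."""
    previous = None
    for line in lines:
        current = parse_timestamp(line)
        if current is None:
            continue  # wrapped or truncated line without its own timestamp
        if previous is not None and current - previous > threshold:
            yield previous, current
        previous = current
```

Passing an open log file to `find_gaps` lists the silent periods. For the excerpt above, the only gap longer than a minute lies between the 15:29:10 entry and the 15:31:05 entry written after the restart.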

Re: RegionServer crash without any errors (compaction?)

Posted by Ishan Chhabra <ic...@rocketfuel.com>.
Even if there is a ZooKeeper timeout due to GC, there should be logging
related to that, right?
Check your ‘/var/log/messages’; the kernel might have killed the process
due to OOM or something else.
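
The check suggested above can be sketched as a small filter over syslog lines. This is only an illustration under assumptions: the search pattern covers wording that Linux kernels typically use for OOM-killer messages, but the exact text varies by kernel version, and `find_oom_events` and `process_hint` are hypothetical names.

```python
import re

# Typical wording of Linux OOM-killer messages (assumption: your kernel may differ).
OOM_PATTERN = re.compile(r"out of memory|oom-killer|killed process", re.IGNORECASE)


def find_oom_events(lines, process_hint="java"):
    """Return syslog lines that look like OOM-killer activity.

    Lines mentioning process_hint come first, since the RegionServer runs in a JVM.
    """
    hits = [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]
    # sorted() is stable, so the original order is kept within each group.
    return sorted(hits, key=lambda line: process_hint not in line)
```

To use it, open the syslog file (for example `/var/log/messages`, which usually requires root privileges) and pass the file object to `find_oom_events`. An empty result does not rule out an OOM kill, because the messages may have been rotated away or logged elsewhere.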





-- 
*Ishan Chhabra *| Rocket Scientist | RocketFuel Inc.

Re: RegionServer crash without any errors (compaction?)

Posted by Dhaval Shah <pr...@yahoo.co.in>.
"Operation too slow" messages are generally in the .log file, while the GC logs (if you enabled GC logging) are in the .out file. You have a very small heap for a 1GB HFile size. You are probably running your region server out of memory. Try increasing the heap size and see if that helps.
 
Regards,
Dhaval
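
To get a rough feeling for why a 1GB heap may be tight, one can split the heap into the shares reserved for the memstores and the block cache. The sketch below is a back-of-the-envelope calculation, not taken from this thread: the fractions 0.4 and 0.25 are assumptions standing in for `hbase.regionserver.global.memstore.upperLimit` and `hfile.block.cache.size`, and the function name `heap_budget_mb` is hypothetical. Read the real values from your own configuration before drawing conclusions.

```python
def heap_budget_mb(heap_mb, memstore_fraction=0.4, block_cache_fraction=0.25):
    """Split a RegionServer heap into memstore, block cache, and remaining space (MB).

    Both fractions are assumed values; replace them with the settings
    actually configured on the cluster.
    """
    memstore = heap_mb * memstore_fraction
    block_cache = heap_mb * block_cache_fraction
    return {
        "memstore": memstore,
        "block_cache": block_cache,
        "remaining": heap_mb - memstore - block_cache,
    }


budget = heap_budget_mb(1024)  # the 1GB heap reported in this thread
for name, size in budget.items():
    print(f"{name}: {size:.0f} MB")
```

Under these assumed fractions, roughly 410 MB go to the memstores and 256 MB to the block cache, which leaves about 358 MB for everything else the RegionServer does, including compactions and RPC handling.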



Re: RegionServer crash without any errors (compaction?)

Posted by John <jo...@gmail.com>.
There are really no other log entries before that. There is an
"operationTooSlow" message, but it was logged ~50 minutes before the others:
http://pastebin.com/EAAubqGB



Re: RegionServer crash without any errors (compaction?)

Posted by John <jo...@gmail.com>.
Hi,

Thanks for your fast answer. Looking at Cloudera Manager, the percentage of
time spent in GC increases around the time of the crash, so I think you are
right. The max heap size is 1GB for this node.
hbase.hregion.max.filesize is also 1GB.

regards



Re: RegionServer crash without any errors (compaction?)

Posted by Dhaval Shah <pr...@yahoo.co.in>.
Did you look at your GC logs? Probably the compaction process is running your region server out of memory. Can you provide more details on your setup? Max heap size? Max Region HFile size?
 
Regards,
Dhaval



Re: RegionServer crash without any errors (compaction?)

Posted by Ted Yu <yu...@gmail.com>.
Can you pastebin more of the RegionServer log from before the crash?

Cheers

