Posted to user@hbase.apache.org by Jinsong Hu <ji...@hotmail.com> on 2011/05/23 19:29:35 UTC

hbase hbck error

Hi,

Today I ran "hbase hbck" to check our production cluster and our dev cluster.
The production cluster came out clean, but
in our dev cluster I have seen more than 2K errors like this:

ERROR: Region HEARTBEAT_MASTERPATCH,time\x09daily\x092010-08-15\x09uobkayhian_production\x09patch-0000694,1287356584131.02f9ec575b19864ae44e714d9245138f. found in META, but not in HDFS, and deployed on m0002040.ppops.net:60020

I checked the HBase web UI and, indeed, it is correct: the region is loaded by the
region server, but the HDFS directory is not there.
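To see which HDFS directory hbck is talking about, it helps to split the region name from the ERROR line into its parts. The sketch below assumes the printed name follows the `<table>,<startkey>,<regionId>.<encodedName>.` layout visible in that line, that `\xNN` sequences are escaped start-key bytes, and that a region lives under `<hbase root>/<table>/<encoded name>` (the same layout the `hadoop dfs -ls /hbase/table_name` check later in this thread relies on). The helper names are hypothetical, not part of HBase.

```python
import re
from datetime import datetime, timezone

# Sketch only. Assumes the "<table>,<startkey>,<regionId>.<encodedName>."
# layout seen in the ERROR line, with non-printable start-key bytes
# shown as \xNN escapes (e.g. \x09 for a tab).
HEX_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def parse_region_name(name):
    """Split a printed region name into its parts (hypothetical helper)."""
    if not name.endswith("."):
        raise ValueError("expected a trailing '.' after the encoded name")
    head, encoded = name[:-1].rsplit(".", 1)    # encoded name follows the last '.'
    table, rest = head.split(",", 1)            # table names contain no commas
    start_key, region_id = rest.rsplit(",", 1)  # region id follows the last comma
    raw_key = HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), start_key)
    return {"table": table, "start_key": raw_key,
            "region_id": int(region_id), "encoded_name": encoded}


def expected_region_dir(info, hbase_root="/hbase"):
    """Assumed HDFS location: <hbase root>/<table>/<encoded name>."""
    return f"{hbase_root}/{info['table']}/{info['encoded_name']}"


name = (r"HEARTBEAT_MASTERPATCH,time\x09daily\x092010-08-15\x09uobkayhian_production"
        r"\x09patch-0000694,1287356584131.02f9ec575b19864ae44e714d9245138f.")
info = parse_region_name(name)
print(info["encoded_name"])       # 02f9ec575b19864ae44e714d9245138f
print(expected_region_dir(info))  # /hbase/HEARTBEAT_MASTERPATCH/02f9ec575b19864ae44e714d9245138f
# The region id is usually the region's creation time in milliseconds.
print(datetime.fromtimestamp(info["region_id"] / 1000, tz=timezone.utc))
```

The encoded name is also the string worth grepping for in master and region server logs.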

I am running cdh3u0, and I wonder how this can happen. Once it has happened,
what can I do to recover and bring the table back to a healthy state?

Jimmy. 


Re: hbase hbck error

Posted by Stack <st...@duboce.net>.
On Wed, May 25, 2011 at 10:49 AM, Jinsong Hu <ji...@hotmail.com> wrote:
> if we add the root region back in, then  essentially the hbck is complaining
> every region is bad,
> which is not true.
>

I did notice, and recently fixed, an issue where hbck prints an ERROR
for every region that follows a bad one. So rather than a single ERROR
message for the bad region, you get an ERROR for the bad one and for
all the regions, good (and bad), that follow it.
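The symptom described above, one bad region causing an ERROR for every region after it, is what a "sticky" error flag in a scan loop produces. The sketch below is a simplified, hypothetical model of that symptom, not the actual hbck code or Stack's fix; region names are made up.

```python
def report_buggy(regions, is_bad):
    """Flags every region from the first bad one onward (sticky flag)."""
    errors = []
    seen_bad = False
    for r in regions:
        seen_bad = seen_bad or is_bad(r)  # never reset once set
        if seen_bad:
            errors.append(r)
    return errors


def report_fixed(regions, is_bad):
    """Flags only the regions that are actually bad."""
    return [r for r in regions if is_bad(r)]


regions = ["r1", "r2", "r3", "r4", "r5"]  # hypothetical region names
bad = {"r2"}
print(report_buggy(regions, bad.__contains__))  # ['r2', 'r3', 'r4', 'r5']
print(report_fixed(regions, bad.__contains__))  # ['r2']
```

With this kind of bug, the ERROR count tracks the number of regions after the first bad one rather than the number of bad regions.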


> When you say I print more info, does that mean I need to modify the hbck
> code ? I might do it later
> when I can find some time.
>

Yes. That is what I was suggesting. hbck is a client-only
application, so you could make changes and try things without having to
change your cluster software.


Thanks for digging in.
St.Ack

Re: hbase hbck error

Posted by Jinsong Hu <ji...@hotmail.com>.
Hi, Stack:
  You have a point. I checked the result of hbck on the non-HBase machine, and it
shows:
Summary:
2418 inconsistencies detected.
Status: INCONSISTENT
   That number seems very familiar to me, so I went to the master admin 
page, and found:
Total: 	servers: 6	 	requests=2783, regions=2417

If we add the root region back in, then hbck is essentially complaining that
every region is bad, which is not true.
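The arithmetic behind this observation can be checked mechanically by pulling the two counts out of the hbck summary and the master page line quoted above. The +1 for the root region is Jimmy's hypothesis in this message (that the master page total leaves it out), not a documented rule; `count_from` is a hypothetical helper.

```python
import re


def count_from(pattern, text):
    """Return the first integer captured by `pattern` in `text`."""
    m = re.search(pattern, text)
    if m is None:
        raise ValueError(f"pattern {pattern!r} not found")
    return int(m.group(1))


# Figures copied from the hbck summary and the master admin page above.
hbck_summary = "Summary:\n2418 inconsistencies detected.\nStatus: INCONSISTENT"
master_total = "Total: \tservers: 6\t \trequests=2783, regions=2417"

inconsistencies = count_from(r"(\d+) inconsistencies detected", hbck_summary)
regions = count_from(r"regions=(\d+)", master_total)

# Hypothesis from the thread: the page count excludes the root region,
# so "every region is flagged" means regions + 1 inconsistencies.
print(inconsistencies, regions, inconsistencies == regions + 1)  # 2418 2417 True
```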

  On the other hand, hbck on the HBase machine says:
0 inconsistencies detected.
Status: OK
  That is probably too good to be true as well.

I ran "hadoop dfs -ls /hbase/table_name | grep region_id" and confirmed that
on both machines the region's directory showed up. On both machines, I was
running as the hdfs account.
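The manual `hadoop dfs -ls ... | grep` check above can be generalized to compare every encoded region name listed in META against what the listing shows, which is roughly the "found in META, but not in HDFS" comparison. This is a hypothetical sketch, not hbck's implementation. It assumes the full path is the last field of each listing line and that encoded region names are 32 lowercase hex characters, like the one in this thread; the sample listing and the second region name are made up.

```python
import posixpath
import re

# Assumption: encoded region names look like the one in this thread,
# 32 lowercase hex characters; other entries in the table dir are ignored.
ENCODED_NAME = re.compile(r"^[0-9a-f]{32}$")


def region_dirs_from_ls(ls_output):
    """Collect encoded region names from `hadoop dfs -ls`-style output.

    Assumes the full path is the last whitespace-separated field of each
    entry line; lines such as "Found N items" are skipped.
    """
    dirs = set()
    for line in ls_output.splitlines():
        fields = line.split()
        if not fields or not fields[-1].startswith("/"):
            continue
        base = posixpath.basename(fields[-1].rstrip("/"))
        if ENCODED_NAME.match(base):
            dirs.add(base)
    return dirs


def in_meta_not_in_hdfs(meta_names, hdfs_dirs):
    """Regions listed in META whose directory is missing from the listing."""
    return sorted(set(meta_names) - set(hdfs_dirs))


# Illustrative data only: one name from this thread, one made-up name.
ls_output = """Found 1 items
drwxr-xr-x   - hdfs supergroup          0 2011-05-23 10:00 /hbase/HEARTBEAT_MASTERPATCH/02f9ec575b19864ae44e714d9245138f
"""
meta = ["02f9ec575b19864ae44e714d9245138f", "0123456789abcdef0123456789abcdef"]
print(in_meta_not_in_hdfs(meta, region_dirs_from_ls(ls_output)))
# ['0123456789abcdef0123456789abcdef']
```

Running the same comparison from both a cluster machine and a non-HBase machine would show whether the two see different directory listings.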

When you say print more info, do you mean I need to modify the hbck
code? I might do that later,
when I can find some time.

Jimmy.



--------------------------------------------------
From: "Stack" <st...@duboce.net>
Sent: Wednesday, May 25, 2011 10:03 AM
To: <us...@hbase.apache.org>
Subject: Re: hbase hbck error

> On Wed, May 25, 2011 at 9:18 AM, Jinsong Hu <ji...@hotmail.com> 
> wrote:
>> I tried several other non-hbase machines that has proper configuration, 
>> sure
>> enough, all of them complain problems.
>>
>
> This is interesting Jinsong.  For sure the configuration was pointed
> at the right filesystem.  Do you think there could have been a
> suppressed error or some such thing remotely querying the filesystem
> for the presence of region directories?  Can you add in a of
> printf'ing to see whats going on in hbck?
>
> Thanks for digging in on this.
> St.Ack
> 

Re: hbase hbck error

Posted by Stack <st...@duboce.net>.
On Wed, May 25, 2011 at 9:18 AM, Jinsong Hu <ji...@hotmail.com> wrote:
> I tried several other non-hbase machines that has proper configuration, sure
> enough, all of them complain problems.
>

This is interesting, Jinsong. For sure the configuration was pointed
at the right filesystem. Do you think there could have been a
suppressed error or some such thing while remotely querying the filesystem
for the presence of region directories? Can you add in some
printf'ing to see what's going on in hbck?
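Stack's hypothesis is that a failure while querying the filesystem remotely might be swallowed and then reported as a missing directory. The sketch below illustrates that failure mode in general terms; it is not hbck's code, and `unreachable_fs` is a hypothetical stand-in for a filesystem client that cannot reach HDFS. The second function shows the kind of "printf'ing" (here, logging) that would make such an error visible.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("hbck-sketch")


def exists_swallowing(check, path):
    # Anti-pattern: any I/O failure becomes indistinguishable from "missing".
    try:
        return check(path)
    except OSError:
        return False


def exists_reporting(check, path):
    # Same check, but the failure is logged and re-raised, so a client-side
    # problem cannot masquerade as a missing region directory.
    try:
        return check(path)
    except OSError as e:
        log.error("existence check failed for %s: %s", path, e)
        raise


def unreachable_fs(path):
    # Hypothetical filesystem client that cannot reach the cluster.
    raise OSError("connection refused")


path = "/hbase/HEARTBEAT_MASTERPATCH/02f9ec575b19864ae44e714d9245138f"
print(exists_swallowing(unreachable_fs, path))  # False, looks like "not in HDFS"
try:
    exists_reporting(unreachable_fs, path)
except OSError:
    print("error surfaced instead of a bogus result")
```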

Thanks for digging in on this.
St.Ack

Re: hbase hbck error

Posted by Jinsong Hu <ji...@hotmail.com>.
This is a follow-up on what I have found. I exported several of the tables
hbck complained about to HDFS, truncated the original tables, imported them
again, and ran hbck. hbck still complained that the HDFS directory is not
there. I went to HDFS and took a look, and the region's HDFS directory is
there, so hbck's complaint is bogus this time.

By accident, I ran the same hbck on one of the region servers and, to my
surprise, the check came out clean for all tables! I then ran this
command on several other region servers, and then on all 3 HBase masters;
all of them came out clean, even for the tables that had problems before
and that I did not export and import.

I tried several other non-HBase machines that have the proper configuration
and, sure enough, all of them report problems.

So it seems the result of hbck depends on whether it runs on an HBase
machine or a non-HBase machine. Judging from the results they show,
neither is correct: the correct result should be that the imported tables
are clean and the non-imported tables are not.
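When two machines disagree like this, a useful first step is to diff the sets of regions each run reports as missing. The sketch below assumes each ERROR is a single line in the raw hbck output and matches only the "found in META, but not in HDFS" message quoted in this thread; the function names and the sample outputs are hypothetical.

```python
import re

# Assumption: each hbck ERROR is one line in the raw output.
ERROR_RE = re.compile(r"^ERROR: Region (.+?) found in META, but not in HDFS")


def missing_in_hdfs(hbck_output):
    """Region names reported as 'found in META, but not in HDFS'."""
    names = set()
    for line in hbck_output.splitlines():
        m = ERROR_RE.match(line)
        if m:
            names.add(m.group(1))
    return names


def compare_runs(output_a, output_b):
    """Which regions each run flags that the other does not."""
    a, b = missing_in_hdfs(output_a), missing_in_hdfs(output_b)
    return {"only_a": sorted(a - b), "only_b": sorted(b - a), "both": sorted(a & b)}


region = (r"HEARTBEAT_MASTERPATCH,time\x09daily\x092010-08-15\x09uobkayhian_production"
          r"\x09patch-0000694,1287356584131.02f9ec575b19864ae44e714d9245138f.")
client_run = (f"ERROR: Region {region} found in META, but not in HDFS, "
              "and deployed on m0002040.ppops.net:60020\n"
              "Status: INCONSISTENT\n")
server_run = "0 inconsistencies detected.\nStatus: OK\n"
print(compare_runs(client_run, server_run)["only_a"] == [region])  # True
```

If the "only_a" list covers nearly every region while the other run flags none, that points at the client environment rather than at the regions themselves.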

Can anybody explain why hbck behaves this way?

Jimmy



--------------------------------------------------
From: "Jinsong Hu" <ji...@hotmail.com>
Sent: Monday, May 23, 2011 11:39 AM
To: <us...@hbase.apache.org>
Subject: Re: hbase hbck error

> I checked the master, unfortunately ,  I must have wrong setting that all 
> master log are not there.
> So I checked the regionserver which hosted this region.  I have 14 days 
> log there and I grep this 02f9ec575b19864ae44e714d9245138f,
> and I don't see any log. then I searched all regionserver's log for last 
> several days , and don't see
> any log related to this region either.
>
>
> Jimmy.

Re: hbase hbck error

Posted by Jinsong Hu <ji...@hotmail.com>.
I checked the master; unfortunately, I must have a wrong setting, because none
of the master logs are there.
So I checked the region server that hosted this region. I have 14 days of logs
there; I grepped for 02f9ec575b19864ae44e714d9245138f
and didn't find any entries. Then I searched all the region servers' logs for
the last several days and didn't find
anything related to this region either.
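The log search described above amounts to grepping many files for the encoded region name. A minimal sketch, assuming plain-text (not compressed) log files; `grep_logs` and `read_logs` are hypothetical helpers, and the sample lines are synthetic, not real HBase log output.

```python
def grep_logs(named_lines, needle):
    """Return {log name: [matching lines]} for every log mentioning `needle`."""
    hits = {}
    for name, lines in named_lines:
        matched = [ln.rstrip("\n") for ln in lines if needle in ln]
        if matched:
            hits[name] = matched
    return hits


def read_logs(paths):
    """Yield (path, lines) for plain-text log files; bad bytes are replaced."""
    for p in paths:
        with open(p, encoding="utf-8", errors="replace") as f:
            yield str(p), f.readlines()


encoded = "02f9ec575b19864ae44e714d9245138f"
# Synthetic in-memory sample; on a real host you might pass something like
# read_logs(glob.glob("/path/to/hbase/logs/*.log")) instead.
sample = [("rs1.log", ["synthetic line A\n"]),
          ("rs2.log", [f"synthetic line mentioning {encoded}\n"])]
print(grep_logs(sample, encoded))
```

An empty result across all region servers, as reported here, means the region was never opened, closed, or split within the retained log window.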


Jimmy.

--------------------------------------------------
From: "Jean-Daniel Cryans" <jd...@apache.org>
Sent: Monday, May 23, 2011 10:53 AM
To: <us...@hbase.apache.org>
Subject: Re: hbase hbck error

> I don't remember seeing this sort of issue a lot, or at all... Usually
> the region would not be on .META. so it looks like a different issue.
>
> Could you grep the master logs and see what's the story of that
> region? Just look for 02f9ec575b19864ae44e714d9245138f and try to
> figure what happened to that region, might give us a clue.
>
> J-D

Re: hbase hbck error

Posted by Jean-Daniel Cryans <jd...@apache.org>.
I don't remember seeing this sort of issue a lot, or at all... Usually
the region would not be in .META., so this looks like a different issue.

Could you grep the master logs and see what the story of that
region is? Just look for 02f9ec575b19864ae44e714d9245138f and try to
figure out what happened to that region; it might give us a clue.

J-D

On Mon, May 23, 2011 at 10:29 AM, Jinsong Hu <ji...@hotmail.com> wrote:
> Hi,
>
> today I run "hbase hbck " to check our production cluster and dev cluster,
> the production cluster comes out clean, but
> in our dev cluster, I have seem more than 2K errors like this:
>
> ERROR: Region
> HEARTBEAT_MASTERPATCH,time\x09daily\x092010-08-15\x09uobkayhian_pr
> oduction\x09patch-0000694,1287356584131.02f9ec575b19864ae44e714d9245138f.
> found
> in META, but not in HDFS, and deployed on m0002040.ppops.net:60020
>
> I checked hbase GUI, and indeed , it is correct, the region is loaded by the
> region server, but the hdfs directory
> is not there.
>
> I am running cdh3u0, and I wonder how this can happen. Once it has happened,
> what can I do to recover to bring the table back to healthy state.
>
> Jimmy.
>