Posted to user@hbase.apache.org by Jay Talreja <ja...@oracle.com> on 2013/05/21 13:03:30 UTC

Inconsistent Table HBCK

One of our production clusters had several region server failures. As a
result, one of the tables is in an inconsistent state as reported by hbck.
We have tried the hbck repair commands, but none of them seem to work. One
region is stuck indefinitely in the pending open state.
The error reported in the RS log is about a StoreFile not being found. What
is really strange is that the store file reported as missing does not even
belong to the region being opened.

We tried to manually create a directory in HDFS and copy the missing file
into it, but that causes hbck to report a region that is in HDFS but not in
META.

There are 4 inconsistencies currently.

ERROR: Region { meta => <tableName>,I.1521_D.1361689200_9,1369099149747.2123fc70fac804cd8d48ea4494cc8184., hdfs => hdfs://host:8020/hbase/tableName/2123fc70fac804cd8d48ea4494cc8184, deployed =>  } not deployed on any region server.
ERROR: Region { meta => null, hdfs => hdfs://hostname:8020/hbase/tableName/450ed30b410e9d6d54ac53099039cb28, deployed =>  } on HDFS, but not listed in META or deployed on any region server
13/05/21 10:51:11 DEBUG util.HBaseFsck: There are 1769 region info entries
ERROR: There is a hole in the region chain between I.1521_D.1361689200_9 and I.1521_D.1362150000_8.  You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: There is a hole in the region chain between I.1_D.1368392400_9 and I.2020_D.1338948000_2.  You need to create a new .regioninfo and region dir in hdfs to plug the hole.
ERROR: Found inconsistency in table <tableName>

We are running HBase 0.94 (Apache) on Hadoop 1.0.3.

At this stage we are stuck and are looking for help! The cluster is in
an unbalanced state and region servers keep dying.

Thanks,
Jay
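
Editor's sketch: the two "hole in the region chain" errors in the hbck output above mean that, once regions are sorted by start key, the end key of one region does not match the start key of the next. The Python snippet below illustrates that check on a hypothetical region list. Only the four boundary keys are taken from the hbck output; the layout of the surrounding regions, the name `find_holes`, and the use of an empty key for an open boundary are assumptions made for illustration. This is not hbck's actual implementation.

```python
from typing import List, Tuple

# A region is modelled as (start_key, end_key). An empty key stands for an
# open boundary (first or last region of the table). This is an assumption
# made for the sketch.
Region = Tuple[bytes, bytes]


def find_holes(regions: List[Region]) -> List[Tuple[bytes, bytes]]:
    """Return (end_key, next_start_key) pairs where the key space has a gap."""
    ordered = sorted(regions, key=lambda region: region[0])
    holes = []
    for (_, end_key), (next_start, _) in zip(ordered, ordered[1:]):
        # A contiguous chain has end_key == next_start. A smaller end_key
        # means no region covers the keys in between.
        if end_key != b"" and end_key < next_start:
            holes.append((end_key, next_start))
    return holes


# Hypothetical layout that reuses the boundary keys from the hbck output.
EXAMPLE_REGIONS: List[Region] = [
    (b"I.2020_D.1338948000_2", b""),
    (b"", b"I.1521_D.1361689200_9"),
    (b"I.1521_D.1362150000_8", b"I.1_D.1368392400_9"),
]

if __name__ == "__main__":
    for end_key, next_start in find_holes(EXAMPLE_REGIONS):
        print("hole between", end_key.decode(), "and", next_start.decode())
```

Keys are compared as raw bytes, which is why `I.1_D...` sorts after `I.1521_D...` (the byte for `_` is larger than the byte for `5`).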


RE: Inconsistent Table HBCK

Posted by Tianying Chang <ti...@ebaysf.com>.
Hi, Jean-Marc

We saw this error in our cluster twice. We unblocked our production cluster by fixing the problem manually. I haven't yet spent time on finding the root cause or on an automatic fix. I will add it to my plan to root-cause and fix it later, and I will update you after I dig into it more.

BTW, in your previous email, did you mean that 0.94.7 won't have this bug?

Thanks
Tian-Ying 

-----Original Message-----
From: Jean-Marc Spaggiari [mailto:jean-marc@spaggiari.org] 
Sent: Tuesday, May 28, 2013 7:28 AM
To: user@hbase.apache.org
Subject: Re: Inconsistent Table HBCK

Hi Tian-Ying,

I don't think there is already a JIRA for that. The idea was to open a new one and ask for HBCK to be able to fix that. Can you do that?

Do you have an easy way to reproduce your issue? Like by manually creating files in HDFS or something like that?

JM

2013/5/22 Tianying Chang <ti...@ebaysf.com>

> Hi, Jean
>
> What is the jira #?
>
> Thanks
> Tian-Ying
> -----Original Message-----
> From: Jean-Marc Spaggiari [mailto:jean-marc@spaggiari.org]
> Sent: Wednesday, May 22, 2013 7:57 AM
> To: user@hbase.apache.org
> Subject: Re: Inconsistent Table HBCK
>
> Thanks for the feedback Jay.
>
> I helped someone who faced the same issue recently. Might deserve a fix...
> (and so a JIRA with details)
>
> Also, I would recommend that you migrate to a more recent 0.94.x version.
>
> JM
>
> 2013/5/22 Jay Talreja <ja...@oracle.com>
>
> > *0.94.0*
> >
> > The issue (I think) was related to a region split that didn't happen 
> > cleanly. As a result there were references to daughter region 
> > present in HDFS.
> > I believe the -fixSplitParents would have taken care of this but it 
> > is not available in 0.94.0
> >
> > I manually deleted reference files from HDFS and was able to bring 
> > the table back to consistent state.
> >
> > Thanks,
> > Jay
> >
> >
> > On 5/21/13 11:12 PM, Jean-Marc Spaggiari wrote:
> >
> >> Hi Jay,
> >>
> >> Which 0.94 version are you running? 0.94.0? Or 0.94.7?
> >>
> >>
> >> JM
> >>
> >> 2013/5/21 Jay Talreja <ja...@oracle.com>
> >>
> >>> One of our production clusters had several region server failures.
> >>> As a result one of the tables is in an inconsistent state as
> >>> reported by hbck.
> >>> We have tried using hbck repair commands but none seem to work.
> >>> There is one region that is stuck in a forever pending open state.
> >>> The error reported in RS log is about a StoreFile not found. But
> >>> what is really strange is that the store file that is reported as
> >>> missing does not even belong to the region being opened.
> >>>
> >>> We tried to manually create a directory in HDFS and copy the
> >>> missing file but it causes hbck to report about a region in HDFS
> >>> but not in Meta.
> >>>
> >>> There are 4 inconsistencies currently.
> >>>
> >>> ERROR: Region { meta =>
> >>> <tableName>,I.1521_D.1361689200_9,1369099149747.2123fc70fac804cd8d48ea4494cc8184.,
> >>> hdfs =>
> >>> hdfs://host:8020/hbase/tableName/2123fc70fac804cd8d48ea4494cc8184,
> >>> deployed =>  } not deployed on any region server.
> >>> ERROR: Region { meta => null, hdfs =>
> >>> hdfs://hostname:8020/hbase/tableName/450ed30b410e9d6d54ac53099039cb28,
> >>> deployed =>  } on HDFS, but not listed in META or deployed on any
> >>> region server
> >>> 13/05/21 10:51:11 DEBUG util.HBaseFsck: There are 1769 region info
> >>> entries
> >>> ERROR: There is a hole in the region chain between
> >>> I.1521_D.1361689200_9 and I.1521_D.1362150000_8.  You need to
> >>> create a new .regioninfo and region dir in hdfs to plug the hole.
> >>> ERROR: There is a hole in the region chain between
> >>> I.1_D.1368392400_9 and I.2020_D.1338948000_2.  You need to create
> >>> a new .regioninfo and region dir in hdfs to plug the hole.
> >>> ERROR: Found inconsistency in table <tableName>
> >>>
> >>> We are running HBase 0.94 (Apache) on Hadoop 1.0.3
> >>>
> >>> At this stage, we are stuck and are looking for help! The cluster
> >>> is in an unbalanced state and region servers frequently keep dying.
> >>>
> >>> Thanks,
> >>> Jay
> >>>
> >>>
> >>>
> >
>

Re: Inconsistent Table HBCK

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Tian-Ying,

I don't think there is already a JIRA for that. The idea was to open a new
one and ask for HBCK to be able to fix that. Can you do that?

Do you have an easy way to reproduce your issue? Like by manually creating
files in HDFS or something like that?

JM


RE: Inconsistent Table HBCK

Posted by Tianying Chang <ti...@ebaysf.com>.
Hi, Jean

What is the jira #? 

Thanks
Tian-Ying

Re: Inconsistent Table HBCK

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Thanks for the feedback Jay.

I helped someone who faced the same issue recently. Might deserve a fix...
(and so a JIRA with details)

Also, I would recommend that you migrate to a more recent 0.94.x version.

JM


Re: Inconsistent Table HBCK

Posted by Jay Talreja <ja...@oracle.com>.
*0.94.0*

The issue (I think) was related to a region split that didn't complete
cleanly. As a result, there were references to the daughter regions present
in HDFS.
I believe the -fixSplitParents option would have taken care of this, but it
is not available in 0.94.0.

I manually deleted the reference files from HDFS and was able to bring the
table back to a consistent state.

Thanks,
Jay
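
Editor's sketch: before deleting anything by hand, it can help to list which store files under a table directory look like split references. The Python snippet below filters a list of HDFS paths by file name. It assumes that reference files in this HBase line are named `<store file name>.<encoded parent region name>`, with both parts written in lowercase hex; that naming convention, the example paths, the column family `cf`, and the function name `find_reference_files` are all assumptions made for illustration and should be checked against the cluster's actual layout. The paths would come from a recursive HDFS listing of the table directory.

```python
import re
from typing import Iterable, List, Tuple

# Assumed naming convention for a split reference file:
#   "<store file name>.<encoded parent region name>", both lowercase hex.
REFERENCE_NAME = re.compile(r"^([0-9a-f]+)\.([0-9a-f]+)$")


def find_reference_files(paths: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (path, store_file, parent_region) for names that look like references."""
    matches = []
    for path in paths:
        file_name = path.rstrip("/").rsplit("/", 1)[-1]
        match = REFERENCE_NAME.match(file_name)
        if match:
            matches.append((path, match.group(1), match.group(2)))
    return matches


# Hypothetical listing; none of these paths come from the thread.
EXAMPLE_PATHS = [
    "/hbase/tableName/aaaa1111/.regioninfo",
    "/hbase/tableName/aaaa1111/cf/0123456789abcdef",
    "/hbase/tableName/aaaa1111/cf/0123456789abcdef.bbbb2222",
]

if __name__ == "__main__":
    for path, store_file, parent in find_reference_files(EXAMPLE_PATHS):
        print(path, "-> store file", store_file, "in parent region", parent)
```

If the naming assumption holds, it would also be consistent with the earlier observation that the missing store file did not belong to the region being opened: a reference points at a file that lives in the parent region's directory.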



Re: Inconsistent Table HBCK

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Jay,

Which 0.94 version are you running? 0.94.0? Or 0.94.7?


JM
