Posted to user@hbase.apache.org by Jay Whittaker <ja...@stealthintelligence.co.uk> on 2012/07/04 17:38:56 UTC

hbase hbck logging

Hey,

I have been getting the following in thrift logs.

2012-07-04 15:41:05,903 WARN org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table:
org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: finalfrontier, row=finalfrontier,,99999999999999

Which made us think it's a META table error. So we ran 'bin/hbase hbck' and 'bin/hbase hbck -details', and both seem to hang after an 'HRegionInfo read' line before dropping back to the shell with no error or debug info.

We presume an HRegionInfo read is hanging, but we cannot find it logged anywhere. Is there a way to see where it is hanging?

It may also be worth pointing out that we have tried the -fix, -fixMeta and -repair flags with no change.

Thanks,

Jay

Re: hbase hbck logging

Posted by Jonathan Hsieh <jo...@cloudera.com>.

Lack of hbck?  Do you mean missing the patch?

One point I didn't make clear in the last email was that you could use an
updated hbck from a newer jar to run against an existing hbase instance --
no need to upgrade the entire cluster.

Jon.

On Thu, Jul 5, 2012 at 12:35 PM, Jay Whittaker <jay@stealthintelligence.co.uk> wrote:

> Other than the lack of hbck, is there anything else it prevents from
> working?
>
> If not, it may be our best option to pull the data and make a new table.


-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com

Re: hbase hbck logging

Posted by Jay Whittaker <ja...@stealthintelligence.co.uk>.

Other than the lack of hbck, is there anything else it prevents from
working?

If not, it may be our best option to pull the data and make a new table.

On 05/07/2012 18:39, "Jonathan Hsieh" <jo...@cloudera.com> wrote:

>Jay,
>
>What version are you on?
>
>You may have hit this: (do you have >50 regions?)
>https://issues.apache.org/jira/browse/HBASE-6018
>
>At the moment this isn't in an apache release yet (we're working on it!),
>but we were able to get it into cdh4.0.0.  A tarball with it is here:
>http://archive.cloudera.com/cdh4/cdh/4/hbase-0.92.1-cdh4.0.0.tar.gz
>
>It is compiled against hadoop 2.0 so you will need to recompile the
>tarball
>to generate a new jar if you are on hadoop 1.0/cdh3 hadoop.
>
>Jon.

Re: hbase hbck logging

Posted by Jonathan Hsieh <jo...@cloudera.com>.

Jay,

What version are you on?

You may have hit this: (do you have >50 regions?)
https://issues.apache.org/jira/browse/HBASE-6018

At the moment this isn't in an apache release yet (we're working on it!),
but we were able to get it into cdh4.0.0.  A tarball with it is here:
http://archive.cloudera.com/cdh4/cdh/4/hbase-0.92.1-cdh4.0.0.tar.gz

It is compiled against hadoop 2.0 so you will need to recompile the tarball
to generate a new jar if you are on hadoop 1.0/cdh3 hadoop.

Jon.

On Thu, Jul 5, 2012 at 9:17 AM, Jay Whittaker <jay@stealthintelligence.co.uk> wrote:

> We do however see this further up the hbck output.
>
> 12/07/05 16:24:45 DEBUG util.HBaseFsck: HRegionInfo read: {NAME => 'thefinalfrontier,com|1|1245884400|agencedupontdugard.com,1339788822565.838b9c14a918a97e584dc35537c50b22.', STARTKEY => 'com|1|1245884400|agencedupontdugard.com', ENDKEY => 'com|1|1263427200|jbheart.com', ENCODED => 838b9c14a918a97e584dc35537c50b22,}
> Exception in thread "main" java.util.concurrent.RejectedExecutionException
>  at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
>  at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
>  at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
>  at org.apache.hadoop.hbase.util.HBaseFsck.loadHdfsRegionInfos(HBaseFsck.java:633)
>  at org.apache.hadoop.hbase.util.HBaseFsck.onlineConsistencyRepair(HBaseFsck.java:354)
>  at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:382)
>  at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3120)


-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com
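
A note for readers puzzling over the RejectedExecutionException in the trace quoted above: the frames show ThreadPoolExecutor.execute() handing the task to AbortPolicy, which is what a thread pool does once it can neither queue a task nor start another thread. The sketch below reproduces that behaviour in isolation. It assumes a pool with a hard thread cap and a SynchronousQueue (which buffers nothing); that is one plausible reading of why HBASE-6018 is tied to having more than 50 regions, and it is not confirmed anywhere in this thread. The class name, method name and pool sizes are made up for illustration, and none of this is HBaseFsck code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of HBase.
class RejectedDemo {

    // Submits `tasks` long-running tasks to a pool capped at `maxThreads`
    // and returns how many submissions were rejected.
    static int countRejected(int maxThreads, int tasks) {
        // Assumption: capped pool + SynchronousQueue, default AbortPolicy.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, maxThreads, 60, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
        // Keeps every accepted task busy until all submissions are done,
        // so no worker is ever free to pick up a queued task.
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    // Thrown by AbortPolicy once maxThreads workers are busy.
                    rejected++;
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
            try {
                pool.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 51 slow tasks against a 50-thread cap: the last one is rejected.
        System.out.println(countRejected(50, 51)); // prints 1
    }
}
```

If the real cause is the same, the failure depends only on how many tasks are in flight at once, not on the state of any particular region, which would fit hbck dying right after an ordinary-looking 'HRegionInfo read' line.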

Re: hbase hbck logging

Posted by Jay Whittaker <ja...@stealthintelligence.co.uk>.

-metaOnly returns a summary, which I presume is what the other flags are
meant to return:

Summary:
  -ROOT- is okay.
    Number of regions: 1
    Deployed on:  datanode006.si.lan,60020,1341357895747
  .META. is okay.
    Number of regions: 1
    Deployed on:  datanode013.si.lan,60020,1341357896016
0 inconsistencies detected.
Status: OK

In terms of logs, neither the namenode nor the other nodes show any record of hbck.

The only indication we have of an error is that there is no summary.

We do however see this further up the hbck output.

12/07/05 16:24:45 DEBUG util.HBaseFsck: HRegionInfo read: {NAME => 'thefinalfrontier,com|1|1245884400|agencedupontdugard.com,1339788822565.838b9c14a918a97e584dc35537c50b22.', STARTKEY => 'com|1|1245884400|agencedupontdugard.com', ENDKEY => 'com|1|1263427200|jbheart.com', ENCODED => 838b9c14a918a97e584dc35537c50b22,}
Exception in thread "main" java.util.concurrent.RejectedExecutionException
 at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
 at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
 at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
 at org.apache.hadoop.hbase.util.HBaseFsck.loadHdfsRegionInfos(HBaseFsck.java:633)
 at org.apache.hadoop.hbase.util.HBaseFsck.onlineConsistencyRepair(HBaseFsck.java:354)
 at org.apache.hadoop.hbase.util.HBaseFsck.onlineHbck(HBaseFsck.java:382)
 at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3120)

We were also getting the xciever errors below at around the same times we
ran hbck, so we increased the xcievers limit to 4096.

2012-07-05 15:07:34,392 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(188.94.23.23:50010, storageID=DS-1551164377-188.94.23.23-50010-1335800300163, infoPort=50075, ipcPort=50020):Got exception while serving blk_-3876969825338062337_132429 to /188.94.23.26:
java.io.IOException: Block blk_-3876969825338062337_132429 is not valid.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:1072)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:1035)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getVisibleLength(FSDataset.java:1045)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:94)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
        at java.lang.Thread.run(Thread.java:662)

2012-07-05 15:07:34,392 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(188.94.23.23:50010, storageID=DS-1551164377-188.94.23.23-50010-1335800300163, infoPort=50075, ipcPort=50020):DataXceiver
java.io.IOException: Block blk_-3876969825338062337_132429 is not valid.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getBlockFile(FSDataset.java:1072)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getLength(FSDataset.java:1035)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.getVisibleLength(FSDataset.java:1045)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:94)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:189)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
        at java.lang.Thread.run(Thread.java:662)

Those now seem to have gone, replaced by the error below at random
intervals.

2012-07-05 17:04:02,501 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(188.94.23.23:50010, storageID=DS-1551164377-188.94.23.23-50010-1335800300163, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/188.94.23.23:50010 remote=/188.94.23.20:56619]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:197)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
        at java.lang.Thread.run(Thread.java:662)

On 05/07/2012 08:39, "Jonathan Hsieh" <jo...@cloudera.com> wrote:

>Jay,
>
>Have you tried the -metaOnly hbck option (possibly in conjunction with
>-fixAssignments/-fix)?  It could be that meta is out of whack which
>prevents everything else from making progress.
>
>If that doesn't work please share more logs -- it will help us figure out
>where it got stuck.
>
>Thanks,
>Jon.

Re: hbase hbck logging

Posted by Jonathan Hsieh <jo...@cloudera.com>.

Jay,

Have you tried the -metaOnly hbck option (possibly in conjunction with
-fixAssignments/-fix)?  It could be that meta is out of whack which
prevents everything else from making progress.

If that doesn't work please share more logs -- it will help us figure out
where it got stuck.

Thanks,
Jon.

On Wed, Jul 4, 2012 at 8:38 AM, Jay Whittaker <jay@stealthintelligence.co.uk> wrote:

> Hey,
>
> I have been getting the following in thrift logs.
>
> 2012-07-04 15:41:05,903 WARN
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> Encountered problems when prefetch META table:
> org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META.
> for table: finalfrontier, row=finalfrontier,,99999999999999
>
> Which made us think it's a META table error. So we ran 'bin/hbase hbck'
> and 'bin/hbase hbck -details', and both seem to hang after an 'HRegionInfo
> read' line before dropping back to the shell with no error or debug info.
>
> We presume an HRegionInfo read is hanging, but we cannot find it logged
> anywhere. Is there a way to see where it is hanging?
>
> It may also be worth pointing out that we have tried the -fix, -fixMeta and
> -repair flags with no change.
>
> Thanks,
>
> Jay
>



-- 
// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com