Posted to user@hbase.apache.org by Mike Dillon <mi...@synctree.com> on 2015/03/18 01:04:27 UTC

Recovering from corrupt blocks in HFile

Hi all-

I've got an HFile that's reporting a corrupt block in "hadoop fsck" and was
hoping to get some advice on recovering as much data as possible.

When I examined the blk-* file on the three data nodes that have a replica
of the affected block, I saw that the replicas on two of the datanodes had
the same SHA-1 checksum and that the replica on the other datanode was a
truncated version of the replica found on the other nodes (as reported by a
difference at EOF by "cmp"). The size of the two identical blocks is
67108864, the same as most of the other blocks in the file.
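
For concreteness, the comparison amounted to something like this, run against
each datanode's copy of the block (block name and paths from memory, so
illustrative only):

  # on each datanode holding a replica
  sha1sum .../current/finalized/blk_1234567890
  # comparing the copied-off replicas; cmp flags the short one with
  # "cmp: EOF on blk-from-dn3"
  cmp blk-from-dn3 blk-from-dn1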

Given that there were two datanodes with the same data and another with
truncated data, I made a backup of the truncated file and dropped the
full-length copy of the block in its place directly on the data mount,
hoping that this would cause HDFS to no longer report the file as corrupt.
Unfortunately, this didn't seem to have any effect.

Looking through the Hadoop source code, it looks like there is a
CorruptReplicasMap internally that tracks which nodes have "corrupt" copies
of a block. In HDFS-6663 <https://issues.apache.org/jira/browse/HDFS-6663>,
a "-blockId" parameter was added to "hadoop fsck" to allow dumping the
reason that a block id is considered corrupt, but that wasn't added until
Hadoop 2.7.0 and our client is running 2.0.0-cdh4.6.0.
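
For anyone reading this later on a 2.7.0+ cluster, my understanding is that
the invocation would be roughly:

  hdfs fsck -blockId blk_1234567890

(block id illustrative), but that's no help on our cdh4 cluster.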

I also had a look at running the "HFile" tool on the affected file (cf.
section 9.7.5.2.2 at http://hbase.apache.org/0.94/book/regions.arch.html).
When I did that, I was able to see the data up to the corrupted block as
far as I could tell, but then it started repeatedly looping back to the
first row and starting over. I believe this is related to the behavior
described in https://issues.apache.org/jira/browse/HBASE-12949
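
The invocation I used was along these lines (as I recall, -p prints the
key/values and -m the meta info; the path here is illustrative):

  hbase org.apache.hadoop.hbase.io.hfile.HFile -v -p -m -f \
    hdfs:///hbase/mytable/<region>/<family>/<hfile>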

My goal is to determine whether the block in question is actually corrupt
and, if so, in what way. If it's possible to recover all of the file except
a portion of the affected block, that would be OK too. I just don't want to
be in the position of having to lose all 3 gigs of data in this particular
region, given that most of it appears to be intact. I just can't find the
right low-level tools to let me diagnose the exact state
structure of the block data I have for this file.

Any help or direction that someone could provide would be much appreciated.
For reference, I'll repeat that our client is running Hadoop 2.0.0-cdh4.6.0
and add that the HBase version is 0.94.15-cdh4.6.0.

Thanks!

-md

Re: Recovering from corrupt blocks in HFile

Posted by Stack <st...@duboce.net>.
On Tue, Mar 17, 2015 at 11:42 PM, Mike Dillon <mi...@synctree.com>
wrote:

> Thanks. I'll look into those suggestions tomorrow. I'm pretty sure that
> short-circuit reads are not turned on, but I'll double check when I follow
> up on this.
>
> The main issue that actually led to me being asked to look into this issue
> was that the cluster had a datanode running at 100% disk usage on all its
> mounts. Since it was already in a compromised state and I didn't fully
> understand what restarting it would do, I haven't done that yet.
>
>
Understood.



> It turned out that at least part of the reason that the node got to 100%
> capacity was that major compactions had been silently failing for a couple
> weeks due to the aforementioned corrupt block. When I looked into the logs
> of the node at capacity, I was seeing "compaction failed" error messages
> for a particular region, caused by BlockMissingExceptions for a particular
> block. That's what led me to fsck that block file and start digging into
> the underlying data. The weird thing is that the at-capacity node actually
> had one of the good copies of the failed block and it was a different node
> that had the broken one.
>
>
Ok. HDFS gets a little unpredictable when full or, to put it another way,
it has not been well tested at this extreme.

Please paste the exceptions in here when you get a chance. Will help with
https://issues.apache.org/jira/browse/HBASE-12949


> And of course, the logs for when this broken HFile was created have already
> been aged out, so I'm left to chase shadows to some extent.
>

Of course.

Let us try and help out.

St.Ack

Re: Recovering from corrupt blocks in HFile

Posted by Mike Dillon <mi...@synctree.com>.
Thanks. I'll look into those suggestions tomorrow. I'm pretty sure that
short-circuit reads are not turned on, but I'll double check when I follow
up on this.

The main issue that actually led to me being asked to look into this issue
was that the cluster had a datanode running at 100% disk usage on all its
mounts. Since it was already in a compromised state and I didn't fully
understand what restarting it would do, I haven't done that yet.

It turned out that at least part of the reason that the node got to 100%
capacity was that major compactions had been silently failing for a couple
weeks due to the aforementioned corrupt block. When I looked into the logs
of the node at capacity, I was seeing "compaction failed" error messages
for a particular region, caused by BlockMissingExceptions for a particular
block. That's what led me to fsck that block file and start digging into
the underlying data. The weird thing is that the at-capacity node actually
had one of the good copies of the failed block and it was a different node
that had the broken one.
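
(For reference, mapping the file to its block list and the datanodes holding
each replica was just the usual fsck incantation, with an illustrative path:

  hadoop fsck /hbase/mytable/<region>/<family>/<hfile> -files -blocks -locations

which is how I found the three replicas mentioned earlier.)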

And of course, the logs for when this broken HFile was created have already
been aged out, so I'm left to chase shadows to some extent.

-md


On Tue, Mar 17, 2015 at 10:35 PM, Stack <st...@duboce.net> wrote:

> On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net> wrote:
>
> > On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon <mi...@synctree.com>
> > wrote:
> >
> >> Hi all-
> >>
> >> I've got an HFile that's reporting a corrupt block in "hadoop fsck" and
> >> was
> >> hoping to get some advice on recovering as much data as possible.
> >>
> >> When I examined the blk-* file on the three data nodes that have a
> replica
> >> of the affected block, I saw that the replicas on two of the datanodes
> had
> >> the same SHA-1 checksum and that the replica on the other datanode was a
> >> truncated version of the replica found on the other nodes (as reported
> by
> >> a
> >> difference at EOF by "cmp"). The size of the two identical blocks is
> >> 67108864, the same as most of the other blocks in the file.
> >>
> >> Given that there were two datanodes with the same data and another with
> >> truncated data, I made a backup of the truncated file and dropped the
> >> full-length copy of the block in its place directly on the data mount,
> >> hoping that this would cause HDFS to no longer report the file as
> corrupt.
> >> Unfortunately, this didn't seem to have any effect.
> >>
> >>
> > That seems like a reasonable thing to do.
> >
> > Did you restart the DN that was serving this block before you ran fsck?
> > (Fsck asks namenode what blocks are bad; it likely is still reporting off
> > old info).
> >
> >
> >
> >> Looking through the Hadoop source code, it looks like there is a
> >> CorruptReplicasMap internally that tracks which nodes have "corrupt"
> >> copies
> >> of a block. In HDFS-6663 <
> https://issues.apache.org/jira/browse/HDFS-6663
> >> >,
> >> a "-blockId" parameter was added to "hadoop fsck" to allow dumping the
> >> reason that a block id is considered corrupt, but that wasn't added
> until
> >> Hadoop 2.7.0 and our client is running 2.0.0-cdh4.6.0.
> >>
> >>
> > Good digging.
> >
> >
> >
> >> I also had a look at running the "HFile" tool on the affected file (cf.
> >> section 9.7.5.2.2 at
> http://hbase.apache.org/0.94/book/regions.arch.html
> >> ).
> >> When I did that, I was able to see the data up to the corrupted block as
> >> far as I could tell, but then it started repeatedly looping back to the
> >> first row and starting over. I believe this is related to the behavior
> >> described in https://issues.apache.org/jira/browse/HBASE-12949
> >
> >
> >
> > So, your file is 3G and your blocks are 128M?
> >
> > The dfsclient should just pass over the bad replica and move on to the
> > good one so it would seem to indicate all replicas are bad for you.
> >
> > If you enable DFSClient DEBUG level logging it should report which blocks
> > it is reading from. For example, here I am reading the start of the index
> > blocks with DFSClient DEBUG enabled but I grep out the DFSClient
> emissions
> > only:
> >
> > [stack@c2020 ~]$ ./hbase/bin/hbase --config ~/conf_hbase
> > org.apache.hadoop.hbase.io.hfile.HFile -h -f
> >
> /hbase/data/default/tsdb/3f4ea5ea14653cee6006f13c7d06d10b/t/68b00cb158aa4d839f1744639880f362|grep
> > DFSClient
> > 2015-03-17 21:42:56,950 DEBUG [main] util.ChecksumType:
> > org.apache.hadoop.util.PureJavaCrc32 available
> > 2015-03-17 21:42:56,952 DEBUG [main] util.ChecksumType:
> > org.apache.hadoop.util.PureJavaCrc32C available
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> >
> [jar:file:/home/stack/hbase-1.0.1-SNAPSHOT/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: Found binding in
> >
> [jar:file:/home/stack/hadoop-2.7.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > explanation.
> > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> > 2015-03-17 21:42:58,082 INFO  [main] hfile.CacheConfig:
> > CacheConfig:disabled
> > 2015-03-17 21:42:58,126 DEBUG [main] hdfs.DFSClient: newInfo =
> > LocatedBlocks{
> >   fileLength=108633903
> >   underConstruction=false
> >
> >
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > getBlockSize()=108633903; corrupt=false; offset=0;
> > locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > DatanodeInfoWithStorage[10.20.84.31:50011
> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > DatanodeInfoWithStorage[10.20.84.30:50011
> > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> >
> >
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > getBlockSize()=108633903; corrupt=false; offset=0;
> > locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > DatanodeInfoWithStorage[10.20.84.31:50011
> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > DatanodeInfoWithStorage[10.20.84.27:50011
> > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> >   isLastBlockComplete=true}
> > 2015-03-17 21:42:58,132 DEBUG [main] hdfs.DFSClient: Connecting to
> > datanode 10.20.84.27:50011
> > 2015-03-17 21:42:58,281 DEBUG [main] hdfs.DFSClient: Connecting to
> > datanode 10.20.84.27:50011
> > 2015-03-17 21:42:58,375 DEBUG [main] hdfs.DFSClient: newInfo =
> > LocatedBlocks{
> >   fileLength=108633903
> >   underConstruction=false
> >
> >
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > getBlockSize()=108633903; corrupt=false; offset=0;
> > locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > DatanodeInfoWithStorage[10.20.84.31:50011
> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > DatanodeInfoWithStorage[10.20.84.27:50011
> > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> >
> >
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > getBlockSize()=108633903; corrupt=false; offset=0;
> > locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > DatanodeInfoWithStorage[10.20.84.31:50011
> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > DatanodeInfoWithStorage[10.20.84.30:50011
> > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> >   isLastBlockComplete=true}
> > 2015-03-17 21:42:58,376 DEBUG [main] hdfs.DFSClient: Connecting to
> > datanode 10.20.84.30:50011
> > 2015-03-17 21:42:58,381 DEBUG [main] hdfs.DFSClient: Connecting to
> > datanode 10.20.84.27:50011
> >
> > Do you see it reading from 'good' or 'bad' blocks?
> >
> > I added this line to hbase log4j.properties to enable DFSClient DEBUG:
> >
> > log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
> >
> > On HBASE-12949, what exception is coming up?  Dump it in here.
> >
> >
> >
> >> My goal is to determine whether the block in question is actually
> corrupt
> >> and, if so, in what way.
> >
> >
> > What happens if you just try to copy the file local or elsewhere in the
> > filesystem using dfs shell. Do you get a pure dfs exception unhampered by
> > hbaseyness?
> >
> >
> >
> >> If it's possible to recover all of the file except
> >> a portion of the affected block, that would be OK too.
> >
> >
> > I actually do not see a 'fix' or 'recover' on the hfile tool. We need to
> > add it so you can recover all but the bad block (we should figure how to
> > skip the bad section also).
> >
> >
> >
> >> I just don't want to
> >> be in the position of having to lose all 3 gigs of data in this
> particular
> >> region, given that most of it appears to be intact. I just can't find
> the
> >> right low-level tools to let me diagnose the exact state
> and
> >> structure of the block data I have for this file.
> >>
> >>
> > Nod.
> >
> >
> >
> >> Any help or direction that someone could provide would be much
> >> appreciated.
> >> For reference, I'll repeat that our client is running Hadoop
> >> 2.0.0-cdh4.6.0
> >> and add that the HBase version is 0.94.15-cdh4.6.0.
> >>
> >>
> > See if any of the above helps. I'll try and dig up some more tools in
> > meantime.
> >
>
> I asked some folks who know better and they suggested and asked the following:
>
> + Are you doing short-circuit reads?  If so, this may be preventing the
> DFSClient from moving to the good block.
> + In later versions of hadoop (cdh5.2.1 for example), you could do hdfs
> dfsadmin -triggerBlockReport DN:PORT. This is probably of no use to you,
> so you might have to restart the DN to have the NN notice the change in blocks.
> + This might be better than what I suggested above:
> HADOOP_ROOT_LOGGER="TRACE,console"  hdfs dfs -cat /interesting_file
>
> St.Ack
>

Re: Recovering from corrupt blocks in HFile

Posted by Stack <st...@duboce.net>.
On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net> wrote:

> On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon <mi...@synctree.com>
> wrote:
>
>> Hi all-
>>
>> I've got an HFile that's reporting a corrupt block in "hadoop fsck" and
>> was
>> hoping to get some advice on recovering as much data as possible.
>>
>> When I examined the blk-* file on the three data nodes that have a replica
>> of the affected block, I saw that the replicas on two of the datanodes had
>> the same SHA-1 checksum and that the replica on the other datanode was a
>> truncated version of the replica found on the other nodes (as reported by
>> a
>> difference at EOF by "cmp"). The size of the two identical blocks is
>> 67108864, the same as most of the other blocks in the file.
>>
>> Given that there were two datanodes with the same data and another with
>> truncated data, I made a backup of the truncated file and dropped the
>> full-length copy of the block in its place directly on the data mount,
>> hoping that this would cause HDFS to no longer report the file as corrupt.
>> Unfortunately, this didn't seem to have any effect.
>>
>>
> That seems like a reasonable thing to do.
>
> Did you restart the DN that was serving this block before you ran fsck?
> (Fsck asks namenode what blocks are bad; it likely is still reporting off
> old info).
>
>
>
>> Looking through the Hadoop source code, it looks like there is a
>> CorruptReplicasMap internally that tracks which nodes have "corrupt"
>> copies
>> of a block. In HDFS-6663 <https://issues.apache.org/jira/browse/HDFS-6663
>> >,
>> a "-blockId" parameter was added to "hadoop fsck" to allow dumping the
>> reason that a block id is considered corrupt, but that wasn't added until
>> Hadoop 2.7.0 and our client is running 2.0.0-cdh4.6.0.
>>
>>
> Good digging.
>
>
>
>> I also had a look at running the "HFile" tool on the affected file (cf.
>> section 9.7.5.2.2 at http://hbase.apache.org/0.94/book/regions.arch.html
>> ).
>> When I did that, I was able to see the data up to the corrupted block as
>> far as I could tell, but then it started repeatedly looping back to the
>> first row and starting over. I believe this is related to the behavior
>> described in https://issues.apache.org/jira/browse/HBASE-12949
>
>
>
> So, your file is 3G and your blocks are 128M?
>
> The dfsclient should just pass over the bad replica and move on to the
> good one so it would seem to indicate all replicas are bad for you.
>
> If you enable DFSClient DEBUG level logging it should report which blocks
> it is reading from. For example, here I am reading the start of the index
> blocks with DFSClient DEBUG enabled but I grep out the DFSClient emissions
> only:
>
> [stack@c2020 ~]$ ./hbase/bin/hbase --config ~/conf_hbase
> org.apache.hadoop.hbase.io.hfile.HFile -h -f
> /hbase/data/default/tsdb/3f4ea5ea14653cee6006f13c7d06d10b/t/68b00cb158aa4d839f1744639880f362|grep
> DFSClient
> 2015-03-17 21:42:56,950 DEBUG [main] util.ChecksumType:
> org.apache.hadoop.util.PureJavaCrc32 available
> 2015-03-17 21:42:56,952 DEBUG [main] util.ChecksumType:
> org.apache.hadoop.util.PureJavaCrc32C available
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
> [jar:file:/home/stack/hbase-1.0.1-SNAPSHOT/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/home/stack/hadoop-2.7.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 2015-03-17 21:42:58,082 INFO  [main] hfile.CacheConfig:
> CacheConfig:disabled
> 2015-03-17 21:42:58,126 DEBUG [main] hdfs.DFSClient: newInfo =
> LocatedBlocks{
>   fileLength=108633903
>   underConstruction=false
>
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> getBlockSize()=108633903; corrupt=false; offset=0;
> locs=[DatanodeInfoWithStorage[10.20.84.27:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> DatanodeInfoWithStorage[10.20.84.30:50011
> ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
>
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> getBlockSize()=108633903; corrupt=false; offset=0;
> locs=[DatanodeInfoWithStorage[10.20.84.30:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> DatanodeInfoWithStorage[10.20.84.27:50011
> ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
>   isLastBlockComplete=true}
> 2015-03-17 21:42:58,132 DEBUG [main] hdfs.DFSClient: Connecting to
> datanode 10.20.84.27:50011
> 2015-03-17 21:42:58,281 DEBUG [main] hdfs.DFSClient: Connecting to
> datanode 10.20.84.27:50011
> 2015-03-17 21:42:58,375 DEBUG [main] hdfs.DFSClient: newInfo =
> LocatedBlocks{
>   fileLength=108633903
>   underConstruction=false
>
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> getBlockSize()=108633903; corrupt=false; offset=0;
> locs=[DatanodeInfoWithStorage[10.20.84.30:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> DatanodeInfoWithStorage[10.20.84.27:50011
> ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
>
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> getBlockSize()=108633903; corrupt=false; offset=0;
> locs=[DatanodeInfoWithStorage[10.20.84.27:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> DatanodeInfoWithStorage[10.20.84.30:50011
> ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
>   isLastBlockComplete=true}
> 2015-03-17 21:42:58,376 DEBUG [main] hdfs.DFSClient: Connecting to
> datanode 10.20.84.30:50011
> 2015-03-17 21:42:58,381 DEBUG [main] hdfs.DFSClient: Connecting to
> datanode 10.20.84.27:50011
>
> Do you see it reading from 'good' or 'bad' blocks?
>
> I added this line to hbase log4j.properties to enable DFSClient DEBUG:
>
> log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
>
> On HBASE-12949, what exception is coming up?  Dump it in here.
>
>
>
>> My goal is to determine whether the block in question is actually corrupt
>> and, if so, in what way.
>
>
> What happens if you just try to copy the file local or elsewhere in the
> filesystem using dfs shell. Do you get a pure dfs exception unhampered by
> hbaseyness?
>
>
>
>> If it's possible to recover all of the file except
>> a portion of the affected block, that would be OK too.
>
>
> I actually do not see a 'fix' or 'recover' on the hfile tool. We need to
> add it so you can recover all but the bad block (we should figure how to
> skip the bad section also).
>
>
>
>> I just don't want to
>> be in the position of having to lose all 3 gigs of data in this particular
>> region, given that most of it appears to be intact. I just can't find the
>> right low-level tools to let me diagnose the exact state and
>> structure of the block data I have for this file.
>>
>>
> Nod.
>
>
>
>> Any help or direction that someone could provide would be much
>> appreciated.
>> For reference, I'll repeat that our client is running Hadoop
>> 2.0.0-cdh4.6.0
>> and add that the HBase version is 0.94.15-cdh4.6.0.
>>
>>
> See if any of the above helps. I'll try and dig up some more tools in
> meantime.
>

I asked some folks who know better and they suggested and asked the following:

+ Are you doing short-circuit reads?  If so, this may be preventing the
DFSClient from moving to the good block.
+ In later versions of hadoop (cdh5.2.1 for example), you could do hdfs
dfsadmin -triggerBlockReport DN:PORT. This is probably of no use to you,
so you might have to restart the DN to have the NN notice the change in blocks.
+ This might be better than what I suggested above:
HADOOP_ROOT_LOGGER="TRACE,console"  hdfs dfs -cat /interesting_file
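
Concretely, something like the below; the DN host:port and path are made up,
so adjust to your cluster (50020 is just the usual DN IPC port default):

  hdfs dfsadmin -triggerBlockReport 10.20.84.27:50020
  HADOOP_ROOT_LOGGER="TRACE,console" hdfs dfs -cat /hbase/path/to/hfile > /dev/null

The TRACE output should show exactly which replica the read path picks and why.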

St.Ack

Re: Recovering from corrupt blocks in HFile

Posted by Mike Dillon <mi...@synctree.com>.
I didn't see any problems in my preliminary testing, but I'll let you know
if the team that works with this data reports anything weird. It seemed to
just skip past the missing data from what I saw.
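
For the archives: the close/assign refresh Jerry describes below is, as far
as I can tell, just the usual shell incantation (the region name here is
made up):

  hbase> close_region 'mytable,startrow,1426812345678.abcdef0123456789.'
  hbase> assign 'mytable,startrow,1426812345678.abcdef0123456789.'

or a single 'move' of the region to another regionserver.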

-md

On Fri, Mar 20, 2015 at 12:56 PM, Jerry He <je...@gmail.com> wrote:

> Hi, Mike Dillon
>
> Do you see any problems after removing the corrupted hfile?  HBase region
> store keeps an internal list of hfiles for each store.
> You can 'close' the region, then 'assign' it again to refresh the internal
> list so that you won't see any more annoying exceptions.  The command 'move'
> will do the same for a region.
> It is normally not recommended to manually change the underlying hfiles.
> But I understand you have a special case.  I did the same.
>
> Jerry
>
> On Fri, Mar 20, 2015 at 11:41 AM, Mike Dillon <mi...@synctree.com>
> wrote:
>
> > I wish it were possible to take that step back and determine the root
> cause
> > in this case, but I wasn't asked to look into the situation until a few
> > weeks after the corruption took place (as far as I can tell). At that
> > point, the logs that would have said what was happening at the time had
> > been rotated out and were not being warehoused or monitored.
> >
> > As you asked, the corrupt file did indeed have a single corrupt block out
> > of 42. I think it's reasonable to think that this happened during
> > compaction, but I can't be sure.
> >
> > I'm not sure what the state of the data was at the time of
> > compaction/corruption, but I can say that when I looked at the data,
> there
> > were two different versions of the block. One of those versions
> > was 67108864 bytes long and had two replicas, the other was a truncated
> > version of the same data. This block was in the middle of the file and
> all
> > the other blocks except the final one had a size of 67108864 as well.
> HDFS
> > considered both versions of the block to be corrupt, but at one point I
> did
> > replace the truncated data on the one node with the full-length data (to
> no
> > avail).
> >
> > -md
> >
> > On Thu, Mar 19, 2015 at 6:49 PM, Michael Segel <
> michael_segel@hotmail.com>
> > wrote:
> >
> > > Sorry,
> > >
> > > Can we take a step back? I’m a little slow this evening….
> > > (FYI… today is St. Joseph’s Day and I was kidnapped and forced to drink
> > > too much Bourbon. I take no responsibility and blame my friends who are
> > > named Joe. ;-)
> > >
> > > What caused the block to be corrupt?
> > > Was it your typical HDFS case where one block was corrupt in one file?
> > > From skimming your posts, it sounded like the corruption occurred when
> > > there was a compaction.
> > >
> > > Does that mean that during the compaction, it tried to read and
> compact a
> > > bad block and ignored the other two copies of the bad block that could
> > have
> > > been good?
> > > Was it that at the time of writing the compacted data, there was a
> > > corruption that then was passed on to the other two copies?
> > >
> > > I guess the point I’m trying to raise is that trying to solve the
> problem
> > > after the fact may not be the right choice; better to see if you
> > can
> > > catch the bad block before trying to compact the data in the file.
> > > (Assuming you ended up trying to use a corrupted block)
> > >
> > > Does that make sense?
> > >
> > >
> > > -Mike
> > >
> > > > On Mar 19, 2015, at 2:27 PM, Mike Dillon <mi...@synctree.com>
> > > wrote:
> > > >
> > > > So, it turns out that the client has an archived data source that can
> > > > recreate the HBase data in question if needed, so the need for me to
> > > > actually recover this HFile has diminished to the point where it's
> > > probably
> > > > not worth investing my time in creating a custom tool to extract the
> > > data.
> > > >
> > > > Given that they're willing to lose the data in this region and
> recreate
> > > it
> > > > if necessary, do I simply need to delete the HFile to make HDFS happy
> > or
> > > is
> > > > there something I need to do at the HBase level to tell it that data
> > will
> > > > be going away?
> > > >
> > > > Thanks so much everyone for your help on this issue!
> > > >
> > > > -md
> > > >
> > > > On Wed, Mar 18, 2015 at 10:46 PM, Jerry He <je...@gmail.com>
> wrote:
> > > >
> > > >> From HBase perspective, since we don't have a ready tool, the
> general
> > > idea
> > > >> will need you to have access to HBase source code and write your own
> > > tool.
> > > >> On the high level, the tool will read/scan the KVs from the hfile
> > > similar
> > > >> to what the HFile tool does, while opening a HFileWriter to dump the
> > > good
> > > >> data until you are not able to do so.
> > > >> Then you will close the HFileWriter with the necessary meta file
> info.
> > > >> There are APIs in HBase to do so, but they may not be external
> public
> > > API.
> > > >>
> > > >> Jerry
> > > >>
> > > >> On Wed, Mar 18, 2015 at 4:27 PM, Mike Dillon <
> > mike.dillon@synctree.com>
> > > >> wrote:
> > > >>
> > > >>> I've had a chance to try out Stack's passed along suggestion of
> > > >>> HADOOP_ROOT_LOGGER="TRACE,console"  hdfs dfs -cat and managed to
> get
> > > >> this:
> > > >>> https://gist.github.com/md5/d42e97ab7a0bd656f09a
> > > >>>
> > > >>> After knowing what to look for, I was able to find the same
> checksum
> > > >>> failures in the logs during the major compaction failures.
> > > >>>
> > > >>> I'm willing to accept that all the data after that point in the
> > corrupt
> > > >>> block is lost, so any specific advice for how to replace that block
> > > with
> > > >> a
> > > >>> partial one containing only the good data would be appreciated. I'm
> > > aware
> > > >>> that there may be other checksum failures in the subsequent blocks
> as
> > > >> well,
> > > >>> since nothing is currently able to read past the first corruption
> > > point,
> > > >>> but I'll just have to wash, rinse, and repeat to see how much good
> > data
> > > >> is
> > > >>> left in the file as a whole.
> > > >>>
> > > >>> -md
> > > >>>
> > > >>> On Wed, Mar 18, 2015 at 2:41 PM, Jerry He <je...@gmail.com>
> > wrote:
> > > >>>
> > > >>>> For a 'fix' and 'recover' hfile tool at HBase level,  the
> relatively
> > > >> easy
> > > >>>> thing we can recover is probably the data (KVs) up to the point
> when
> > > we
> > > >>> hit
> > > >>>> the first corruption caused exception.
> > > >>>> After that, it will not be as easy.  For example, if the current
> key
> > > >>> length
> > > >>>> or value length is bad, there is no way to skip to the next KV.
> We
> > > >> will
> > > >>>> probably need to skip the whole current hblock, and go to the next
> > > >> block
> > > >>>> for KVs assuming the hblock index is still good.
> > > >>>>
> > > >>>> HBASE-12949 <https://issues.apache.org/jira/browse/HBASE-12949>
> > does
> > > >> an
> > > >>>> incremental improvement to make sure we do get a corruption caused
> > > >>>> exception so that the scan/read will not go into an infinite loop.
> > > >>>>
> > > >>>> Jerry
> > > >>>>
> > > >>>> On Wed, Mar 18, 2015 at 12:03 PM, Mike Dillon <
> > > >> mike.dillon@synctree.com>
> > > >>>> wrote:
> > > >>>>
> > > >>>>> I haven't filed one myself, but I can do so if my investigation
> > ends
> > > >> up
> > > >>>>> finding something bug-worthy as opposed to just random failures
> due
> > > >> to
> > > >>>>> out-of-disk scenarios.
> > > >>>>>
> > > >>>>> Unfortunately, I had to prioritize some other work this morning,
> > so I
> > > >>>>> haven't made it back to the bad node yet.
> > > >>>>>
> > > >>>>> I did attempt restarting the datanode to see if I could make
> hadoop
> > > >>> fsck
> > > >>>>> happy, but that didn't have any noticeable effect. I'm hoping to
> > have
> > > >>>> more
> > > >>>>> time this afternoon to investigate the other suggestions from
> this
> > > >>>> thread.
> > > >>>>>
> > > >>>>> -md
> > > >>>>>
> > > >>>>> On Wed, Mar 18, 2015 at 11:41 AM, Andrew Purtell <
> > > >> apurtell@apache.org>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> ​
> > > >>>>>> On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net>
> wrote:
> > > >>>>>>>
> > > >>>>>>>> If it's possible to recover all of the file except
> > > >>>>>>>> a portion of the affected block, that would be OK too.
> > > >>>>>>>
> > > >>>>>>> I actually do not see a 'fix' or 'recover' on the hfile tool.
> We
> > > >>> need
> > > >>>>> to
> > > >>>>>>> add it so you can recover all but the bad block (we should
> figure
> > > >>> how
> > > >>>>> to
> > > >>>>>>> skip the bad section also).
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> ​I was just getting caught up on this thread and had the same
> > > >>> thought.
> > > >>>> Is
> > > >>>>>> there an issue filed for this?
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net>
> wrote:
> > > >>>>>>
> > > >>>>>>> On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon <
> > > >>>> mike.dillon@synctree.com
> > > >>>>>>
> > > >>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hi all-
> > > >>>>>>>>
> > > >>>>>>>> I've got an HFile that's reporting a corrupt block in "hadoop
> > > >>> fsck"
> > > >>>>> and
> > > >>>>>>> was
> > > >>>>>>>> hoping to get some advice on recovering as much data as
> > > >> possible.
> > > >>>>>>>>
> > > >>>>>>>> When I examined the blk-* file on the three data nodes that
> > > >> have
> > > >>> a
> > > >>>>>>> replica
> > > >>>>>>>> of the affected block, I saw that the replicas on two of the
> > > >>>>> datanodes
> > > >>>>>>> had
> > > >>>>>>>> the same SHA-1 checksum and that the replica on the other
> > > >>> datanode
> > > >>>>> was
> > > >>>>>> a
> > > >>>>>>>> truncated version of the replica found on the other nodes (as
> > > >>>>> reported
> > > >>>>>>> by a
> > > >>>>>>>> difference at EOF by "cmp"). The size of the two identical
> > > >> blocks
> > > >>>> is
> > > >>>>>>>> 67108864, the same as most of the other blocks in the file.
> > > >>>>>>>>
> > > >>>>>>>> Given that there were two datanodes with the same data and
> > > >>> another
> > > >>>>> with
> > > >>>>>>>> truncated data, I made a backup of the truncated file and
> > > >> dropped
> > > >>>> the
> > > >>>>>>>> full-length copy of the block in its place directly on the
> data
> > > >>>>> mount,
> > > >>>>>>>> hoping that this would cause HDFS to no longer report the file
> > > >> as
> > > >>>>>>> corrupt.
> > > >>>>>>>> Unfortunately, this didn't seem to have any effect.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>> That seems like a reasonable thing to do.
> > > >>>>>>>
> > > >>>>>>> Did you restart the DN that was serving this block before you
> ran
> > > >>>> fsck?
> > > >>>>>>> (Fsck asks namenode what blocks are bad; it likely is still
> > > >>> reporting
> > > >>>>> off
> > > >>>>>>> old info).
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> Looking through the Hadoop source code, it looks like there is
> > > >> a
> > > >>>>>>>> CorruptReplicasMap internally that tracks which nodes have
> > > >>>> "corrupt"
> > > >>>>>>> copies
> > > >>>>>>>> of a block. In HDFS-6663 <
> > > >>>>>>> https://issues.apache.org/jira/browse/HDFS-6663
> > > >>>>>>>>> ,
> > > >>>>>>>> a "-blockId" parameter was added to "hadoop fsck" to allow
> > > >>> dumping
> > > >>>>> the
> > > >>>>>>>> reason that a block id is considered corrupt, but that wasn't
> > > >>>> added
> > > >>>>>>> until
> > > >>>>>>>> Hadoop 2.7.0 and our client is running 2.0.0-cdh4.6.0.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>> Good digging.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> I also had a look at running the "HFile" tool on the affected
> > > >>> file
> > > >>>>> (cf.
> > > >>>>>>>> section 9.7.5.2.2 at
> > > >>>>>> http://hbase.apache.org/0.94/book/regions.arch.html
> > > >>>>>>> ).
> > > >>>>>>>> When I did that, I was able to see the data up to the
> corrupted
> > > >>>> block
> > > >>>>>> as
> > > >>>>>>>> far as I could tell, but then it started repeatedly looping
> > > >> back
> > > >>> to
> > > >>>>> the
> > > >>>>>>>> first row and starting over. I believe this is related to the
> > > >>>>> behavior
> > > >>>>>>>> described in
> https://issues.apache.org/jira/browse/HBASE-12949
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> So, your file is 3G and your blocks are 128M?
> > > >>>>>>>
> > > >>>>>>> The dfsclient should just pass over the bad replica and move on
> > > >> to
> > > >>>> the
> > > >>>>>> good
> > > >>>>>>> one so it would seem to indicate all replicas are bad for you.
> > > >>>>>>>
> > > >>>>>>> If you enable DFSClient DEBUG level logging it should report
> > > >> which
> > > >>>>> blocks
> > > >>>>>>> it is reading from. For example, here I am reading the start of
> > > >> the
> > > >>>>> index
> > > >>>>>>> blocks with DFSClient DEBUG enabled but I grep out the
> DFSClient
> > > >>>>>> emissions
> > > >>>>>>> only:
> > > >>>>>>>
> > > >>>>>>> [stack@c2020 ~]$ ./hbase/bin/hbase --config ~/conf_hbase
> > > >>>>>>> org.apache.hadoop.hbase.io.hfile.HFile -h -f
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> /hbase/data/default/tsdb/3f4ea5ea14653cee6006f13c7d06d10b/t/68b00cb158aa4d839f1744639880f362|grep
> > > >>>>>>> DFSClient
> > > >>>>>>> 2015-03-17 21:42:56,950 DEBUG [main] util.ChecksumType:
> > > >>>>>>> org.apache.hadoop.util.PureJavaCrc32 available
> > > >>>>>>> 2015-03-17 21:42:56,952 DEBUG [main] util.ChecksumType:
> > > >>>>>>> org.apache.hadoop.util.PureJavaCrc32C available
> > > >>>>>>> SLF4J: Class path contains multiple SLF4J bindings.
> > > >>>>>>> SLF4J: Found binding in
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> [jar:file:/home/stack/hbase-1.0.1-SNAPSHOT/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > >>>>>>> SLF4J: Found binding in
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> [jar:file:/home/stack/hadoop-2.7.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > >>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings
> for
> > > >>> an
> > > >>>>>>> explanation.
> > > >>>>>>> SLF4J: Actual binding is of type
> > > >>> [org.slf4j.impl.Log4jLoggerFactory]
> > > >>>>>>> 2015-03-17 21:42:58,082 INFO  [main] hfile.CacheConfig:
> > > >>>>>>> CacheConfig:disabled
> > > >>>>>>> 2015-03-17 21:42:58,126 DEBUG [main] hdfs.DFSClient: newInfo =
> > > >>>>>>> LocatedBlocks{
> > > >>>>>>>  fileLength=108633903
> > > >>>>>>>  underConstruction=false
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > >>>>>>> getBlockSize()=108633903; corrupt=false; offset=0;
> > > >>>>>>> locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> > > >>>>>>> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > >>>>>>> DatanodeInfoWithStorage[10.20.84.31:50011
> > > >>>>>>> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > >>>>>>> DatanodeInfoWithStorage[10.20.84.30:50011
> > > >>>>>>> ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > >>>>>>> getBlockSize()=108633903; corrupt=false; offset=0;
> > > >>>>>>> locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> > > >>>>>>> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > >>>>>>> DatanodeInfoWithStorage[10.20.84.31:50011
> > > >>>>>>> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > >>>>>>> DatanodeInfoWithStorage[10.20.84.27:50011
> > > >>>>>>> ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> > > >>>>>>>  isLastBlockComplete=true}
> > > >>>>>>> 2015-03-17 21:42:58,132 DEBUG [main] hdfs.DFSClient: Connecting
> > > >> to
> > > >>>>>> datanode
> > > >>>>>>> 10.20.84.27:50011
> > > >>>>>>> 2015-03-17 21:42:58,281 DEBUG [main] hdfs.DFSClient: Connecting
> > > >> to
> > > >>>>>> datanode
> > > >>>>>>> 10.20.84.27:50011
> > > >>>>>>> 2015-03-17 21:42:58,375 DEBUG [main] hdfs.DFSClient: newInfo =
> > > >>>>>>> LocatedBlocks{
> > > >>>>>>>  fileLength=108633903
> > > >>>>>>>  underConstruction=false
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > >>>>>>> getBlockSize()=108633903; corrupt=false; offset=0;
> > > >>>>>>> locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> > > >>>>>>> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > >>>>>>> DatanodeInfoWithStorage[10.20.84.31:50011
> > > >>>>>>> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > >>>>>>> DatanodeInfoWithStorage[10.20.84.27:50011
> > > >>>>>>> ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> >
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > >>>>>>> getBlockSize()=108633903; corrupt=false; offset=0;
> > > >>>>>>> locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> > > >>>>>>> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > >>>>>>> DatanodeInfoWithStorage[10.20.84.31:50011
> > > >>>>>>> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > >>>>>>> DatanodeInfoWithStorage[10.20.84.30:50011
> > > >>>>>>> ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> > > >>>>>>>  isLastBlockComplete=true}
> > > >>>>>>> 2015-03-17 21:42:58,376 DEBUG [main] hdfs.DFSClient: Connecting
> > > >> to
> > > >>>>>> datanode
> > > >>>>>>> 10.20.84.30:50011
> > > >>>>>>> 2015-03-17 21:42:58,381 DEBUG [main] hdfs.DFSClient: Connecting
> > > >> to
> > > >>>>>> datanode
> > > >>>>>>> 10.20.84.27:50011
> > > >>>>>>>
> > > >>>>>>> Do you see it reading from 'good' or 'bad' blocks?
> > > >>>>>>>
> > > >>>>>>> I added this line to hbase log4j.properties to enable DFSClient
> > > >>>> DEBUG:
> > > >>>>>>>
> > > >>>>>>> log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
> > > >>>>>>>
> > > >>>>>>> On HBASE-12949, what exception is coming up?  Dump it in here.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> My goal is to determine whether the block in question is
> > > >> actually
> > > >>>>>> corrupt
> > > >>>>>>>> and, if so, in what way.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> What happens if you just try to copy the file local or
> elsewhere
> > > >> in
> > > >>>> the
> > > >>>>>>> filesystem using dfs shell. Do you get a pure dfs exception
> > > >>>> unhampered
> > > >>>>> by
> > > >>>>>>> hbaseyness?
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> If it's possible to recover all of the file except
> > > >>>>>>>> a portion of the affected block, that would be OK too.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> I actually do not see a 'fix' or 'recover' on the hfile tool.
> We
> > > >>> need
> > > >>>>> to
> > > >>>>>>> add it so you can recover all but the bad block (we should
> figure
> > > >>> how
> > > >>>>> to
> > > >>>>>>> skip the bad section also).
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> I just don't want to
> > > >>>>>>>> be in the position of having to lose all 3 gigs of data in
> this
> > > >>>>>>> particular
> > > >>>>>>>> region, given that most of it appears to be intact. I just
> > > >> can't
> > > >>>> find
> > > >>>>>> the
> > > >>>>>>>> right low-level tools to let me diagnose the
> > > >> exact
> > > >>>>> state
> > > >>>>>>> and
> > > >>>>>>>> structure of the block data I have for this file.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>> Nod.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> Any help or direction that someone could provide would be much
> > > >>>>>>> appreciated.
> > > >>>>>>>> For reference, I'll repeat that our client is running Hadoop
> > > >>>>>>> 2.0.0-cdh4.6.0
> > > >>>>>>>> and add that the HBase version is 0.94.15-cdh4.6.0.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>> See if any of the above helps. I'll try and dig up some more
> > > >> tools
> > > >>> in
> > > >>>>>>> meantime.
> > > >>>>>>> St.Ack
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> Thanks!
> > > >>>>>>>>
> > > >>>>>>>> -md
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> --
> > > >>>>>> Best regards,
> > > >>>>>>
> > > >>>>>>   - Andy
> > > >>>>>>
> > > >>>>>> Problems worthy of attack prove their worth by hitting back. -
> > Piet
> > > >>>> Hein
> > > >>>>>> (via Tom White)
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>
> > > >>
> > >
> > > The opinions expressed here are mine, while they may reflect a
> cognitive
> > > thought, that is purely accidental.
> > > Use at your own risk.
> > > Michael Segel
> > > michael_segel (AT) hotmail.com
> > >
> > >
> > >
> > >
> > >
> > >
> >
>

Re: Recovering from corrupt blocks in HFile

Posted by Michael Segel <mi...@hotmail.com>.
Ok, 
I’m still a bit slow this morning … coffee is not helping…. ;-) 

Are we talking about the whole HFile or just a single block in the HFile?

While it may be too late for Mike Dillon, here’s the question that the HBase Devs are going to have to think about…

How and when do you check on the correctness of the hdfs blocks? 
How do you correct? 

I’m working under the impression that HBase only deals with one copy of the replicated data, and the question I have is: what happens when the block copy that HBase reads is the corrupted one?

What’s happening today?
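
As a straw man for the "how do you correct" question: a salvage pass along
the lines Jerry sketches further down in the quoted thread might look like
the below. Untested and written from memory against the 0.94-era API, so
treat the method names as assumptions to verify rather than gospel:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.io.hfile.CacheConfig;
import org.apache.hadoop.hbase.io.hfile.HFile;
import org.apache.hadoop.hbase.io.hfile.HFileScanner;

public class SalvageHFile {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    CacheConfig cc = new CacheConfig(conf);
    HFile.Reader reader = HFile.createReader(fs, new Path(args[0]), cc);
    reader.loadFileInfo();
    HFile.Writer writer =
        HFile.getWriterFactory(conf, cc).withPath(fs, new Path(args[1])).create();
    long copied = 0;
    try {
      // scan with no block caching and no positional reads
      HFileScanner scanner = reader.getScanner(false, false);
      if (scanner.seekTo()) {
        do {
          // copy KVs until the first corruption-caused exception
          writer.append(scanner.getKeyValue());
          copied++;
        } while (scanner.next());
      }
    } catch (Exception e) {
      System.err.println("stopped after " + copied + " KVs: " + e);
    } finally {
      writer.close();  // writes the trailer/meta so the salvaged file is readable
      reader.close();
    }
  }
}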

Thx

-Mike

> On Mar 20, 2015, at 2:56 PM, Jerry He <je...@gmail.com> wrote:
> 
> Hi, Mike Dillon
> 
> Do you see any problems after removing the corrupted hfile?  HBase region
> store keeps an internal list of hfiles for each store.
> You can 'close' the region, then 'assign' it again to refresh the internal
> list so that you won't see any more annoying exceptions.  The command 'move'
> will do the same for a region.
> It is normally not recommended to manually change the underlying hfiles.
> But I understand you have a special case.  I did the same.
> 
> Jerry
> 
> On Fri, Mar 20, 2015 at 11:41 AM, Mike Dillon <mi...@synctree.com>
> wrote:
> 
>> I wish it were possible to take that step back and determine the root cause
>> in this case, but I wasn't asked to look into the situation until a few
>> weeks after the corruption took place (as far as I can tell). At that
>> point, the logs that would have said what was happening at the time had
>> been rotated out and were not being warehoused or monitored.
>> 
>> As you asked, the corrupt file did indeed have a single corrupt block out
>> of 42. I think it's reasonable to think that this happened during
>> compaction, but I can't be sure.
>> 
>> I'm not sure what the state of the data was at the time of
>> compaction/corruption, but I can say that when I looked at the data, there
>> were two different versions of the block. One of those versions
>> was 67108864 bytes long and had two replicas, the other was a truncated
>> version of the same data. This block was in the middle of the file and all
>> the other blocks except the final one had a size of 67108864 as well. HDFS
>> considered both versions of the block to be corrupt, but at one point I did
>> replace the truncated data on the one node with the full-length data (to no
>> avail).
>> 
>> -md
>> 
>> On Thu, Mar 19, 2015 at 6:49 PM, Michael Segel <mi...@hotmail.com>
>> wrote:
>> 
>>> Sorry,
>>> 
>>> Can we take a step back? I’m a little slow this evening….
>>> (FYI… today is St. Joseph’s Day and I was kidnapped and forced to drink
>>> too much Bourbon. I take no responsibility and blame my friends who are
>>> named Joe. ;-)
>>> 
>>> What caused the block to be corrupt?
>>> Was it your typical HDFS case where one block was corrupt in one file?
>>> From skimming your posts, it sounded like the corruption occurred when
>>> there was a compaction.
>>> 
>>> Does that mean that during the compaction, it tried to read and compact a
>>> bad block and ignored the other two copies of the bad block that could
>> have
>>> been good?
>>> Was it that at the time of writing the compacted data, there was a
>>> corruption that then was passed on to the other two copies?
>>> 
>>> I guess the point I’m trying to raise is that trying to solve the problem
>>> after the fact may not be the right choice; better to see if you
>> can
>>> catch the bad block before trying to compact the data in the file.
>>> (Assuming you ended up trying to use a corrupted block)
>>> 
>>> Does that make sense?
>>> 
>>> 
>>> -Mike
>>> 
>>>> On Mar 19, 2015, at 2:27 PM, Mike Dillon <mi...@synctree.com>
>>> wrote:
>>>> 
>>>> So, it turns out that the client has an archived data source that can
>>>> recreate the HBase data in question if needed, so the need for me to
>>>> actually recover this HFile has diminished to the point where it's
>>> probably
>>>> not worth investing my time in creating a custom tool to extract the
>>> data.
>>>> 
>>>> Given that they're willing to lose the data in this region and recreate
>>> it
>>>> if necessary, do I simply need to delete the HFile to make HDFS happy
>> or
>>> is
>>>> there something I need to do at the HBase level to tell it that data
>> will
>>>> be going away?
>>>> 
>>>> Thanks so much everyone for your help on this issue!
>>>> 
>>>> -md
>>>> 
>>>> On Wed, Mar 18, 2015 at 10:46 PM, Jerry He <je...@gmail.com> wrote:
>>>> 
>>>>> From HBase perspective, since we don't have a ready tool, the general
>>> idea
>>>>> will need you to have access to HBase source code and write your own
>>> tool.
>>>>> On the high level, the tool will read/scan the KVs from the hfile
>>> similar
>>>>> to what the HFile tool does, while opening a HFileWriter to dump the
>>> good
>>>>> data until you are not able to do so.
>>>>> Then you will close the HFileWriter with the necessary meta file info.
>>>>> There are APIs in HBase to do so, but they may not be external public
>>> API.
>>>>> 
>>>>> Jerry
>>>>> 
>>>>> On Wed, Mar 18, 2015 at 4:27 PM, Mike Dillon <
>> mike.dillon@synctree.com>
>>>>> wrote:
>>>>> 
>>>>>> I've had a chance to try out Stack's passed along suggestion of
>>>>>> HADOOP_ROOT_LOGGER="TRACE,console"  hdfs dfs -cat and managed to get
>>>>> this:
>>>>>> https://gist.github.com/md5/d42e97ab7a0bd656f09a
>>>>>> 
>>>>>> After knowing what to look for, I was able to find the same checksum
>>>>>> failures in the logs during the major compaction failures.
>>>>>> 
>>>>>> I'm willing to accept that all the data after that point in the
>> corrupt
>>>>>> block is lost, so any specific advice for how to replace that block
>>> with
>>>>> a
>>>>>> partial one containing only the good data would be appreciated. I'm
>>> aware
>>>>>> that there may be other checksum failures in the subsequent blocks as
>>>>> well,
>>>>>> since nothing is currently able to read past the first corruption
>>> point,
>>>>>> but I'll just have to wash, rinse, and repeat to see how much good
>> data
>>>>> is
>>>>>> left in the file as a whole.
>>>>>> 
>>>>>> -md
>>>>>> 
>>>>>> On Wed, Mar 18, 2015 at 2:41 PM, Jerry He <je...@gmail.com>
>> wrote:
>>>>>> 
>>>>>>> For a 'fix' and 'recover' hfile tool at HBase level,  the relatively
>>>>> easy
>>>>>>> thing we can recover is probably the data (KVs) up to the point when
>>> we
>>>>>> hit
>>>>>>> the first corruption caused exception.
>>>>>>> After that, it will not be as easy.  For example, if the current key
>>>>>> length
>>>>>>> or value length is bad, there is no way to skip to the next KV.  We
>>>>> will
>>>>>>> probably need to skip the whole current hblock, and go to the next
>>>>> block
>>>>>>> for KVs assuming the hblock index is still good.
>>>>>>> 
>>>>>>> HBASE-12949 <https://issues.apache.org/jira/browse/HBASE-12949>
>> does
>>>>> an
>>>>>>> incremental improvement to make sure we do get a corruption caused
>>>>>>> exception so that the scan/read will not go into an infinite loop.
>>>>>>> 
>>>>>>> Jerry
>>>>>>> 
>>>>>>> On Wed, Mar 18, 2015 at 12:03 PM, Mike Dillon <
>>>>> mike.dillon@synctree.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> I haven't filed one myself, but I can do so if my investigation
>> ends
>>>>> up
>>>>>>>> finding something bug-worthy as opposed to just random failures due
>>>>> to
>>>>>>>> out-of-disk scenarios.
>>>>>>>> 
>>>>>>>> Unfortunately, I had to prioritize some other work this morning,
>> so I
>>>>>>>> haven't made it back to the bad node yet.
>>>>>>>> 
>>>>>>>> I did attempt restarting the datanode to see if I could make hadoop
>>>>>> fsck
>>>>>>>> happy, but that didn't have any noticeable effect. I'm hoping to
>> have
>>>>>>> more
>>>>>>>> time this afternoon to investigate the other suggestions from this
>>>>>>> thread.
>>>>>>>> 
>>>>>>>> -md
>>>>>>>> 
>>>>>>>> On Wed, Mar 18, 2015 at 11:41 AM, Andrew Purtell <
>>>>> apurtell@apache.org>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> ​
>>>>>>>>> On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net> wrote:
>>>>>>>>>> 
>>>>>>>>>>> If it's possible to recover all of the file except
>>>>>>>>>>> a portion of the affected block, that would be OK too.
>>>>>>>>>> 
>>>>>>>>>> I actually do not see a 'fix' or 'recover' on the hfile tool. We
>>>>>> need
>>>>>>>> to
>>>>>>>>>> add it so you can recover all but the bad block (we should figure
>>>>>> how
>>>>>>>> to
>>>>>>>>>> skip the bad section also).
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ​I was just getting caught up on this thread and had the same
>>>>>> thought.
>>>>>>> Is
>>>>>>>>> there an issue filed for this?
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net> wrote:
>>>>>>>>> 
>>>>>>>>>> On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon <
>>>>>>> mike.dillon@synctree.com
>>>>>>>>> 
>>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hi all-
>>>>>>>>>>> 
>>>>>>>>>>> I've got an HFile that's reporting a corrupt block in "hadoop
>>>>>> fsck"
>>>>>>>> and
>>>>>>>>>> was
>>>>>>>>>>> hoping to get some advice on recovering as much data as
>>>>> possible.
>>>>>>>>>>> 
>>>>>>>>>>> When I examined the blk-* file on the three data nodes that
>>>>> have
>>>>>> a
>>>>>>>>>> replica
>>>>>>>>>>> of the affected block, I saw that the replicas on two of the
>>>>>>>> datanodes
>>>>>>>>>> had
>>>>>>>>>>> the same SHA-1 checksum and that the replica on the other
>>>>>> datanode
>>>>>>>> was
>>>>>>>>> a
>>>>>>>>>>> truncated version of the replica found on the other nodes (as
>>>>>>>> reported
>>>>>>>>>> by a
>>>>>>>>>>> difference at EOF by "cmp"). The size of the two identical
>>>>> blocks
>>>>>>> is
>>>>>>>>>>> 67108864, the same as most of the other blocks in the file.
>>>>>>>>>>> 
>>>>>>>>>>> Given that there were two datanodes with the same data and
>>>>>> another
>>>>>>>> with
>>>>>>>>>>> truncated data, I made a backup of the truncated file and
>>>>> dropped
>>>>>>> the
>>>>>>>>>>> full-length copy of the block in its place directly on the data
>>>>>>>> mount,
>>>>>>>>>>> hoping that this would cause HDFS to no longer report the file
>>>>> as
>>>>>>>>>> corrupt.
>>>>>>>>>>> Unfortunately, this didn't seem to have any effect.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> That seems like a reasonable thing to do.
>>>>>>>>>> 
>>>>>>>>>> Did you restart the DN that was serving this block before you ran
>>>>>>> fsck?
>>>>>>>>>> (Fsck asks namenode what blocks are bad; it likely is still
>>>>>> reporting
>>>>>>>> off
>>>>>>>>>> old info).
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> Looking through the Hadoop source code, it looks like there is
>>>>> a
>>>>>>>>>>> CorruptReplicasMap internally that tracks which nodes have
>>>>>>> "corrupt"
>>>>>>>>>> copies
>>>>>>>>>>> of a block. In HDFS-6663 <
>>>>>>>>>> https://issues.apache.org/jira/browse/HDFS-6663
>>>>>>>>>>>> ,
>>>>>>>>>>> a "-blockId" parameter was added to "hadoop fsck" to allow
>>>>>> dumping
>>>>>>>> the
>>>>>>>>>>> reason that a block id is considered corrupt, but that wasn't
>>>>>>> added
>>>>>>>>>> until
>>>>>>>>>>> Hadoop 2.7.0 and our client is running 2.0.0-cdh4.6.0.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> Good digging.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> I also had a look at running the "HFile" tool on the affected
>>>>>> file
>>>>>>>> (cf.
>>>>>>>>>>> section 9.7.5.2.2 at
>>>>>>>>> http://hbase.apache.org/0.94/book/regions.arch.html
>>>>>>>>>> ).
>>>>>>>>>>> When I did that, I was able to see the data up to the corrupted
>>>>>>> block
>>>>>>>>> as
>>>>>>>>>>> far as I could tell, but then it started repeatedly looping
>>>>> back
>>>>>> to
>>>>>>>> the
>>>>>>>>>>> first row and starting over. I believe this is related to the
>>>>>>>> behavior
>>>>>>>>>>> described in https://issues.apache.org/jira/browse/HBASE-12949
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> So, your file is 3G and your blocks are 128M?
>>>>>>>>>> 
>>>>>>>>>> The dfsclient should just pass over the bad replica and move on
>>>>> to
>>>>>>> the
>>>>>>>>> good
>>>>>>>>>> one so it would seem to indicate all replicas are bad for you.
>>>>>>>>>> 
>>>>>>>>>> If you enable DFSClient DEBUG level logging it should report
>>>>> which
>>>>>>>> blocks
>>>>>>>>>> it is reading from. For example, here I am reading the start of
>>>>> the
>>>>>>>> index
>>>>>>>>>> blocks with DFSClient DEBUG enabled but I grep out the DFSClient
>>>>>>>>> emissions
>>>>>>>>>> only:
>>>>>>>>>> 
>>>>>>>>>> [stack@c2020 ~]$ ./hbase/bin/hbase --config ~/conf_hbase
>>>>>>>>>> org.apache.hadoop.hbase.io.hfile.HFile -h -f
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> /hbase/data/default/tsdb/3f4ea5ea14653cee6006f13c7d06d10b/t/68b00cb158aa4d839f1744639880f362|grep
>>>>>>>>>> DFSClient
>>>>>>>>>> 2015-03-17 21:42:56,950 DEBUG [main] util.ChecksumType:
>>>>>>>>>> org.apache.hadoop.util.PureJavaCrc32 available
>>>>>>>>>> 2015-03-17 21:42:56,952 DEBUG [main] util.ChecksumType:
>>>>>>>>>> org.apache.hadoop.util.PureJavaCrc32C available
>>>>>>>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>>>>>>>> SLF4J: Found binding in
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> [jar:file:/home/stack/hbase-1.0.1-SNAPSHOT/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>>>>>> SLF4J: Found binding in
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> [jar:file:/home/stack/hadoop-2.7.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>>>>>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for
>>>>>> an
>>>>>>>>>> explanation.
>>>>>>>>>> SLF4J: Actual binding is of type
>>>>>> [org.slf4j.impl.Log4jLoggerFactory]
>>>>>>>>>> 2015-03-17 21:42:58,082 INFO  [main] hfile.CacheConfig:
>>>>>>>>>> CacheConfig:disabled
>>>>>>>>>> 2015-03-17 21:42:58,126 DEBUG [main] hdfs.DFSClient: newInfo =
>>>>>>>>>> LocatedBlocks{
>>>>>>>>>> fileLength=108633903
>>>>>>>>>> underConstruction=false
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
>>>>>>>>>> getBlockSize()=108633903; corrupt=false; offset=0;
>>>>>>>>>> locs=[DatanodeInfoWithStorage[10.20.84.27:50011
>>>>>>>>>> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
>>>>>>>>>> DatanodeInfoWithStorage[10.20.84.31:50011
>>>>>>>>>> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
>>>>>>>>>> DatanodeInfoWithStorage[10.20.84.30:50011
>>>>>>>>>> ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
>>>>>>>>>> getBlockSize()=108633903; corrupt=false; offset=0;
>>>>>>>>>> locs=[DatanodeInfoWithStorage[10.20.84.30:50011
>>>>>>>>>> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
>>>>>>>>>> DatanodeInfoWithStorage[10.20.84.31:50011
>>>>>>>>>> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
>>>>>>>>>> DatanodeInfoWithStorage[10.20.84.27:50011
>>>>>>>>>> ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
>>>>>>>>>> isLastBlockComplete=true}
>>>>>>>>>> 2015-03-17 21:42:58,132 DEBUG [main] hdfs.DFSClient: Connecting
>>>>> to
>>>>>>>>> datanode
>>>>>>>>>> 10.20.84.27:50011
>>>>>>>>>> 2015-03-17 21:42:58,281 DEBUG [main] hdfs.DFSClient: Connecting
>>>>> to
>>>>>>>>> datanode
>>>>>>>>>> 10.20.84.27:50011
>>>>>>>>>> 2015-03-17 21:42:58,375 DEBUG [main] hdfs.DFSClient: newInfo =
>>>>>>>>>> LocatedBlocks{
>>>>>>>>>> fileLength=108633903
>>>>>>>>>> underConstruction=false
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
>>>>>>>>>> getBlockSize()=108633903; corrupt=false; offset=0;
>>>>>>>>>> locs=[DatanodeInfoWithStorage[10.20.84.30:50011
>>>>>>>>>> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
>>>>>>>>>> DatanodeInfoWithStorage[10.20.84.31:50011
>>>>>>>>>> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
>>>>>>>>>> DatanodeInfoWithStorage[10.20.84.27:50011
>>>>>>>>>> ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
>>>>>>>>>> getBlockSize()=108633903; corrupt=false; offset=0;
>>>>>>>>>> locs=[DatanodeInfoWithStorage[10.20.84.27:50011
>>>>>>>>>> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
>>>>>>>>>> DatanodeInfoWithStorage[10.20.84.31:50011
>>>>>>>>>> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
>>>>>>>>>> DatanodeInfoWithStorage[10.20.84.30:50011
>>>>>>>>>> ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
>>>>>>>>>> isLastBlockComplete=true}
>>>>>>>>>> 2015-03-17 21:42:58,376 DEBUG [main] hdfs.DFSClient: Connecting
>>>>> to
>>>>>>>>> datanode
>>>>>>>>>> 10.20.84.30:50011
>>>>>>>>>> 2015-03-17 21:42:58,381 DEBUG [main] hdfs.DFSClient: Connecting
>>>>> to
>>>>>>>>> datanode
>>>>>>>>>> 10.20.84.27:50011
>>>>>>>>>> 
>>>>>>>>>> Do you see it reading from 'good' or 'bad' blocks?
>>>>>>>>>> 
>>>>>>>>>> I added this line to hbase log4j.properties to enable DFSClient
>>>>>>> DEBUG:
>>>>>>>>>> 
>>>>>>>>>> log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
>>>>>>>>>> 
>>>>>>>>>> On HBASE-12949, what exception is coming up?  Dump it in here.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> My goal is to determine whether the block in question is
>>>>> actually
>>>>>>>>> corrupt
>>>>>>>>>>> and, if so, in what way.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> What happens if you just try to copy the file local or elsewhere
>>>>> in
>>>>>>> the
>>>>>>>>>> filesystem using dfs shell. Do you get a pure dfs exception
>>>>>>> unhampered
>>>>>>>> by
>>>>>>>>>> hbaseyness?
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> If it's possible to recover all of the file except
>>>>>>>>>>> a portion of the affected block, that would be OK too.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> I actually do not see a 'fix' or 'recover' on the hfile tool. We
>>>>>> need
>>>>>>>> to
>>>>>>>>>> add it so you can recover all but the bad block (we should figure
>>>>>> how
>>>>>>>> to
>>>>>>>>>> skip the bad section also).
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> I just don't want to
>>>>>>>>>>> be in the position of having to lose all 3 gigs of data in this
>>>>>>>>>> particular
>>>>>>>>>>> region, given that most of it appears to be intact. I just
>>>>> can't
>>>>>>> find
>>>>>>>>> the
>>>>>>>>>>> right low-level tools to let me determine the diagnose the
>>>>> exact
>>>>>>>> state
>>>>>>>>>> and
>>>>>>>>>>> structure of the block data I have for this file.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> Nod.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> Any help or direction that someone could provide would be much
>>>>>>>>>> appreciated.
>>>>>>>>>>> For reference, I'll repeat that our client is running Hadoop
>>>>>>>>>> 2.0.0-cdh4.6.0
>>>>>>>>>>> and add that the HBase version is 0.94.15-cdh4.6.0.
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> See if any of the above helps. I'll try and dig up some more
>>>>> tools
>>>>>> in
>>>>>>>>>> meantime.
>>>>>>>>>> St.Ack
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> Thanks!
>>>>>>>>>>> 
>>>>>>>>>>> -md
>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> --
>>>>>>>>> Best regards,
>>>>>>>>> 
>>>>>>>>>  - Andy
>>>>>>>>> 
>>>>>>>>> Problems worthy of attack prove their worth by hitting back. -
>> Piet
>>>>>>> Hein
>>>>>>>>> (via Tom White)
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>> 
>>> The opinions expressed here are mine, while they may reflect a cognitive
>>> thought, that is purely accidental.
>>> Use at your own risk.
>>> Michael Segel
>>> michael_segel (AT) hotmail.com
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>> 

The opinions expressed here are mine, while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com






Re: Recovering from corrupt blocks in HFile

Posted by Jerry He <je...@gmail.com>.
Hi, Mike Dillon

Do you see any problems after removing the corrupted hfile?  An HBase region
store keeps an internal list of the hfiles for each store.
You can 'close' the region, then 'assign' it again to refresh that internal
list so that you won't see any more annoying exceptions.  The command 'move'
will do the same for a region.
It is normally not recommended to manually change the underlying hfiles, but
I understand you have a special case.  I did the same.
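
For example, a rough sketch in the hbase shell (the region name below is made
up; substitute the one from your own .META. listing):

  hbase> close_region 'mytable,,1426636747448.abcdef0123456789abcdef0123456789.'
  hbase> assign 'mytable,,1426636747448.abcdef0123456789abcdef0123456789.'
  # or, equivalently, 'move' takes the encoded region name:
  hbase> move 'abcdef0123456789abcdef0123456789'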

Jerry


Re: Recovering from corrupt blocks in HFile

Posted by Mike Dillon <mi...@synctree.com>.
I wish it were possible to take that step back and determine the root cause
in this case, but I wasn't asked to look into the situation until a few
weeks after the corruption took place (as far as I can tell). At that
point, the logs that would have said what was happening at the time had
been rotated out and were not being warehoused or monitored.

As you asked, the corrupt file did indeed have a single corrupt block out
of 42. I think it's reasonable to think that this happened during
compaction, but I can't be sure.

I'm not sure what the state of the data was at the time of
compaction/corruption, but I can say that when I looked at the data, there
were two different versions of the block. One of those versions
was 67108864 bytes long and had two replicas; the other was a truncated
version of the same data. This block was in the middle of the file and all
the other blocks except the final one had a size of 67108864 as well. HDFS
considered both versions of the block to be corrupt, but at one point I did
replace the truncated data on the one node with the full-length data (to no
avail).
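
For reference, the comparison went roughly like this on each datanode (the
block id and data directory below are stand-ins, not the real ones):

  # run on each datanode holding a replica; the data dir is whatever
  # dfs.data.dir points at on that node
  BLK=$(find /data/1/dfs/dn -name 'blk_1234567890' ! -name '*.meta')
  ls -l "$BLK"      # 67108864 bytes on two nodes, shorter on the third
  sha1sum "$BLK"    # identical digests on the two full-length replicas
  # with both variants copied to one machine:
  cmp truncated_replica full_replica   # cmp reported a difference at EOF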

-md


Re: Recovering from corrupt blocks in HFile

Posted by Michael Segel <mi...@hotmail.com>.
Sorry, 

Can we take a step back? I’m a little slow this evening…. 
(FYI… today is St. Joseph’s Day and I was kidnapped and forced to drink too much Bourbon. I take no responsibility and blame my friends who are named Joe. ;-)

What caused the block to be corrupt? 
Was it your typical HDFS case where one block was corrupt in one file?
From skimming your posts, it sounded like the corruption occurred when there was a compaction. 

Does that mean that during the compaction, it tried to read and compact a bad block and ignored the other two copies of the bad block that could have been good? 
Was it that at the time of writing the compacted data, there was a corruption that then was passed on to the other two copies?

I guess the point I’m trying to raise is that trying to solve the problem after the fact may not be the right choice; it may be better to see if you can catch the bad block before trying to compact the data in the file. (Assuming you ended up trying to use a corrupted block.)
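
By way of example, a pre-compaction sanity sweep could be as simple as this
(the path is illustrative):

  # flag missing or corrupt blocks under the table's directory before compacting
  hadoop fsck /hbase/mytable -files -blocks -locations | grep -i -e corrupt -e missing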

Does that make sense? 

 
-Mike

> On Mar 19, 2015, at 2:27 PM, Mike Dillon <mi...@synctree.com> wrote:
> 
> So, it turns out that the client has an archived data source that can
> recreate the HBase data in question if needed, so the need for me to
> actually recover this HFile has diminished to the point where it's probably
> not worth investing my time in creating a custom tool to extract the data.
> 
> Given that they're willing to lose the data in this region and recreate it
> if necessary, do I simply need to delete the HFile to make HDFS happy or is
> there something I need to do at the HBase level to tell it that data will
> be going away?
> 
> Thanks so much everyone for your help on this issue!
> 
> -md

The opinions expressed here are mine; while they may reflect a cognitive thought, that is purely accidental. 
Use at your own risk. 
Michael Segel
michael_segel (AT) hotmail.com






Re: Recovering from corrupt blocks in HFile

Posted by Mike Dillon <mi...@synctree.com>.
Thank you!

On Thu, Mar 19, 2015 at 1:48 PM, Jerry He <je...@gmail.com> wrote:

> It is ok to delete the hfile in question with the hadoop file system
> command. No restart of hbase is needed. You may see some error exceptions
> if there are operations (user scans, compactions) still in flight, but it
> will be ok.
>
> Jerry
>
> [earlier quoted messages trimmed]

Re: Recovering from corrupt blocks in HFile

Posted by Jerry He <je...@gmail.com>.
It is ok to delete the hfile in question with the hadoop file system command.
No restart of hbase is needed. You may see some error exceptions if there are
operations (user scans, compactions) still in flight, but it will be ok.

Jerry
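
For concreteness, a minimal sketch of the deletion Jerry describes, under the
0.94 directory layout (/hbase/<table>/<region>/<family>/<hfile>). Every path
component below is a placeholder, not a value from this thread:

  # optional: keep a copy of the corrupt store file before removing it
  hadoop fs -cp /hbase/<table>/<region-hash>/<family>/<hfile> /tmp/<hfile>.bak

  # remove the store file; no HBase restart is needed, though in-flight
  # scans or compactions may log errors until they retry
  hadoop fs -rm /hbase/<table>/<region-hash>/<family>/<hfile>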

On Thu, Mar 19, 2015 at 12:27 PM, Mike Dillon <mi...@synctree.com>
wrote:

> So, it turns out that the client has an archived data source that can
> recreate the HBase data in question if needed, so the need for me to
> actually recover this HFile has diminished to the point where it's probably
> not worth investing my time in creating a custom tool to extract the data.
>
> Given that they're willing to lose the data in this region and recreate it
> if necessary, do I simply need to delete the HFile to make HDFS happy or is
> there something I need to do at the HBase level to tell it that data will
> be going away?
>
> Thanks so much everyone for your help on this issue!
>
> -md
>
> [earlier quoted messages trimmed]

Re: Recovering from corrupt blocks in HFile

Posted by Mike Dillon <mi...@synctree.com>.
So, it turns out that the client has an archived data source that can
recreate the HBase data in question if needed, so the need for me to
actually recover this HFile has diminished to the point where it's probably
not worth investing my time in creating a custom tool to extract the data.

Given that they're willing to lose the data in this region and recreate it
if necessary, do I simply need to delete the HFile to make HDFS happy or is
there something I need to do at the HBase level to tell it that data will
be going away?

Thanks so much everyone for your help on this issue!

-md

On Wed, Mar 18, 2015 at 10:46 PM, Jerry He <je...@gmail.com> wrote:

> From the HBase perspective, since we don't have a ready tool, the general
> idea is that you will need access to the HBase source code and will have
> to write your own tool. At a high level, the tool reads/scans the KVs from
> the hfile, much as the HFile tool does, while opening an HFileWriter to
> dump the good data until you are no longer able to do so. Then you close
> the HFileWriter with the necessary meta/file info. There are APIs in HBase
> to do this, but they may not be external public APIs.
>
> Jerry
>
> [earlier quoted messages trimmed]

Re: Recovering from corrupt blocks in HFile

Posted by Jerry He <je...@gmail.com>.
From the HBase perspective, since we don't have a ready tool, the general idea
is that you will need access to the HBase source code and will have to write
your own tool. At a high level, the tool reads/scans the KVs from the hfile,
much as the HFile tool does, while opening an HFileWriter to dump the good
data until you are no longer able to do so. Then you close the HFileWriter
with the necessary meta/file info. There are APIs in HBase to do this, but
they may not be external public APIs. A rough sketch follows below.

Jerry
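
A rough sketch of that salvage loop, written from memory against the 0.94-era
HFile API. The reader/writer factory and scanner method names are assumptions
to verify against the source, not a tested tool:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.io.hfile.CacheConfig;
  import org.apache.hadoop.hbase.io.hfile.HFile;
  import org.apache.hadoop.hbase.io.hfile.HFileScanner;

  public class HFileSalvage {
    public static void main(String[] args) throws Exception {
      Configuration conf = HBaseConfiguration.create();
      FileSystem fs = FileSystem.get(conf);
      CacheConfig cacheConf = new CacheConfig(conf);
      Path src = new Path(args[0]);  // the corrupt hfile
      Path dst = new Path(args[1]);  // where to write the salvaged copy

      HFile.Reader reader = HFile.createReader(fs, src, cacheConf);
      reader.loadFileInfo();
      HFileScanner scanner = reader.getScanner(false, false);
      HFile.Writer writer =
          HFile.getWriterFactory(conf, cacheConf).withPath(fs, dst).create();

      long copied = 0;
      try {
        if (scanner.seekTo()) {
          do {
            writer.append(scanner.getKeyValue());  // copy KVs in order
            copied++;
          } while (scanner.next());  // throws when it hits the corruption
        }
      } catch (Exception e) {
        System.err.println("Stopped after " + copied + " KVs: " + e);
      } finally {
        writer.close();  // writes the trailer and meta so the file loads
        reader.close();
      }
    }
  }

Note the sketch uses default writer settings; a real tool would match the
source file's compression and block size and carry over store-file metadata
(for example the max sequence id) via the non-public APIs mentioned above.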

On Wed, Mar 18, 2015 at 4:27 PM, Mike Dillon <mi...@synctree.com>
wrote:

> I've had a chance to try out Stack's passed-along suggestion of
> HADOOP_ROOT_LOGGER="TRACE,console" hdfs dfs -cat and managed to get this:
> https://gist.github.com/md5/d42e97ab7a0bd656f09a
>
> After knowing what to look for, I was able to find the same checksum
> failures in the logs during the major compaction failures.
>
> I'm willing to accept that all the data after that point in the corrupt
> block is lost, so any specific advice for how to replace that block with a
> partial one containing only the good data would be appreciated. I'm aware
> that there may be other checksum failures in the subsequent blocks as well,
> since nothing is currently able to read past the first corruption point,
> but I'll just have to wash, rinse, and repeat to see how much good data is
> left in the file as a whole.
>
> -md
>
> On Wed, Mar 18, 2015 at 2:41 PM, Jerry He <je...@gmail.com> wrote:
>
> > For a 'fix' and 'recover' hfile tool at HBase level,  the relatively easy
> > thing we can recover is probably the data (KVs) up to the point when we
> hit
> > the first corruption caused exception.
> > After that, it will not be as easy.  For example, if the current key
> length
> > or value length is bad, there is no way to skip to the next KV.  We will
> > probably need to skip the whole current hblock, and go to the next block
> > for KVs assuming the hblock index is still good.
> >
> > HBASE-12949 <https://issues.apache.org/jira/browse/HBASE-12949> does an
> > incremental improvement to make sure we do get a corruption caused
> > exception so that the scan/read will not go into an infinite loop.
> >
> > Jerry
> >
> > On Wed, Mar 18, 2015 at 12:03 PM, Mike Dillon <mi...@synctree.com>
> > wrote:
> >
> > > I haven't filed one myself, but I can do so if my investigation ends up
> > > finding something bug-worthy as opposed to just random failures due to
> > > out-of-disk scenarios.
> > >
> > > Unfortunately, I had to prioritize some other work this morning, so I
> > > haven't made it back to the bad node yet.
> > >
> > > I did attempt restarting the datanode to see if I could make hadoop
> fsck
> > > happy, but that didn't have any noticeable effect. I'm hoping to have
> > more
> > > time this afternoon to investigate the other suggestions from this
> > thread.
> > >
> > > -md
> > >
> > > On Wed, Mar 18, 2015 at 11:41 AM, Andrew Purtell <ap...@apache.org>
> > > wrote:
> > >
> > > > ​
> > > > On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net> wrote:
> > > > >
> > > > > > If it's possible to recover all of the file except
> > > > > > a portion of the affected block, that would be OK too.
> > > > >
> > > > > I actually do not see a 'fix' or 'recover' on the hfile tool. We
> need
> > > to
> > > > > add it so you can recover all but the bad block (we should figure
> how
> > > to
> > > > > skip the bad section also).
> > > >
> > > >
> > > > ​I was just getting caught up on this thread and had the same
> thought.
> > Is
> > > > there an issue filed for this?
> > > >
> > > >
> > > > On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net> wrote:
> > > >
> > > > > On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon <
> > mike.dillon@synctree.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > Hi all-
> > > > > >
> > > > > > I've got an HFile that's reporting a corrupt block in "hadoop
> fsck"
> > > and
> > > > > was
> > > > > > hoping to get some advice on recovering as much data as possible.
> > > > > >
> > > > > > When I examined the blk-* file on the three data nodes that have
> a
> > > > > replica
> > > > > > of the affected block, I saw that the replicas on two of the
> > > datanodes
> > > > > had
> > > > > > the same SHA-1 checksum and that the replica on the other
> datanode
> > > was
> > > > a
> > > > > > truncated version of the replica found on the other nodes (as
> > > reported
> > > > > by a
> > > > > > difference at EOF by "cmp"). The size of the two identical blocks
> > is
> > > > > > 67108864, the same as most of the other blocks in the file.
> > > > > >
> > > > > > Given that there were two datanodes with the same data and
> another
> > > with
> > > > > > truncated data, I made a backup of the truncated file and dropped
> > the
> > > > > > full-length copy of the block in its place directly on the data
> > > mount,
> > > > > > hoping that this would cause HDFS to no longer report the file as
> > > > > corrupt.
> > > > > > Unfortunately, this didn't seem to have any effect.
> > > > > >
> > > > > >
> > > > > That seems like a reasonable thing to do.
> > > > >
> > > > > Did you restart the DN that was serving this block before you ran
> > fsck?
> > > > > (Fsck asks namenode what blocks are bad; it likely is still
> reporting
> > > off
> > > > > old info).
> > > > >
> > > > >
> > > > >
> > > > > > Looking through the Hadoop source code, it looks like there is a
> > > > > > CorruptReplicasMap internally that tracks which nodes have
> > "corrupt"
> > > > > copies
> > > > > > of a block. In HDFS-6663 <
> > > > > https://issues.apache.org/jira/browse/HDFS-6663
> > > > > > >,
> > > > > > a "-blockId" parameter was added to "hadoop fsck" to allow
> dumping
> > > the
> > > > > > reason that a block id is considered corrupt, but that wasn't
> > added
> > > > > until
> > > > > > Hadoop 2.7.0 and our client is running 2.0.0-cdh4.6.0.
> > > > > >
> > > > > >
> > > > > Good digging.
> > > > >
> > > > >
> > > > >
> > > > > > I also had a look at running the "HFile" tool on the affected
> file
> > > (cf.
> > > > > > section 9.7.5.2.2 at
> > > > http://hbase.apache.org/0.94/book/regions.arch.html
> > > > > ).
> > > > > > When I did that, I was able to see the data up to the corrupted
> > block
> > > > as
> > > > > > far as I could tell, but then it started repeatedly looping back
> to
> > > the
> > > > > > first row and starting over. I believe this is related to the
> > > behavior
> > > > > > described in https://issues.apache.org/jira/browse/HBASE-12949
> > > > >
> > > > >
> > > > >
> > > > > So, your file is 3G and your blocks are 64M?
> > > > >
> > > > > The dfsclient should just pass over the bad replica and move on to
> > the
> > > > good
> > > > > one so it would seem to indicate all replicas are bad for you.
> > > > >
> > > > > If you enable DFSClient DEBUG level logging it should report which
> > > blocks
> > > > > it is reading from. For example, here I am reading the start of the
> > > index
> > > > > blocks with DFSClient DEBUG enabled but I grep out the DFSClient
> > > > emissions
> > > > > only:
> > > > >
> > > > > [stack@c2020 ~]$ ./hbase/bin/hbase --config ~/conf_hbase
> > > > > org.apache.hadoop.hbase.io.hfile.HFile -h -f
> > > > >
> > > > >
> > > >
> > >
> >
> /hbase/data/default/tsdb/3f4ea5ea14653cee6006f13c7d06d10b/t/68b00cb158aa4d839f1744639880f362|grep
> > > > > DFSClient
> > > > > 2015-03-17 21:42:56,950 DEBUG [main] util.ChecksumType:
> > > > > org.apache.hadoop.util.PureJavaCrc32 available
> > > > > 2015-03-17 21:42:56,952 DEBUG [main] util.ChecksumType:
> > > > > org.apache.hadoop.util.PureJavaCrc32C available
> > > > > SLF4J: Class path contains multiple SLF4J bindings.
> > > > > SLF4J: Found binding in
> > > > >
> > > > >
> > > >
> > >
> >
> [jar:file:/home/stack/hbase-1.0.1-SNAPSHOT/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > > SLF4J: Found binding in
> > > > >
> > > > >
> > > >
> > >
> >
> [jar:file:/home/stack/hadoop-2.7.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for
> an
> > > > > explanation.
> > > > > SLF4J: Actual binding is of type
> [org.slf4j.impl.Log4jLoggerFactory]
> > > > > 2015-03-17 21:42:58,082 INFO  [main] hfile.CacheConfig:
> > > > > CacheConfig:disabled
> > > > > 2015-03-17 21:42:58,126 DEBUG [main] hdfs.DFSClient: newInfo =
> > > > > LocatedBlocks{
> > > > >   fileLength=108633903
> > > > >   underConstruction=false
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > > > getBlockSize()=108633903; corrupt=false; offset=0;
> > > > > locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> > > > > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > > > DatanodeInfoWithStorage[10.20.84.31:50011
> > > > > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > > > DatanodeInfoWithStorage[10.20.84.30:50011
> > > > > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > > > getBlockSize()=108633903; corrupt=false; offset=0;
> > > > > locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> > > > > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > > > DatanodeInfoWithStorage[10.20.84.31:50011
> > > > > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > > > DatanodeInfoWithStorage[10.20.84.27:50011
> > > > > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> > > > >   isLastBlockComplete=true}
> > > > > 2015-03-17 21:42:58,132 DEBUG [main] hdfs.DFSClient: Connecting to
> > > > datanode
> > > > > 10.20.84.27:50011
> > > > > 2015-03-17 21:42:58,281 DEBUG [main] hdfs.DFSClient: Connecting to
> > > > datanode
> > > > > 10.20.84.27:50011
> > > > > 2015-03-17 21:42:58,375 DEBUG [main] hdfs.DFSClient: newInfo =
> > > > > LocatedBlocks{
> > > > >   fileLength=108633903
> > > > >   underConstruction=false
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > > > getBlockSize()=108633903; corrupt=false; offset=0;
> > > > > locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> > > > > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > > > DatanodeInfoWithStorage[10.20.84.31:50011
> > > > > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > > > DatanodeInfoWithStorage[10.20.84.27:50011
> > > > > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > > > getBlockSize()=108633903; corrupt=false; offset=0;
> > > > > locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> > > > > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > > > DatanodeInfoWithStorage[10.20.84.31:50011
> > > > > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > > > DatanodeInfoWithStorage[10.20.84.30:50011
> > > > > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> > > > >   isLastBlockComplete=true}
> > > > > 2015-03-17 21:42:58,376 DEBUG [main] hdfs.DFSClient: Connecting to
> > > > datanode
> > > > > 10.20.84.30:50011
> > > > > 2015-03-17 21:42:58,381 DEBUG [main] hdfs.DFSClient: Connecting to
> > > > datanode
> > > > > 10.20.84.27:50011
> > > > >
> > > > > Do you see it reading from 'good' or 'bad' blocks?
> > > > >
> > > > > I added this line to hbase log4j.properties to enable DFSClient
> > DEBUG:
> > > > >
> > > > > log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
> > > > >
> > > > > On HBASE-12949, what exception is coming up?  Dump it in here.
> > > > >
> > > > >
> > > > >
> > > > > > My goal is to determine whether the block in question is actually
> > > > corrupt
> > > > > > and, if so, in what way.
> > > > >
> > > > >
> > > > > What happens if you just try to copy the file locally or elsewhere in
> > the
> > > > > filesystem using dfs shell. Do you get a pure dfs exception
> > unhampered
> > > by
> > > > > hbaseyness?
> > > > >
> > > > >
> > > > >
> > > > > > If it's possible to recover all of the file except
> > > > > > a portion of the affected block, that would be OK too.
> > > > >
> > > > >
> > > > > I actually do not see a 'fix' or 'recover' on the hfile tool. We
> need
> > > to
> > > > > add it so you can recover all but the bad block (we should figure out
> how
> > > to
> > > > > skip the bad section also).
> > > > >
> > > > >
> > > > >
> > > > > > I just don't want to
> > > > > > be in the position of having to lose all 3 gigs of data in this
> > > > > particular
> > > > > > region, given that most of it appears to be intact. I just can't
> > find
> > > > the
> > > > > > right low-level tools to let me diagnose the exact
> > > state
> > > > > and
> > > > > > structure of the block data I have for this file.
> > > > > >
> > > > > >
> > > > > Nod.
> > > > >
> > > > >
> > > > >
> > > > > > Any help or direction that someone could provide would be much
> > > > > appreciated.
> > > > > > For reference, I'll repeat that our client is running Hadoop
> > > > > 2.0.0-cdh4.6.0
> > > > > > and add that the HBase version is 0.94.15-cdh4.6.0.
> > > > > >
> > > > > >
> > > > > See if any of the above helps. I'll try and dig up some more tools
> in
> > > > > the meantime.
> > > > > St.Ack
> > > > >
> > > > >
> > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > -md
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > >
> > > >    - Andy
> > > >
> > > > Problems worthy of attack prove their worth by hitting back. - Piet
> > Hein
> > > > (via Tom White)
> > > >
> > >
> >
>

Re: Recovering from corrupt blocks in HFile

Posted by Mike Dillon <mi...@synctree.com>.
I've had a chance to try out the suggestion Stack passed along of running
HADOOP_ROOT_LOGGER="TRACE,console" hdfs dfs -cat and managed to get this:
https://gist.github.com/md5/d42e97ab7a0bd656f09a
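
For reference, the invocation was along these lines (the HFile path here is
a placeholder rather than the real one):

HADOOP_ROOT_LOGGER="TRACE,console" hdfs dfs -cat \
    /hbase/mytable/<region>/<cf>/<hfile> > /dev/null 2> trace.log
grep -in checksum trace.log

The stderr redirect captures the client-side TRACE logging, and grepping it
for "checksum" is what turned up the ChecksumException shown in the gist.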

Once I knew what to look for, I was able to find the same checksum
failures in the logs from the failed major compactions.

I'm willing to accept that all the data after that point in the corrupt
block is lost, so any specific advice for how to replace that block with a
partial one containing only the good data would be appreciated. I'm aware
that there may be other checksum failures in the subsequent blocks as well,
since nothing is currently able to read past the first corruption point,
but I'll just have to wash, rinse, and repeat to see how much good data is
left in the file as a whole.
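
For concreteness, what I'm imagining is something along these lines, run
directly on a datanode with the DN stopped (the block/genstamp names and the
offset are made up, and I haven't actually tried this yet, so treat it as a
sketch):

LAST_GOOD=66584576   # last clean byte, rounded down to a 512-byte chunk
cp blk_1234567890 blk_1234567890.bak
truncate -s $LAST_GOOD blk_1234567890
truncate -s $((7 + (LAST_GOOD / 512) * 4)) blk_1234567890_99999.meta

The second truncate is based on my reading of BlockMetadataHeader: the .meta
file appears to be a 7-byte header plus one 4-byte CRC per 512-byte chunk
(io.bytes.per.checksum), so it has to be cut to match. I assume the length
mismatch with what the namenode has recorded would still need to be handled
after that; that's the part I'm least sure about.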

-md

On Wed, Mar 18, 2015 at 2:41 PM, Jerry He <je...@gmail.com> wrote:

> For a 'fix' and 'recover' hfile tool at HBase level,  the relatively easy
> thing we can recover is probably the data (KVs) up to the point when we hit
> the first corruption caused exception.
> After that, it will not be as easy.  For example, if the current key length
> or value length is bad, there is no way to skip to the next KV.  We will
> probably need to skip the whole current hblock, and go to the next block
> for KVs assuming the hblock index is still good.
>
> HBASE-12949 <https://issues.apache.org/jira/browse/HBASE-12949> does an
> incremental improvement to make sure we do get a corruption caused
> exception so that the scan/read will not go into an infinite loop.
>
> Jerry
>
> On Wed, Mar 18, 2015 at 12:03 PM, Mike Dillon <mi...@synctree.com>
> wrote:
>
> > I haven't filed one myself, but I can do so if my investigation ends up
> > finding something bug-worthy as opposed to just random failures due to
> > out-of-disk scenarios.
> >
> > Unfortunately, I had to prioritize some other work this morning, so I
> > haven't made it back to the bad node yet.
> >
> > I did attempt restarting the datanode to see if I could make hadoop fsck
> > happy, but that didn't have any noticeable effect. I'm hoping to have
> more
> > time this afternoon to investigate the other suggestions from this
> thread.
> >
> > -md
> >
> > On Wed, Mar 18, 2015 at 11:41 AM, Andrew Purtell <ap...@apache.org>
> > wrote:
> >
> > >
> > > On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net> wrote:
> > > >
> > > > > If it's possible to recover all of the file except
> > > > > a portion of the affected block, that would be OK too.
> > > >
> > > > I actually do not see a 'fix' or 'recover' on the hfile tool. We need
> > to
> > > > add it so you can recover all but the bad block (we should figure out how
> > to
> > > > skip the bad section also).
> > >
> > >
> > > I was just getting caught up on this thread and had the same thought.
> Is
> > > there an issue filed for this?
> > >
> > >
> > > On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net> wrote:
> > >
> > > > On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon <
> mike.dillon@synctree.com
> > >
> > > > wrote:
> > > >
> > > > > Hi all-
> > > > >
> > > > > I've got an HFile that's reporting a corrupt block in "hadoop fsck"
> > and
> > > > was
> > > > > hoping to get some advice on recovering as much data as possible.
> > > > >
> > > > > When I examined the blk-* file on the three data nodes that have a
> > > > replica
> > > > > of the affected block, I saw that the replicas on two of the
> > datanodes
> > > > had
> > > > > the same SHA-1 checksum and that the replica on the other datanode
> > was
> > > a
> > > > > truncated version of the replica found on the other nodes (as
> > reported
> > > > by a
> > > > > difference at EOF by "cmp"). The size of the two identical blocks
> is
> > > > > 67108864, the same as most of the other blocks in the file.
> > > > >
> > > > > Given that there were two datanodes with the same data and another
> > with
> > > > > truncated data, I made a backup of the truncated file and dropped
> the
> > > > > full-length copy of the block in its place directly on the data
> > mount,
> > > > > hoping that this would cause HDFS to no longer report the file as
> > > > corrupt.
> > > > > Unfortunately, this didn't seem to have any effect.
> > > > >
> > > > >
> > > > That seems like a reasonable thing to do.
> > > >
> > > > Did you restart the DN that was serving this block before you ran
> fsck?
> > > > (Fsck asks namenode what blocks are bad; it likely is still reporting
> > off
> > > > old info).
> > > >
> > > >
> > > >
> > > > > Looking through the Hadoop source code, it looks like there is a
> > > > > CorruptReplicasMap internally that tracks which nodes have
> "corrupt"
> > > > copies
> > > > > of a block. In HDFS-6663 <
> > > > https://issues.apache.org/jira/browse/HDFS-6663
> > > > > >,
> > > > > a "-blockId" parameter was added to "hadoop fsck" to allow dumping
> > the
> > > > > reason that a block id is considered corrupt, but that wasn't
> added
> > > > until
> > > > > Hadoop 2.7.0 and our client is running 2.0.0-cdh4.6.0.
> > > > >
> > > > >
> > > > Good digging.
> > > >
> > > >
> > > >
> > > > > I also had a look at running the "HFile" tool on the affected file
> > (cf.
> > > > > section 9.7.5.2.2 at
> > > http://hbase.apache.org/0.94/book/regions.arch.html
> > > > ).
> > > > > When I did that, I was able to see the data up to the corrupted
> block
> > > as
> > > > > far as I could tell, but then it started repeatedly looping back to
> > the
> > > > > first row and starting over. I believe this is related to the
> > behavior
> > > > > described in https://issues.apache.org/jira/browse/HBASE-12949
> > > >
> > > >
> > > >
> > > > So, your file is 3G and your blocks are 64M?
> > > >
> > > > The dfsclient should just pass over the bad replica and move on to
> the
> > > good
> > > > one so it would seem to indicate all replicas are bad for you.
> > > >
> > > > If you enable DFSClient DEBUG level logging it should report which
> > blocks
> > > > it is reading from. For example, here I am reading the start of the
> > index
> > > > blocks with DFSClient DEBUG enabled but I grep out the DFSClient
> > > emissions
> > > > only:
> > > >
> > > > [stack@c2020 ~]$ ./hbase/bin/hbase --config ~/conf_hbase
> > > > org.apache.hadoop.hbase.io.hfile.HFile -h -f
> > > >
> > > >
> > >
> >
> /hbase/data/default/tsdb/3f4ea5ea14653cee6006f13c7d06d10b/t/68b00cb158aa4d839f1744639880f362|grep
> > > > DFSClient
> > > > 2015-03-17 21:42:56,950 DEBUG [main] util.ChecksumType:
> > > > org.apache.hadoop.util.PureJavaCrc32 available
> > > > 2015-03-17 21:42:56,952 DEBUG [main] util.ChecksumType:
> > > > org.apache.hadoop.util.PureJavaCrc32C available
> > > > SLF4J: Class path contains multiple SLF4J bindings.
> > > > SLF4J: Found binding in
> > > >
> > > >
> > >
> >
> [jar:file:/home/stack/hbase-1.0.1-SNAPSHOT/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > SLF4J: Found binding in
> > > >
> > > >
> > >
> >
> [jar:file:/home/stack/hadoop-2.7.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > > > explanation.
> > > > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> > > > 2015-03-17 21:42:58,082 INFO  [main] hfile.CacheConfig:
> > > > CacheConfig:disabled
> > > > 2015-03-17 21:42:58,126 DEBUG [main] hdfs.DFSClient: newInfo =
> > > > LocatedBlocks{
> > > >   fileLength=108633903
> > > >   underConstruction=false
> > > >
> > > >
> > > >
> > >
> >
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > > getBlockSize()=108633903; corrupt=false; offset=0;
> > > > locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> > > > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > > DatanodeInfoWithStorage[10.20.84.31:50011
> > > > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > > DatanodeInfoWithStorage[10.20.84.30:50011
> > > > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> > > >
> > > >
> > > >
> > >
> >
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > > getBlockSize()=108633903; corrupt=false; offset=0;
> > > > locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> > > > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > > DatanodeInfoWithStorage[10.20.84.31:50011
> > > > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > > DatanodeInfoWithStorage[10.20.84.27:50011
> > > > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> > > >   isLastBlockComplete=true}
> > > > 2015-03-17 21:42:58,132 DEBUG [main] hdfs.DFSClient: Connecting to
> > > datanode
> > > > 10.20.84.27:50011
> > > > 2015-03-17 21:42:58,281 DEBUG [main] hdfs.DFSClient: Connecting to
> > > datanode
> > > > 10.20.84.27:50011
> > > > 2015-03-17 21:42:58,375 DEBUG [main] hdfs.DFSClient: newInfo =
> > > > LocatedBlocks{
> > > >   fileLength=108633903
> > > >   underConstruction=false
> > > >
> > > >
> > > >
> > >
> >
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > > getBlockSize()=108633903; corrupt=false; offset=0;
> > > > locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> > > > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > > DatanodeInfoWithStorage[10.20.84.31:50011
> > > > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > > DatanodeInfoWithStorage[10.20.84.27:50011
> > > > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> > > >
> > > >
> > > >
> > >
> >
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > > getBlockSize()=108633903; corrupt=false; offset=0;
> > > > locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> > > > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > > DatanodeInfoWithStorage[10.20.84.31:50011
> > > > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > > DatanodeInfoWithStorage[10.20.84.30:50011
> > > > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> > > >   isLastBlockComplete=true}
> > > > 2015-03-17 21:42:58,376 DEBUG [main] hdfs.DFSClient: Connecting to
> > > datanode
> > > > 10.20.84.30:50011
> > > > 2015-03-17 21:42:58,381 DEBUG [main] hdfs.DFSClient: Connecting to
> > > datanode
> > > > 10.20.84.27:50011
> > > >
> > > > Do you see it reading from 'good' or 'bad' blocks?
> > > >
> > > > I added this line to hbase log4j.properties to enable DFSClient
> DEBUG:
> > > >
> > > > log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
> > > >
> > > > On HBASE-12949, what exception is coming up?  Dump it in here.
> > > >
> > > >
> > > >
> > > > > My goal is to determine whether the block in question is actually
> > > corrupt
> > > > > and, if so, in what way.
> > > >
> > > >
> > > > What happens if you just try to copy the file locally or elsewhere in
> the
> > > > filesystem using dfs shell. Do you get a pure dfs exception
> unhampered
> > by
> > > > hbaseyness?
> > > >
> > > >
> > > >
> > > > > If it's possible to recover all of the file except
> > > > > a portion of the affected block, that would be OK too.
> > > >
> > > >
> > > > I actually do not see a 'fix' or 'recover' on the hfile tool. We need
> > to
> > > > add it so you can recover all but the bad block (we should figure out how
> > to
> > > > skip the bad section also).
> > > >
> > > >
> > > >
> > > > > I just don't want to
> > > > > be in the position of having to lose all 3 gigs of data in this
> > > > particular
> > > > > region, given that most of it appears to be intact. I just can't
> find
> > > the
> > > > > right low-level tools to let me diagnose the exact
> > state
> > > > and
> > > > > structure of the block data I have for this file.
> > > > >
> > > > >
> > > > Nod.
> > > >
> > > >
> > > >
> > > > > Any help or direction that someone could provide would be much
> > > > appreciated.
> > > > > For reference, I'll repeat that our client is running Hadoop
> > > > 2.0.0-cdh4.6.0
> > > > > and add that the HBase version is 0.94.15-cdh4.6.0.
> > > > >
> > > > >
> > > > See if any of the above helps. I'll try and dig up some more tools in
> > > > the meantime.
> > > > St.Ack
> > > >
> > > >
> > > >
> > > > > Thanks!
> > > > >
> > > > > -md
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >
> > >    - Andy
> > >
> > > Problems worthy of attack prove their worth by hitting back. - Piet
> Hein
> > > (via Tom White)
> > >
> >
>

Re: Recovering from corrupt blocks in HFile

Posted by Jerry He <je...@gmail.com>.
For a 'fix' and 'recover' hfile tool at the HBase level, the relatively easy
thing we can recover is probably the data (KVs) up to the point when we hit
the first corruption-caused exception.
After that, it will not be as easy. For example, if the current key length
or value length is bad, there is no way to skip to the next KV. We will
probably need to skip the whole current hblock and go to the next block
for KVs, assuming the hblock index is still good.

HBASE-12949 <https://issues.apache.org/jira/browse/HBASE-12949> makes an
incremental improvement to ensure we do get a corruption-caused
exception so that the scan/read will not go into an infinite loop.
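
As a stopgap until such a tool exists, the dump mode of the existing HFile
tool can approximate the 'KVs up to the first exception' recovery: dump the
file and keep whatever was printed before the reader dies. Roughly, with a
made-up path:

hbase org.apache.hadoop.hbase.io.hfile.HFile -v -p -f /hbase/mytable/<region>/<cf>/<hfile> > salvaged-kvs.txt

Without the HBASE-12949 fix the scanner may loop instead of throwing, though,
so the dump has to be watched and killed by hand once it starts repeating.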

Jerry

On Wed, Mar 18, 2015 at 12:03 PM, Mike Dillon <mi...@synctree.com>
wrote:

> I haven't filed one myself, but I can do so if my investigation ends up
> finding something bug-worthy as opposed to just random failures due to
> out-of-disk scenarios.
>
> Unfortunately, I had to prioritize some other work this morning, so I
> haven't made it back to the bad node yet.
>
> I did attempt restarting the datanode to see if I could make hadoop fsck
> happy, but that didn't have any noticeable effect. I'm hoping to have more
> time this afternoon to investigate the other suggestions from this thread.
>
> -md
>
> On Wed, Mar 18, 2015 at 11:41 AM, Andrew Purtell <ap...@apache.org>
> wrote:
>
> >
> > On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net> wrote:
> > >
> > > > If it's possible to recover all of the file except
> > > > a portion of the affected block, that would be OK too.
> > >
> > > I actually do not see a 'fix' or 'recover' on the hfile tool. We need
> to
> > > add it so you can recover all but the bad block (we should figure out how
> to
> > > skip the bad section also).
> >
> >
> > I was just getting caught up on this thread and had the same thought. Is
> > there an issue filed for this?
> >
> >
> > On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net> wrote:
> >
> > > On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon <mike.dillon@synctree.com
> >
> > > wrote:
> > >
> > > > Hi all-
> > > >
> > > > I've got an HFile that's reporting a corrupt block in "hadoop fsck"
> and
> > > was
> > > > hoping to get some advice on recovering as much data as possible.
> > > >
> > > > When I examined the blk-* file on the three data nodes that have a
> > > replica
> > > > of the affected block, I saw that the replicas on two of the
> datanodes
> > > had
> > > > the same SHA-1 checksum and that the replica on the other datanode
> was
> > a
> > > > truncated version of the replica found on the other nodes (as
> reported
> > > by a
> > > > difference at EOF by "cmp"). The size of the two identical blocks is
> > > > 67108864, the same as most of the other blocks in the file.
> > > >
> > > > Given that there were two datanodes with the same data and another
> with
> > > > truncated data, I made a backup of the truncated file and dropped the
> > > > full-length copy of the block in its place directly on the data
> mount,
> > > > hoping that this would cause HDFS to no longer report the file as
> > > corrupt.
> > > > Unfortunately, this didn't seem to have any effect.
> > > >
> > > >
> > > That seems like a reasonable thing to do.
> > >
> > > Did you restart the DN that was serving this block before you ran fsck?
> > > (Fsck asks namenode what blocks are bad; it likely is still reporting
> off
> > > old info).
> > >
> > >
> > >
> > > > Looking through the Hadoop source code, it looks like there is a
> > > > CorruptReplicasMap internally that tracks which nodes have "corrupt"
> > > copies
> > > > of a block. In HDFS-6663 <
> > > https://issues.apache.org/jira/browse/HDFS-6663
> > > > >,
> > > > a "-blockId" parameter was added to "hadoop fsck" to allow dumping
> the
> > > > reason that a block id is considered corrupt, but that wasn't added
> > > until
> > > > Hadoop 2.7.0 and our client is running 2.0.0-cdh4.6.0.
> > > >
> > > >
> > > Good digging.
> > >
> > >
> > >
> > > > I also had a look at running the "HFile" tool on the affected file
> (cf.
> > > > section 9.7.5.2.2 at
> > http://hbase.apache.org/0.94/book/regions.arch.html
> > > ).
> > > > When I did that, I was able to see the data up to the corrupted block
> > as
> > > > far as I could tell, but then it started repeatedly looping back to
> the
> > > > first row and starting over. I believe this is related to the
> behavior
> > > > described in https://issues.apache.org/jira/browse/HBASE-12949
> > >
> > >
> > >
> > > So, your file is 3G and your blocks are 64M?
> > >
> > > The dfsclient should just pass over the bad replica and move on to the
> > good
> > > one so it would seem to indicate all replicas are bad for you.
> > >
> > > If you enable DFSClient DEBUG level logging it should report which
> blocks
> > > it is reading from. For example, here I am reading the start of the
> index
> > > blocks with DFSClient DEBUG enabled but I grep out the DFSClient
> > emissions
> > > only:
> > >
> > > [stack@c2020 ~]$ ./hbase/bin/hbase --config ~/conf_hbase
> > > org.apache.hadoop.hbase.io.hfile.HFile -h -f
> > >
> > >
> >
> /hbase/data/default/tsdb/3f4ea5ea14653cee6006f13c7d06d10b/t/68b00cb158aa4d839f1744639880f362|grep
> > > DFSClient
> > > 2015-03-17 21:42:56,950 DEBUG [main] util.ChecksumType:
> > > org.apache.hadoop.util.PureJavaCrc32 available
> > > 2015-03-17 21:42:56,952 DEBUG [main] util.ChecksumType:
> > > org.apache.hadoop.util.PureJavaCrc32C available
> > > SLF4J: Class path contains multiple SLF4J bindings.
> > > SLF4J: Found binding in
> > >
> > >
> >
> [jar:file:/home/stack/hbase-1.0.1-SNAPSHOT/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > SLF4J: Found binding in
> > >
> > >
> >
> [jar:file:/home/stack/hadoop-2.7.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > > explanation.
> > > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> > > 2015-03-17 21:42:58,082 INFO  [main] hfile.CacheConfig:
> > > CacheConfig:disabled
> > > 2015-03-17 21:42:58,126 DEBUG [main] hdfs.DFSClient: newInfo =
> > > LocatedBlocks{
> > >   fileLength=108633903
> > >   underConstruction=false
> > >
> > >
> > >
> >
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > getBlockSize()=108633903; corrupt=false; offset=0;
> > > locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> > > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > DatanodeInfoWithStorage[10.20.84.31:50011
> > > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > DatanodeInfoWithStorage[10.20.84.30:50011
> > > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> > >
> > >
> > >
> >
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > getBlockSize()=108633903; corrupt=false; offset=0;
> > > locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> > > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > DatanodeInfoWithStorage[10.20.84.31:50011
> > > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > DatanodeInfoWithStorage[10.20.84.27:50011
> > > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> > >   isLastBlockComplete=true}
> > > 2015-03-17 21:42:58,132 DEBUG [main] hdfs.DFSClient: Connecting to
> > datanode
> > > 10.20.84.27:50011
> > > 2015-03-17 21:42:58,281 DEBUG [main] hdfs.DFSClient: Connecting to
> > datanode
> > > 10.20.84.27:50011
> > > 2015-03-17 21:42:58,375 DEBUG [main] hdfs.DFSClient: newInfo =
> > > LocatedBlocks{
> > >   fileLength=108633903
> > >   underConstruction=false
> > >
> > >
> > >
> >
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > getBlockSize()=108633903; corrupt=false; offset=0;
> > > locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> > > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > DatanodeInfoWithStorage[10.20.84.31:50011
> > > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > DatanodeInfoWithStorage[10.20.84.27:50011
> > > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> > >
> > >
> > >
> >
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > > getBlockSize()=108633903; corrupt=false; offset=0;
> > > locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> > > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > > DatanodeInfoWithStorage[10.20.84.31:50011
> > > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > > DatanodeInfoWithStorage[10.20.84.30:50011
> > > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> > >   isLastBlockComplete=true}
> > > 2015-03-17 21:42:58,376 DEBUG [main] hdfs.DFSClient: Connecting to
> > datanode
> > > 10.20.84.30:50011
> > > 2015-03-17 21:42:58,381 DEBUG [main] hdfs.DFSClient: Connecting to
> > datanode
> > > 10.20.84.27:50011
> > >
> > > Do you see it reading from 'good' or 'bad' blocks?
> > >
> > > I added this line to hbase log4j.properties to enable DFSClient DEBUG:
> > >
> > > log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
> > >
> > > On HBASE-12949, what exception is coming up?  Dump it in here.
> > >
> > >
> > >
> > > > My goal is to determine whether the block in question is actually
> > corrupt
> > > > and, if so, in what way.
> > >
> > >
> > > What happens if you just try to copy the file locally or elsewhere in the
> > > filesystem using dfs shell. Do you get a pure dfs exception unhampered
> by
> > > hbaseyness?
> > >
> > >
> > >
> > > > If it's possible to recover all of the file except
> > > > a portion of the affected block, that would be OK too.
> > >
> > >
> > > I actually do not see a 'fix' or 'recover' on the hfile tool. We need
> to
> > > add it so you can recover all but the bad block (we should figure out how
> to
> > > skip the bad section also).
> > >
> > >
> > >
> > > > I just don't want to
> > > > be in the position of having to lose all 3 gigs of data in this
> > > particular
> > > > region, given that most of it appears to be intact. I just can't find
> > the
> > > > right low-level tools to let me diagnose the exact
> state
> > > and
> > > > structure of the block data I have for this file.
> > > >
> > > >
> > > Nod.
> > >
> > >
> > >
> > > > Any help or direction that someone could provide would be much
> > > appreciated.
> > > > For reference, I'll repeat that our client is running Hadoop
> > > 2.0.0-cdh4.6.0
> > > > and add that the HBase version is 0.94.15-cdh4.6.0.
> > > >
> > > >
> > > See if any of the above helps. I'll try and dig up some more tools in
> > > the meantime.
> > > St.Ack
> > >
> > >
> > >
> > > > Thanks!
> > > >
> > > > -md
> > > >
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> >    - Andy
> >
> > Problems worthy of attack prove their worth by hitting back. - Piet Hein
> > (via Tom White)
> >
>

Re: Recovering from corrupt blocks in HFile

Posted by Mike Dillon <mi...@synctree.com>.
I haven't filed one myself, but I can do so if my investigation ends up
finding something bug-worthy as opposed to just random failures due to
out-of-disk scenarios.

Unfortunately, I had to prioritize some other work this morning, so I
haven't made it back to the bad node yet.

I did attempt restarting the datanode to see if I could make hadoop fsck
happy, but that didn't have any noticeable effect. I'm hoping to have more
time this afternoon to investigate the other suggestions from this thread.

-md

On Wed, Mar 18, 2015 at 11:41 AM, Andrew Purtell <ap...@apache.org>
wrote:

>
> On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net> wrote:
> >
> > > If it's possible to recover all of the file except
> > > a portion of the affected block, that would be OK too.
> >
> > I actually do not see a 'fix' or 'recover' on the hfile tool. We need to
> > add it so you can recover all but the bad block (we should figure out how to
> > skip the bad section also).
>
>
> I was just getting caught up on this thread and had the same thought. Is
> there an issue filed for this?
>
>
> On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net> wrote:
>
> > On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon <mi...@synctree.com>
> > wrote:
> >
> > > Hi all-
> > >
> > > I've got an HFile that's reporting a corrupt block in "hadoop fsck" and
> > was
> > > hoping to get some advice on recovering as much data as possible.
> > >
> > > When I examined the blk-* file on the three data nodes that have a
> > replica
> > > of the affected block, I saw that the replicas on two of the datanodes
> > had
> > > the same SHA-1 checksum and that the replica on the other datanode was
> a
> > > truncated version of the replica found on the other nodes (as reported
> > by a
> > > difference at EOF by "cmp"). The size of the two identical blocks is
> > > 67108864, the same as most of the other blocks in the file.
> > >
> > > Given that there were two datanodes with the same data and another with
> > > truncated data, I made a backup of the truncated file and dropped the
> > > full-length copy of the block in its place directly on the data mount,
> > > hoping that this would cause HDFS to no longer report the file as
> > corrupt.
> > > Unfortunately, this didn't seem to have any effect.
> > >
> > >
> > That seems like a reasonable thing to do.
> >
> > Did you restart the DN that was serving this block before you ran fsck?
> > (Fsck asks namenode what blocks are bad; it likely is still reporting off
> > old info).
> >
> >
> >
> > > Looking through the Hadoop source code, it looks like there is a
> > > CorruptReplicasMap internally that tracks which nodes have "corrupt"
> > copies
> > > of a block. In HDFS-6663 <
> > https://issues.apache.org/jira/browse/HDFS-6663
> > > >,
> > > a "-blockId" parameter was added to "hadoop fsck" to allow dumping the
> > > reason that a block id is considered corrupt, but that wasn't added
> > until
> > > Hadoop 2.7.0 and our client is running 2.0.0-cdh4.6.0.
> > >
> > >
> > Good digging.
> >
> >
> >
> > > I also had a look at running the "HFile" tool on the affected file (cf.
> > > section 9.7.5.2.2 at
> http://hbase.apache.org/0.94/book/regions.arch.html
> > ).
> > > When I did that, I was able to see the data up to the corrupted block
> as
> > > far as I could tell, but then it started repeatedly looping back to the
> > > first row and starting over. I believe this is related to the behavior
> > > described in https://issues.apache.org/jira/browse/HBASE-12949
> >
> >
> >
> > So, your file is 3G and your blocks are 64M?
> >
> > The dfsclient should just pass over the bad replica and move on to the
> good
> > one so it would seem to indicate all replicas are bad for you.
> >
> > If you enable DFSClient DEBUG level logging it should report which blocks
> > it is reading from. For example, here I am reading the start of the index
> > blocks with DFSClient DEBUG enabled but I grep out the DFSClient
> emissions
> > only:
> >
> > [stack@c2020 ~]$ ./hbase/bin/hbase --config ~/conf_hbase
> > org.apache.hadoop.hbase.io.hfile.HFile -h -f
> >
> >
> /hbase/data/default/tsdb/3f4ea5ea14653cee6006f13c7d06d10b/t/68b00cb158aa4d839f1744639880f362|grep
> > DFSClient
> > 2015-03-17 21:42:56,950 DEBUG [main] util.ChecksumType:
> > org.apache.hadoop.util.PureJavaCrc32 available
> > 2015-03-17 21:42:56,952 DEBUG [main] util.ChecksumType:
> > org.apache.hadoop.util.PureJavaCrc32C available
> > SLF4J: Class path contains multiple SLF4J bindings.
> > SLF4J: Found binding in
> >
> >
> [jar:file:/home/stack/hbase-1.0.1-SNAPSHOT/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: Found binding in
> >
> >
> [jar:file:/home/stack/hadoop-2.7.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> > SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> > explanation.
> > SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> > 2015-03-17 21:42:58,082 INFO  [main] hfile.CacheConfig:
> > CacheConfig:disabled
> > 2015-03-17 21:42:58,126 DEBUG [main] hdfs.DFSClient: newInfo =
> > LocatedBlocks{
> >   fileLength=108633903
> >   underConstruction=false
> >
> >
> >
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > getBlockSize()=108633903; corrupt=false; offset=0;
> > locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > DatanodeInfoWithStorage[10.20.84.31:50011
> > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > DatanodeInfoWithStorage[10.20.84.30:50011
> > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> >
> >
> >
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > getBlockSize()=108633903; corrupt=false; offset=0;
> > locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > DatanodeInfoWithStorage[10.20.84.31:50011
> > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > DatanodeInfoWithStorage[10.20.84.27:50011
> > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> >   isLastBlockComplete=true}
> > 2015-03-17 21:42:58,132 DEBUG [main] hdfs.DFSClient: Connecting to
> datanode
> > 10.20.84.27:50011
> > 2015-03-17 21:42:58,281 DEBUG [main] hdfs.DFSClient: Connecting to
> datanode
> > 10.20.84.27:50011
> > 2015-03-17 21:42:58,375 DEBUG [main] hdfs.DFSClient: newInfo =
> > LocatedBlocks{
> >   fileLength=108633903
> >   underConstruction=false
> >
> >
> >
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > getBlockSize()=108633903; corrupt=false; offset=0;
> > locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > DatanodeInfoWithStorage[10.20.84.31:50011
> > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > DatanodeInfoWithStorage[10.20.84.27:50011
> > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
> >
> >
> >
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> > getBlockSize()=108633903; corrupt=false; offset=0;
> > locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> > ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> > DatanodeInfoWithStorage[10.20.84.31:50011
> > ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> > DatanodeInfoWithStorage[10.20.84.30:50011
> > ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
> >   isLastBlockComplete=true}
> > 2015-03-17 21:42:58,376 DEBUG [main] hdfs.DFSClient: Connecting to
> datanode
> > 10.20.84.30:50011
> > 2015-03-17 21:42:58,381 DEBUG [main] hdfs.DFSClient: Connecting to
> datanode
> > 10.20.84.27:50011
> >
> > Do you see it reading from 'good' or 'bad' blocks?
> >
> > I added this line to hbase log4j.properties to enable DFSClient DEBUG:
> >
> > log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
> >
> > On HBASE-12949, what exception is coming up?  Dump it in here.
> >
> >
> >
> > > My goal is to determine whether the block in question is actually
> corrupt
> > > and, if so, in what way.
> >
> >
> > What happens if you just try to copy the file locally or elsewhere in the
> > filesystem using dfs shell. Do you get a pure dfs exception unhampered by
> > hbaseyness?
> >
> >
> >
> > > If it's possible to recover all of the file except
> > > a portion of the affected block, that would be OK too.
> >
> >
> > I actually do not see a 'fix' or 'recover' on the hfile tool. We need to
> > add it so you can recover all but the bad block (we should figure out how to
> > skip the bad section also).
> >
> >
> >
> > > I just don't want to
> > > be in the position of having to lose all 3 gigs of data in this
> > particular
> > > region, given that most of it appears to be intact. I just can't find
> the
> > > right low-level tools to let me diagnose the exact state
> > and
> > > structure of the block data I have for this file.
> > >
> > >
> > Nod.
> >
> >
> >
> > > Any help or direction that someone could provide would be much
> > appreciated.
> > > For reference, I'll repeat that our client is running Hadoop
> > 2.0.0-cdh4.6.0
> > > and add that the HBase version is 0.94.15-cdh4.6.0.
> > >
> > >
> > See if any of the above helps. I'll try and dig up some more tools in
> > the meantime.
> > St.Ack
> >
> >
> >
> > > Thanks!
> > >
> > > -md
> > >
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>

Re: Recovering from corrupt blocks in HFile

Posted by Andrew Purtell <ap...@apache.org>.

On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net> wrote:
>
> > If it's possible to recover all of the file except
> > a portion of the affected block, that would be OK too.
>
> I actually do not see a 'fix' or 'recover' on the hfile tool. We need to
> add it so you can recover all but the bad block (we should figure out how to
> skip the bad section also).


I was just getting caught up on this thread and had the same thought. Is
there an issue filed for this?


On Tue, Mar 17, 2015 at 9:47 PM, Stack <st...@duboce.net> wrote:

> On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon <mi...@synctree.com>
> wrote:
>
> > Hi all-
> >
> > I've got an HFile that's reporting a corrupt block in "hadoop fsck" and
> was
> > hoping to get some advice on recovering as much data as possible.
> >
> > When I examined the blk-* file on the three data nodes that have a
> replica
> > of the affected block, I saw that the replicas on two of the datanodes
> had
> > the same SHA-1 checksum and that the replica on the other datanode was a
> > truncated version of the replica found on the other nodes (as reported
> by a
> > difference at EOF by "cmp"). The size of the two identical blocks is
> > 67108864, the same as most of the other blocks in the file.
> >
> > Given that there were two datanodes with the same data and another with
> > truncated data, I made a backup of the truncated file and dropped the
> > full-length copy of the block in its place directly on the data mount,
> > hoping that this would cause HDFS to no longer report the file as
> corrupt.
> > Unfortunately, this didn't seem to have any effect.
> >
> >
> That seems like a reasonable thing to do.
>
> Did you restart the DN that was serving this block before you ran fsck?
> (Fsck asks namenode what blocks are bad; it likely is still reporting off
> old info).
>
>
>
> > Looking through the Hadoop source code, it looks like there is a
> > CorruptReplicasMap internally that tracks which nodes have "corrupt"
> copies
> > of a block. In HDFS-6663 <
> https://issues.apache.org/jira/browse/HDFS-6663
> > >,
> > a "-blockId" parameter was added to "hadoop fsck" to allow dumping the
> > reason that a block id is considered corrupt, but that wasn't added
> until
> > Hadoop 2.7.0 and our client is running 2.0.0-cdh4.6.0.
> >
> >
> Good digging.
>
>
>
> > I also had a look at running the "HFile" tool on the affected file (cf.
> > section 9.7.5.2.2 at http://hbase.apache.org/0.94/book/regions.arch.html
> ).
> > When I did that, I was able to see the data up to the corrupted block as
> > far as I could tell, but then it started repeatedly looping back to the
> > first row and starting over. I believe this is related to the behavior
> > described in https://issues.apache.org/jira/browse/HBASE-12949
>
>
>
> So, your file is 3G and your blocks are 64M?
>
> The dfsclient should just pass over the bad replica and move on to the good
> one so it would seem to indicate all replicas are bad for you.
>
> If you enable DFSClient DEBUG level logging it should report which blocks
> it is reading from. For example, here I am reading the start of the index
> blocks with DFSClient DEBUG enabled but I grep out the DFSClient emissions
> only:
>
> [stack@c2020 ~]$ ./hbase/bin/hbase --config ~/conf_hbase
> org.apache.hadoop.hbase.io.hfile.HFile -h -f
>
> /hbase/data/default/tsdb/3f4ea5ea14653cee6006f13c7d06d10b/t/68b00cb158aa4d839f1744639880f362|grep
> DFSClient
> 2015-03-17 21:42:56,950 DEBUG [main] util.ChecksumType:
> org.apache.hadoop.util.PureJavaCrc32 available
> 2015-03-17 21:42:56,952 DEBUG [main] util.ChecksumType:
> org.apache.hadoop.util.PureJavaCrc32C available
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
>
> [jar:file:/home/stack/hbase-1.0.1-SNAPSHOT/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
>
> [jar:file:/home/stack/hadoop-2.7.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 2015-03-17 21:42:58,082 INFO  [main] hfile.CacheConfig:
> CacheConfig:disabled
> 2015-03-17 21:42:58,126 DEBUG [main] hdfs.DFSClient: newInfo =
> LocatedBlocks{
>   fileLength=108633903
>   underConstruction=false
>
>
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> getBlockSize()=108633903; corrupt=false; offset=0;
> locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> DatanodeInfoWithStorage[10.20.84.31:50011
> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> DatanodeInfoWithStorage[10.20.84.30:50011
> ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
>
>
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> getBlockSize()=108633903; corrupt=false; offset=0;
> locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> DatanodeInfoWithStorage[10.20.84.31:50011
> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> DatanodeInfoWithStorage[10.20.84.27:50011
> ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
>   isLastBlockComplete=true}
> 2015-03-17 21:42:58,132 DEBUG [main] hdfs.DFSClient: Connecting to datanode
> 10.20.84.27:50011
> 2015-03-17 21:42:58,281 DEBUG [main] hdfs.DFSClient: Connecting to datanode
> 10.20.84.27:50011
> 2015-03-17 21:42:58,375 DEBUG [main] hdfs.DFSClient: newInfo =
> LocatedBlocks{
>   fileLength=108633903
>   underConstruction=false
>
>
> blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> getBlockSize()=108633903; corrupt=false; offset=0;
> locs=[DatanodeInfoWithStorage[10.20.84.30:50011
> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> DatanodeInfoWithStorage[10.20.84.31:50011
> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> DatanodeInfoWithStorage[10.20.84.27:50011
> ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]
>
>
> lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
> getBlockSize()=108633903; corrupt=false; offset=0;
> locs=[DatanodeInfoWithStorage[10.20.84.27:50011
> ,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
> DatanodeInfoWithStorage[10.20.84.31:50011
> ,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
> DatanodeInfoWithStorage[10.20.84.30:50011
> ,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
>   isLastBlockComplete=true}
> 2015-03-17 21:42:58,376 DEBUG [main] hdfs.DFSClient: Connecting to datanode
> 10.20.84.30:50011
> 2015-03-17 21:42:58,381 DEBUG [main] hdfs.DFSClient: Connecting to datanode
> 10.20.84.27:50011
>
> Do you see it reading from 'good' or 'bad' blocks?
>
> I added this line to hbase log4j.properties to enable DFSClient DEBUG:
>
> log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG
>
> On HBASE-12949, what exception is coming up?  Dump it in here.
>
>
>
> > My goal is to determine whether the block in question is actually corrupt
> > and, if so, in what way.
>
>
> What happens if you just try to copy the file locally or elsewhere in the
> filesystem using dfs shell. Do you get a pure dfs exception unhampered by
> hbaseyness?
>
>
>
> > If it's possible to recover all of the file except
> > a portion of the affected block, that would be OK too.
>
>
> I actually do not see a 'fix' or 'recover' on the hfile tool. We need to
> add it so you can recover all but the bad block (we should figure out how to
> skip the bad section also).
>
>
>
> > I just don't want to
> > be in the position of having to lose all 3 gigs of data in this
> particular
> > region, given that most of it appears to be intact. I just can't find the
> > right low-level tools to let me diagnose the exact state
> and
> > structure of the block data I have for this file.
> >
> >
> Nod.
>
>
>
> > Any help or direction that someone could provide would be much
> appreciated.
> > For reference, I'll repeat that our client is running Hadoop
> 2.0.0-cdh4.6.0
> > and add that the HBase version is 0.94.15-cdh4.6.0.
> >
> >
> See if any of the above helps. I'll try and dig up some more tools in
> the meantime.
> St.Ack
>
>
>
> > Thanks!
> >
> > -md
> >
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)

Re: Recovering from corrupt blocks in HFile

Posted by Stack <st...@duboce.net>.
On Tue, Mar 17, 2015 at 5:04 PM, Mike Dillon <mi...@synctree.com>
wrote:

> Hi all-
>
> I've got an HFile that's reporting a corrupt block in "hadoop fsck" and was
> hoping to get some advice on recovering as much data as possible.
>
> When I examined the blk-* file on the three data nodes that have a replica
> of the affected block, I saw that the replicas on two of the datanodes had
> the same SHA-1 checksum and that the replica on the other datanode was a
> truncated version of the replica found on the other nodes (as reported by a
> difference at EOF by "cmp"). The size of the two identical blocks is
> 67108864, the same as most of the other blocks in the file.
>
> Given that there were two datanodes with the same data and another with
> truncated data, I made a backup of the truncated file and dropped the
> full-length copy of the block in its place directly on the data mount,
> hoping that this would cause HDFS to no longer report the file as corrupt.
> Unfortunately, this didn't seem to have any effect.
>
>
That seems like a reasonable thing to do.

Did you restart the DN that was serving this block before you ran fsck?
(Fsck asks namenode what blocks are bad; it likely is still reporting off
old info).
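
If not, a bounce of that DN followed by a re-check is worth a try; on a cdh
package install that is usually something like this (the service name is a
guess for your setup):

sudo service hadoop-hdfs-datanode restart
hadoop fsck /hbase -files -blocks -locations | grep -i corrupt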



> Looking through the Hadoop source code, it looks like there is a
> CorruptReplicasMap internally that tracks which nodes have "corrupt" copies
> of a block. In HDFS-6663 <https://issues.apache.org/jira/browse/HDFS-6663
> >,
> a "-blockId" parameter was added to "hadoop fsck" to allow dumping the
reason that a block id is considered corrupt, but that wasn't added until
> Hadoop 2.7.0 and our client is running 2.0.0-cdh4.6.0.
>
>
Good digging.



> I also had a look at running the "HFile" tool on the affected file (cf.
> section 9.7.5.2.2 at http://hbase.apache.org/0.94/book/regions.arch.html).
> When I did that, I was able to see the data up to the corrupted block as
> far as I could tell, but then it started repeatedly looping back to the
> first row and starting over. I believe this is related to the behavior
> described in https://issues.apache.org/jira/browse/HBASE-12949



So, your file is 3G and your blocks are 64M?

The dfsclient should just pass over the bad replica and move on to the good
one, so the fact that it is not doing so would seem to indicate all replicas
are bad for you.

If you enable DFSClient DEBUG level logging it should report which blocks
it is reading from. For example, here I am reading the start of the index
blocks with DFSClient DEBUG enabled but I grep out the DFSClient emissions
only:

[stack@c2020 ~]$ ./hbase/bin/hbase --config ~/conf_hbase
org.apache.hadoop.hbase.io.hfile.HFile -h -f
/hbase/data/default/tsdb/3f4ea5ea14653cee6006f13c7d06d10b/t/68b00cb158aa4d839f1744639880f362|grep
DFSClient
2015-03-17 21:42:56,950 DEBUG [main] util.ChecksumType:
org.apache.hadoop.util.PureJavaCrc32 available
2015-03-17 21:42:56,952 DEBUG [main] util.ChecksumType:
org.apache.hadoop.util.PureJavaCrc32C available
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/home/stack/hbase-1.0.1-SNAPSHOT/lib/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/home/stack/hadoop-2.7.0-SNAPSHOT/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2015-03-17 21:42:58,082 INFO  [main] hfile.CacheConfig: CacheConfig:disabled
2015-03-17 21:42:58,126 DEBUG [main] hdfs.DFSClient: newInfo =
LocatedBlocks{
  fileLength=108633903
  underConstruction=false

blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
getBlockSize()=108633903; corrupt=false; offset=0;
locs=[DatanodeInfoWithStorage[10.20.84.27:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
DatanodeInfoWithStorage[10.20.84.30:50011
,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]

lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
getBlockSize()=108633903; corrupt=false; offset=0;
locs=[DatanodeInfoWithStorage[10.20.84.30:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
DatanodeInfoWithStorage[10.20.84.27:50011
,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
  isLastBlockComplete=true}
2015-03-17 21:42:58,132 DEBUG [main] hdfs.DFSClient: Connecting to datanode
10.20.84.27:50011
2015-03-17 21:42:58,281 DEBUG [main] hdfs.DFSClient: Connecting to datanode
10.20.84.27:50011
2015-03-17 21:42:58,375 DEBUG [main] hdfs.DFSClient: newInfo =
LocatedBlocks{
  fileLength=108633903
  underConstruction=false

blocks=[LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
getBlockSize()=108633903; corrupt=false; offset=0;
locs=[DatanodeInfoWithStorage[10.20.84.30:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
DatanodeInfoWithStorage[10.20.84.27:50011
,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}]

lastLocatedBlock=LocatedBlock{BP-410607956-10.20.84.26-1391491814882:blk_1078238905_1099516142201;
getBlockSize()=108633903; corrupt=false; offset=0;
locs=[DatanodeInfoWithStorage[10.20.84.27:50011,DS-21a30dbf-5085-464d-97f4-608a0b610c49,DISK],
DatanodeInfoWithStorage[10.20.84.31:50011,DS-aa69a8eb-2761-40c7-9b18-9b887c8e5791,DISK],
DatanodeInfoWithStorage[10.20.84.30:50011
,DS-03a89da2-8ab6-465a-80bb-c83473f1dc8b,DISK]]}
  isLastBlockComplete=true}
2015-03-17 21:42:58,376 DEBUG [main] hdfs.DFSClient: Connecting to datanode
10.20.84.30:50011
2015-03-17 21:42:58,381 DEBUG [main] hdfs.DFSClient: Connecting to datanode
10.20.84.27:50011

Do you see it reading from 'good' or 'bad' blocks?

I added this line to hbase log4j.properties to enable DFSClient DEBUG:

log4j.logger.org.apache.hadoop.hdfs.DFSClient=DEBUG

On HBASE-12949, what exception is coming up?  Dump it in here.



> My goal is to determine whether the block in question is actually corrupt
> and, if so, in what way.


What happens if you just try to copy the file locally or elsewhere in the
filesystem using dfs shell. Do you get a pure dfs exception unhampered by
hbaseyness?
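
I.e. something like this (destination is arbitrary):

hdfs dfs -get /hbase/<table>/<region>/<cf>/<the-hfile> /tmp/hfile.copy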



> If it's possible to recover all of the file except
> a portion of the affected block, that would be OK too.


I actually do not see a 'fix' or 'recover' on the hfile tool. We need to
add it so you can recover all but the bad block (we should figure out how to
skip the bad section also).



> I just don't want to
> be in the position of having to lose all 3 gigs of data in this particular
> region, given that most of it appears to be intact. I just can't find the
> right low-level tools to let me diagnose the exact state and
> structure of the block data I have for this file.
>
>
Nod.



> Any help or direction that someone could provide would be much appreciated.
> For reference, I'll repeat that our client is running Hadoop 2.0.0-cdh4.6.0
> and add that the HBase version is 0.94.15-cdh4.6.0.
>
>
See if any of the above helps. I'll try and dig up some more tools in
the meantime.
St.Ack



> Thanks!
>
> -md
>