Posted to common-user@hadoop.apache.org by Mike Andrews <mr...@xoba.com> on 2009/03/26 13:21:39 UTC

corrupt unreplicated block in dfs (0.18.3)

I noticed that when a file with no replication (i.e., replication=1)
develops a corrupt block, Hadoop takes no action aside from the
datanode throwing an exception to the client trying to read the file.
I manually corrupted a block in order to observe this.
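
(For anyone who wants to reproduce this on a throwaway test cluster, one
way to corrupt a single-replica block by hand is roughly as follows. The
data directory, block name, and file path below are made-up examples, so
substitute the real dfs.data.dir location and your own test file.)

# find a block replica on one of the datanodes (example path)
find /tmp/hadoop/dfs/data -name 'blk_*' | grep -v meta | head

# overwrite a few bytes in the middle of the block file without
# changing its length, so only checksum verification catches it
dd if=/dev/urandom of=/tmp/hadoop/dfs/data/current/blk_1234567890 \
   bs=1 count=16 seek=1000 conv=notrunc

# a client read should now fail with a checksum error, since there
# is no second replica to fall back on
hadoop fs -cat /user/mike/unreplicated-file > /dev/null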

Obviously, with replication=1 it's impossible to fix the block, but I
thought perhaps Hadoop would take some other action, such as deleting
the file outright, moving it to a "corrupt" directory, or marking it
or keeping track of it somehow to note that there's un-fixable
corruption in the filesystem. As it stands, the current behaviour
seems to sweep the corruption under the rug and allow it to persist,
aside from notifying the specific client doing the read with an
exception.

If anyone has any information about this issue or how to work around
it, please let me know.

On the other hand, I tested that corrupting a block in a replication=3
file causes Hadoop to re-replicate the block from another existing
copy, which is good and is what I expected.
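
(For reference, the replication=3 case can be watched with something
along these lines; the file name is only an example.)

# raise an existing file to three replicas, or write it that way
hadoop fs -setrep 3 /user/mike/replicated-file

# after corrupting one replica and reading the file, the block report
# should still show healthy copies while a replacement replica is made
hadoop fsck /user/mike/replicated-file -files -blocks -locations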

best,
mike


-- 
permanent contact information at http://mikerandrews.com

RE: corrupt unreplicated block in dfs (0.18.3)

Posted by Koji Noguchi <kn...@yahoo-inc.com>.
Mike, you might want to look at the -move option in fsck.

bash-3.00$ hadoop fsck
Usage: DFSck <path> [-move | -delete | -openforwrite] [-files [-blocks
[-locations | -racks]]]
        <path>  start checking from this path
        -move   move corrupted files to /lost+found
        -delete delete corrupted files
        -files  print out files being checked
        -openforwrite   print out files opened for write
        -blocks print out block report
        -locations      print out locations for every block
        -racks  print out network topology for data-node locations
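
For example, something like the following reports which files under a
path have corrupt or missing blocks and then moves them aside (the path
is just an example):

# list files under /user/mike with corrupt or missing blocks
hadoop fsck /user/mike

# move what remains of those files into /lost+found
hadoop fsck /user/mike -move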



I never use it myself, since I would rather have users' jobs fail than
have them succeed with incomplete inputs.

Koji


-----Original Message-----
From: Aaron Kimball [mailto:aaron@cloudera.com] 
Sent: Thursday, March 26, 2009 9:41 AM
To: core-user@hadoop.apache.org
Subject: Re: corrupt unreplicated block in dfs (0.18.3)

Just because a block is corrupt doesn't mean the entire file is corrupt.
Furthermore, the presence/absence of a file in the namespace is a
completely separate issue from the data in the file. I think it would be
a surprising interface change if files suddenly disappeared just because
1 out of potentially many blocks was corrupt.

- Aaron

On Thu, Mar 26, 2009 at 1:21 PM, Mike Andrews <mr...@xoba.com> wrote:

> I noticed that when a file with no replication (i.e., replication=1)
> develops a corrupt block, Hadoop takes no action aside from the
> datanode throwing an exception to the client trying to read the file.
> I manually corrupted a block in order to observe this.
>
> Obviously, with replication=1 it's impossible to fix the block, but I
> thought perhaps Hadoop would take some other action, such as deleting
> the file outright, moving it to a "corrupt" directory, or marking it
> or keeping track of it somehow to note that there's un-fixable
> corruption in the filesystem. As it stands, the current behaviour
> seems to sweep the corruption under the rug and allow it to persist,
> aside from notifying the specific client doing the read with an
> exception.
>
> If anyone has any information about this issue or how to work around
> it, please let me know.
>
> On the other hand, I tested that corrupting a block in a replication=3
> file causes Hadoop to re-replicate the block from another existing
> copy, which is good and is what I expected.
>
> best,
> mike
>
>
> --
> permanent contact information at http://mikerandrews.com
>

Re: corrupt unreplicated block in dfs (0.18.3)

Posted by Aaron Kimball <aa...@cloudera.com>.
Just because a block is corrupt doesn't mean the entire file is corrupt.
Furthermore, the presence/absence of a file in the namespace is a
completely separate issue from the data in the file. I think it would be
a surprising interface change if files suddenly disappeared just because
1 out of potentially many blocks was corrupt.

- Aaron

On Thu, Mar 26, 2009 at 1:21 PM, Mike Andrews <mr...@xoba.com> wrote:

> I noticed that when a file with no replication (i.e., replication=1)
> develops a corrupt block, Hadoop takes no action aside from the
> datanode throwing an exception to the client trying to read the file.
> I manually corrupted a block in order to observe this.
>
> Obviously, with replication=1 it's impossible to fix the block, but I
> thought perhaps Hadoop would take some other action, such as deleting
> the file outright, moving it to a "corrupt" directory, or marking it
> or keeping track of it somehow to note that there's un-fixable
> corruption in the filesystem. As it stands, the current behaviour
> seems to sweep the corruption under the rug and allow it to persist,
> aside from notifying the specific client doing the read with an
> exception.
>
> If anyone has any information about this issue or how to work around
> it, please let me know.
>
> On the other hand, I tested that corrupting a block in a replication=3
> file causes Hadoop to re-replicate the block from another existing
> copy, which is good and is what I expected.
>
> best,
> mike
>
>
> --
> permanent contact information at http://mikerandrews.com
>