Posted to common-dev@hadoop.apache.org by "Wendy Chien (JIRA)" <ji...@apache.org> on 2007/01/23 03:49:49 UTC
[jira] Updated: (HADOOP-731) Sometimes when a dfs file is accessed and one copy has a checksum error the I/O command fails, even if another copy is alright.
[ https://issues.apache.org/jira/browse/HADOOP-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wendy Chien updated HADOOP-731:
-------------------------------
Attachment: hadoop-731-7.patch
Attached a patch which allows us to continue reading after getting a checksum error by modifying Checker.read to catch ChecksumExceptions thrown by verifySum.
In Checker.read, if we get a ChecksumException, we seek to a new datanode for both the data stream and the checksum stream. This only has an effect when using dfs; for other file systems it is a no-op. If at least one of the two datanodes differs from the one used before, we retry the read.
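A minimal sketch of the retry loop described above, for illustration only. Every name here (ChecksumRetrySketch, ReplicaStream, readChunk, ArrayReplicas) is hypothetical and is not Hadoop's actual API. The sketch assumes each stream is served by one replica at a time and that the checksum is a 4-byte big-endian CRC32 per chunk. On a checksum error it moves both streams to new sources and retries only if at least one of them actually switched.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical sketch of the retry idea; these names are illustrative and
// are not Hadoop's actual classes or signatures.
class ChecksumRetrySketch {

    /** Stand-in for a checksum failure (hypothetical, not Hadoop's class). */
    static class ChecksumException extends IOException {
        ChecksumException(String msg) { super(msg); }
    }

    /** A stream served by one replica at a time (hypothetical interface). */
    interface ReplicaStream {
        byte[] read(long pos, int len) throws IOException;
        /** Switches to a different replica; returns false if none is left. */
        boolean seekToNewSource(long pos) throws IOException;
    }

    /** CRC32 of a chunk, encoded as 4 big-endian bytes (assumed layout). */
    static byte[] crcBytes(byte[] chunk) {
        CRC32 crc = new CRC32();
        crc.update(chunk, 0, chunk.length);
        return ByteBuffer.allocate(4).putInt((int) crc.getValue()).array();
    }

    static void verifySum(byte[] chunk, byte[] storedSum) throws ChecksumException {
        if (!Arrays.equals(crcBytes(chunk), storedSum)) {
            throw new ChecksumException("checksum error");
        }
    }

    /**
     * Reads one chunk and its stored checksum. On a ChecksumException, moves
     * both the data stream and the checksum stream to new sources and retries,
     * but only if at least one of them switched to a different replica.
     */
    static byte[] readChunk(ReplicaStream data, ReplicaStream sums,
                            long dataPos, int len, long sumPos) throws IOException {
        while (true) {
            byte[] chunk = data.read(dataPos, len);
            byte[] storedSum = sums.read(sumPos, 4);
            try {
                verifySum(chunk, storedSum);
                return chunk;
            } catch (ChecksumException e) {
                boolean dataMoved = data.seekToNewSource(dataPos);
                boolean sumsMoved = sums.seekToNewSource(sumPos);
                if (!dataMoved && !sumsMoved) {
                    throw e; // no untried replica left for either stream
                }
            }
        }
    }

    /** In-memory replicas for demonstration; each array is one replica. */
    static class ArrayReplicas implements ReplicaStream {
        private final byte[][] replicas;
        private int current = 0;

        ArrayReplicas(byte[]... replicas) { this.replicas = replicas; }

        public byte[] read(long pos, int len) {
            return Arrays.copyOfRange(replicas[current], (int) pos, (int) pos + len);
        }

        public boolean seekToNewSource(long pos) {
            if (current + 1 >= replicas.length) {
                return false;
            }
            current++;
            return true;
        }
    }
}
```

In this sketch, if the data stream has a corrupt replica followed by a good one, readChunk returns the good replica's bytes. With only a single corrupt replica, it rethrows the ChecksumException. Because each seekToNewSource call in ArrayReplicas advances through a finite list, the loop always terminates.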
In DFSInputStream, added a new seek method which also requests a datanode other than the current node.
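A hypothetical sketch of a seek that also asks for a different datanode, assuming the stream knows the list of datanodes holding the current block and keeps a set of nodes it should avoid. The class and field names are illustrative only and are not DFSInputStream's actual members. Here the current node is excluded from future choices, loosely matching the reporter's wish that a bad copy be treated as missing.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a seek that also requests a datanode other than the
// current one; names and fields are illustrative, not DFSInputStream's own.
class NewSourceSeekSketch {
    private final List<String> locations; // datanodes holding the block
    private final Set<String> deadNodes = new HashSet<>();
    private String currentNode;
    private long pos;

    NewSourceSeekSketch(List<String> locations) {
        this.locations = new ArrayList<>(locations);
        this.currentNode = this.locations.isEmpty() ? null : this.locations.get(0);
    }

    String getCurrentNode() { return currentNode; }

    long getPos() { return pos; }

    /**
     * Seeks to targetPos, preferring a datanode other than the current one.
     * The current node is excluded from future choices. Returns true if a
     * different node was chosen; otherwise stays on the current node.
     */
    boolean seekToNewSource(long targetPos) {
        pos = targetPos;
        if (currentNode != null) {
            deadNodes.add(currentNode);
        }
        for (String node : locations) {
            if (!deadNodes.contains(node)) {
                currentNode = node;
                return true;
            }
        }
        return false;
    }
}
```

With locations dn1 and dn2, the first call switches from dn1 to dn2 and returns true. A second call finds no unexcluded node, stays on dn2, and returns false. That false result is the signal a caller like the retry loop in Checker.read would use to stop retrying.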
> Sometimes when a dfs file is accessed and one copy has a checksum error the I/O command fails, even if another copy is alright.
> -------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-731
> URL: https://issues.apache.org/jira/browse/HADOOP-731
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.7.2
> Reporter: Dick King
> Assigned To: Wendy Chien
> Attachments: hadoop-731-7.patch
>
>
> For a particular file (alas, the file no longer exists; I had to move on), both
> $dfs -cp foo bar
> and
> $dfs -get foo local
> failed on a checksum error. The dfs browser's download function retrieved the file, so either that function does not verify checksums or, more likely, it read a different copy.
> When the checksum check fails on one copy of a redundantly stored file, I would prefer that dfs try a different copy, mark the bad one as missing (which should eventually cause a fresh copy to be made from one of the good ones), and let the call continue and deliver the bytes.
> Ideally, if every copy has checksum errors but it is still possible to piece together a good copy, I would like dfs to do that.
> -dk
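The reporter's last wish, piecing together a good copy when every replica has some damage, could in principle work chunk by chunk if each fixed-size chunk has its own checksum. A good file can then be assembled by taking each chunk from whichever replica verifies. This is a hypothetical sketch of that idea only, assuming equal-length replicas and one CRC32 value per chunk; PieceTogetherSketch and its methods are illustrative names.

```java
import java.util.zip.CRC32;

// Hypothetical sketch: rebuild a good copy chunk by chunk from several
// replicas, each of which may have some corrupt chunks. Assumes equal-length
// replicas and one CRC32 value per chunk.
class PieceTogetherSketch {

    static long crc(byte[] buf, int off, int len) {
        CRC32 c = new CRC32();
        c.update(buf, off, len);
        return c.getValue();
    }

    /**
     * Returns a copy in which every chunk matches its checksum, taking each
     * chunk from the first replica that verifies. Throws if some chunk is
     * corrupt in every replica.
     */
    static byte[] reconstruct(byte[][] replicas, long[] chunkSums, int chunkSize) {
        int length = replicas[0].length;
        byte[] out = new byte[length];
        for (int i = 0; i < chunkSums.length; i++) {
            int off = i * chunkSize;
            int len = Math.min(chunkSize, length - off); // last chunk may be short
            boolean found = false;
            for (byte[] replica : replicas) {
                if (crc(replica, off, len) == chunkSums[i]) {
                    System.arraycopy(replica, off, out, off, len);
                    found = true;
                    break;
                }
            }
            if (!found) {
                throw new IllegalStateException("chunk " + i + " is corrupt in every replica");
            }
        }
        return out;
    }
}
```

For example, if one replica is damaged only in its second chunk and another only in its third, the sketch takes the second chunk from the other replica and all remaining chunks from the first, yielding a fully verified copy. It still fails if the same chunk is damaged everywhere.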
--
This message is automatically generated by JIRA.