Posted to common-dev@hadoop.apache.org by "Wendy Chien (JIRA)" <ji...@apache.org> on 2007/01/23 03:49:49 UTC

[jira] Updated: (HADOOP-731) Sometimes when a dfs file is accessed and one copy has a checksum error the I/O command fails, even if another copy is alright.

     [ https://issues.apache.org/jira/browse/HADOOP-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wendy Chien updated HADOOP-731:
-------------------------------

    Attachment: hadoop-731-7.patch

Attached a patch which allows us to continue reading after getting a checksum error by modifying Checker.read to catch ChecksumExceptions thrown by verifySum.  

In Checker.read, if we get a ChecksumException, we seek to a new datanode for both the data stream and the checksum stream (this applies when using dfs; it is a no-op for other file systems).  If at least one of the datanodes is different from before, we retry the read.
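
A minimal sketch of the retry logic described above. It is not the code in hadoop-731-7.patch. The ReplicaStream interface, readVerified, seekToNewSource, readWithRetry, FakeStream and demo are hypothetical names made up for illustration, and ChecksumException is declared locally so the example compiles without Hadoop on the classpath.

```java
import java.io.IOException;

class ChecksumRetrySketch {
    /** Local stand-in for Hadoop's ChecksumException so the sketch compiles alone. */
    static class ChecksumException extends IOException {
        ChecksumException(String msg) { super(msg); }
    }

    /** Hypothetical stream interface; not the API from the patch. */
    interface ReplicaStream {
        long getPos();
        /** Reads bytes and verifies them; throws ChecksumException on mismatch. */
        int readVerified(byte[] buf) throws IOException;
        /** Repositions at pos on another replica; true only if the source changed. */
        boolean seekToNewSource(long pos) throws IOException;
    }

    static int readWithRetry(ReplicaStream data, ReplicaStream sums, byte[] buf)
            throws IOException {
        while (true) {
            // Remember where the read started so the retry covers the same bytes.
            long dataPos = data.getPos();
            long sumsPos = sums.getPos();
            try {
                return data.readVerified(buf);
            } catch (ChecksumException e) {
                // Try both seeks; a new source for either stream justifies a retry.
                boolean dataMoved = data.seekToNewSource(dataPos);
                boolean sumsMoved = sums.seekToNewSource(sumsPos);
                if (!dataMoved && !sumsMoved) {
                    throw e; // no other copy left to try
                }
            }
        }
    }

    /** In-memory fake: replica i is corrupt when corrupt[i] is true. */
    static class FakeStream implements ReplicaStream {
        private final boolean[] corrupt;
        private int current = 0;
        FakeStream(boolean... corrupt) { this.corrupt = corrupt; }
        public long getPos() { return 0L; }
        public int readVerified(byte[] buf) throws IOException {
            if (corrupt[current]) throw new ChecksumException("bad replica " + current);
            return buf.length;
        }
        public boolean seekToNewSource(long pos) {
            if (current + 1 >= corrupt.length) return false;
            current++;
            return true;
        }
    }

    /** Returns the number of bytes read, or -1 if every replica failed verification. */
    static int demo(boolean... corrupt) {
        try {
            return readWithRetry(new FakeStream(corrupt), new FakeStream(false), new byte[4]);
        } catch (IOException e) {
            return -1;
        }
    }
}
```

In this sketch, demo(true, false) recovers by reading the second replica, while demo(true, true) gives up once no stream can move to a different source.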

In DFSInputStream, added a new seek method which also requests a datanode other than the current node.
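
To illustrate the idea of a seek that also asks for a different datanode, here is a rough sketch. It is an assumption about the general shape, not the method added to DFSInputStream by the patch. The class NewSourceSeekSketch, its fields, and the rule of never returning to an already rejected node are all hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class NewSourceSeekSketch {
    private final List<String> replicas;                  // datanodes holding the block
    private final Set<String> rejected = new HashSet<>(); // nodes we already gave up on
    private String currentNode;
    private long pos;

    NewSourceSeekSketch(List<String> replicas) {
        this.replicas = replicas;
        this.currentNode = replicas.get(0);
    }

    String getCurrentNode() { return currentNode; }
    long getPos() { return pos; }

    /** Ordinary seek: moves the position but keeps the current datanode. */
    void seek(long targetPos) { pos = targetPos; }

    /**
     * Seeks to targetPos and requests a datanode other than the current one.
     * Returns true only if the stream actually switched to a different node.
     */
    boolean seekToNewSource(long targetPos) {
        pos = targetPos;
        rejected.add(currentNode); // assumption: do not go back to a node that failed
        for (String node : replicas) {
            if (!rejected.contains(node)) {
                currentNode = node;
                return true;
            }
        }
        return false; // every replica has been tried
    }
}
```

Tracking rejected nodes is a design choice of this sketch: without it, a caller that retries on every checksum error could bounce between two bad replicas indefinitely.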

 

> Sometimes when a dfs file is accessed and one copy has a checksum error the I/O command fails, even if another copy is alright.
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-731
>                 URL: https://issues.apache.org/jira/browse/HADOOP-731
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.7.2
>            Reporter: Dick King
>         Assigned To: Wendy Chien
>         Attachments: hadoop-731-7.patch
>
>
> for a particular file [alas, the file no longer exists -- I had to progress]  
>     $dfs -cp foo bar        
> and
>     $dfs -get foo local
> failed on a checksum error.  The dfs browser's download function retrieved the file, so either that function doesn't check, or more likely the download function got a different copy.
> When a checksum fails on one copy of a file that is redundantly stored, I would prefer that dfs try a different copy, mark the bad one as not existing [which should induce a fresh copy being made from one of the good copies eventually], and make the call continue to work and deliver bytes.
> Ideally, if all copies have checksum errors but it's possible to piece together a good copy I would like that to be done.
> -dk
