Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2007/10/27 21:41:51 UTC

[jira] Created: (HADOOP-2114) Checksums for Namenode image files

Checksums for Namenode image files
----------------------------------

                 Key: HADOOP-2114
                 URL: https://issues.apache.org/jira/browse/HADOOP-2114
             Project: Hadoop
          Issue Type: Improvement
            Reporter: Raghu Angadi


Currently DFS can write multiple copies of the image files, but it does not recover well automatically from corrupted or truncated image files. This jira proposes keeping a CRC for each image file record, so that the Namenode can recover an accurate image as long as each record is intact in at least one of the copies (e.g. non-overlapping corruptions across the copies should be OK). I will add more details in the next comment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-2114) Checksums for Namenode image files

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539204 ] 

Raghu Angadi commented on HADOOP-2114:
--------------------------------------


Regarding performance, we should be able to write about 10k records (of 100-300 bytes each) per second, and about 10-20 records could be batched.



[jira] Commented: (HADOOP-2114) Checksums for Namenode image files

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12539196 ] 

Raghu Angadi commented on HADOOP-2114:
--------------------------------------


The basic proposal is to keep a CRC for each "record" (e.g. in FSImage a file entry is a record) and, in case of errors, to try to recover pristine records from the multiple copies of the image files.
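
To make the idea concrete, here is a minimal sketch of per-record CRCs with recovery across copies. It is not the FSImage format or any Hadoop API: the framing (4-byte length, 4-byte CRC-32, payload) and the names encode_record, write_image, read_records and recover are all assumptions made up for illustration. It also assumes corruption lands in record payloads or truncates the file; a damaged length field would break the framing and needs separate handling in a real design.

```python
import struct
import zlib

# Hypothetical record framing (an assumption, not the FSImage layout):
#   4-byte big-endian payload length, 4-byte CRC-32 of the payload, payload.
HEADER = struct.Struct(">II")


def encode_record(payload: bytes) -> bytes:
    """Frame one record with its length and CRC-32."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def write_image(records) -> bytes:
    """Serialize all records into one image blob."""
    return b"".join(encode_record(r) for r in records)


def read_records(image: bytes):
    """Decode an image copy.

    Returns one entry per record: the payload if its CRC matches,
    or None if the record is corrupt. Stops at a truncated tail.
    """
    out = []
    pos = 0
    while pos + HEADER.size <= len(image):
        length, crc = HEADER.unpack_from(image, pos)
        start = pos + HEADER.size
        payload = image[start:start + length]
        if len(payload) < length:
            break  # truncated record: the rest must come from another copy
        out.append(payload if zlib.crc32(payload) == crc else None)
        pos = start + length
    return out


def recover(copies):
    """Rebuild the record list, taking each record from any copy where it is intact."""
    decoded = [read_records(c) for c in copies]
    count = max((len(d) for d in decoded), default=0)
    result = []
    for i in range(count):
        good = next(
            (d[i] for d in decoded if i < len(d) and d[i] is not None),
            None,
        )
        if good is None:
            raise ValueError(f"record {i} is corrupt in every copy")
        result.append(good)
    return result
```

With two copies where one has a damaged first record and the other is truncated before its last record, recover() returns the full record list, since the damage does not overlap. If the same record is bad in every copy it raises ValueError instead of returning wrong data. Note that a record missing from the tail of every copy cannot be detected without also storing a record count, which this sketch omits.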

Before I proceed further with design or implementation: are there any standard solutions for keeping a few gigs of data resistant to various errors such as on-disk corruption? It's OK even if the existing solution is not compatible with Apache. We can read about the method and see how it compares.

The only requirement is that NameNode should be confident that image files it writes and reads have correct data.

thanks.




[jira] Updated: (HADOOP-2114) Checksums for Namenode image files

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-2114:
---------------------------------

          Component/s: dfs
    Affects Version/s: 0.14.3
