Posted to hdfs-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2009/07/24 09:36:14 UTC

[jira] Created: (HDFS-503) Implement erasure coding as a layer on HDFS

Implement erasure coding as a layer on HDFS
-------------------------------------------

                 Key: HDFS-503
                 URL: https://issues.apache.org/jira/browse/HDFS-503
             Project: Hadoop HDFS
          Issue Type: New Feature
            Reporter: dhruba borthakur
            Assignee: dhruba borthakur


The goal of this JIRA is to discuss how the cost of raw storage for an HDFS file system can be reduced. Keeping three copies of the same data is very costly, especially when the size of storage is huge. One idea is to reduce the replication factor and do erasure coding of a set of blocks so that the overall probability of failure of a block remains the same as before.

Many forms of error-correcting codes are available; see http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has described DiskReduce: https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.

My preference is to discuss implementation strategies that are not part of base HDFS but sit as a layer on top of HDFS.
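
To make the trade-off concrete, here is a minimal sketch of the simplest possible erasure code, a single XOR parity block computed over a stripe of data blocks. It is an illustration only, not the design proposed in this issue: the function names, the stripe length, and the replication factors used in the overhead calculation are all assumptions, and a real layer would operate on HDFS blocks rather than in-memory byte strings.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of ints per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def recover_block(surviving_blocks, parity):
    """Rebuild the single missing block of a stripe.

    XOR-ing the parity with every surviving block cancels them out
    and leaves the missing block.
    """
    return xor_parity(list(surviving_blocks) + [parity])


def raw_storage_factor(stripe_length, data_replication, parity_replication):
    """Raw bytes stored per byte of user data (hypothetical layout:
    one parity block per stripe, replicated independently of the data)."""
    return data_replication + parity_replication / stripe_length


# Hypothetical stripe of three data blocks.
stripe = [b"block-00", b"block-01", b"block-02"]
parity = xor_parity(stripe)

# Pretend the middle block was lost and rebuild it from the rest.
rebuilt = recover_block([stripe[0], stripe[2]], parity)
print(rebuilt)
```

Under these assumed numbers, a stripe of 10 blocks with data and parity each kept at replication 2 stores about 2.2 raw bytes per user byte, compared with 3 for plain triple replication. A single XOR parity tolerates only one lost block per stripe; stronger codes trade more parity for more tolerance.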

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Reopened: (HDFS-503) Implement erasure coding as a layer on HDFS

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Boudnik reopened HDFS-503:
-------------------------------------


RAID-related tests have been failing for a while now (http://hudson.zones.apache.org/hudson/view/Hadoop/job/Hadoop-Hdfs-trunk/135/console and before):
{noformat}
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.711 sec
    [junit] Test org.apache.hadoop.hdfs.TestRaidDfs FAILED
...
    [junit] Tests run: 2, Failures: 0, Errors: 2, Time elapsed: 0.632 sec
    [junit] Test org.apache.hadoop.raid.TestRaidNode FAILED
...
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.461 sec
    [junit] Test org.apache.hadoop.raid.TestRaidPurge FAILED
{noformat}
It appears that some configuration options are incorrect.

> Implement erasure coding as a layer on HDFS
> -------------------------------------------
>
>                 Key: HDFS-503
>                 URL: https://issues.apache.org/jira/browse/HDFS-503
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: contrib/raid
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.22.0
>
>         Attachments: raid1.txt, raid2.txt
>
>
> The goal of this JIRA is to discuss how the cost of raw storage for an HDFS file system can be reduced. Keeping three copies of the same data is very costly, especially when the size of storage is huge. One idea is to reduce the replication factor and do erasure coding of a set of blocks so that the overall probability of failure of a block remains the same as before.
> Many forms of error-correcting codes are available; see http://en.wikipedia.org/wiki/Erasure_code. Also, recent research from CMU has described DiskReduce: https://opencirrus.org/system/files/Gibson-OpenCirrus-June9-09.ppt.
> My preference is to discuss implementation strategies that are not part of base HDFS but sit as a layer on top of HDFS.

[jira] Resolved: (HDFS-503) Implement erasure coding as a layer on HDFS

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HDFS-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur resolved HDFS-503.
-----------------------------------

    Resolution: Fixed

I will fix the unit-test failure via HDFS-757. The unit tests failed when Maven integration code was checked into HDFS.
