Posted to mapreduce-issues@hadoop.apache.org by "Wittawat Tantisiriroj (JIRA)" <ji...@apache.org> on 2010/08/27 22:20:54 UTC

[jira] Created: (MAPREDUCE-2036) Enable Erasure Code in Tool similar to Hadoop Archive

Enable Erasure Code in Tool similar to Hadoop Archive
-----------------------------------------------------

                 Key: MAPREDUCE-2036
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2036
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: contrib/raid
            Reporter: Wittawat Tantisiriroj
            Priority: Minor


Features:
1) HAR-like tool
2) RAID5/RAID6, plus a pluggable interface for implementing additional codes
3) Ability to group blocks across files
4) Portable across clusters, since all necessary metadata is embedded

Although it was developed separately from HAR and RAID due to time constraints, it would make sense to integrate it with either of them.
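
To illustrate feature 2, here is a minimal sketch of what a pluggable coding interface with a RAID5-style single XOR parity could look like. The names ErasureCodec and XorCodec and all method signatures are hypothetical; they are not taken from the attached patch or from the existing RAID code.

```java
import java.util.Arrays;

/** Hypothetical pluggable codec interface (an assumption, not the patch's API). */
interface ErasureCodec {
    /** Number of parity blocks produced per group of data blocks. */
    int parityCount();

    /** Computes the parity blocks for a group of equally sized data blocks. */
    byte[][] encode(byte[][] data);

    /** Rebuilds the data block at lostIndex from the surviving data and the parity. */
    byte[] recover(byte[][] data, byte[][] parity, int lostIndex);
}

/** Single-parity XOR codec, the scheme RAID5 applies within a stripe. */
final class XorCodec implements ErasureCodec {
    @Override
    public int parityCount() {
        return 1;
    }

    @Override
    public byte[][] encode(byte[][] data) {
        // The parity is the byte-wise XOR of all data blocks in the group.
        byte[] parity = new byte[data[0].length];
        for (byte[] block : data) {
            xorInto(parity, block);
        }
        return new byte[][] { parity };
    }

    @Override
    public byte[] recover(byte[][] data, byte[][] parity, int lostIndex) {
        // XOR-ing the parity with every surviving data block yields the lost block.
        // The entry at lostIndex is never read, so it may be null.
        byte[] rebuilt = Arrays.copyOf(parity[0], parity[0].length);
        for (int i = 0; i < data.length; i++) {
            if (i != lostIndex) {
                xorInto(rebuilt, data[i]);
            }
        }
        return rebuilt;
    }

    private static void xorInto(byte[] target, byte[] source) {
        for (int i = 0; i < target.length; i++) {
            target[i] ^= source[i];
        }
    }
}
```

A RAID6 or Reed-Solomon codec would implement the same interface with parityCount() returning 2 or more, so that callers do not need to know which code is in use.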

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (MAPREDUCE-2036) Enable Erasure Code in Tool similar to Hadoop Archive

Posted by "Wittawat Tantisiriroj (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wittawat Tantisiriroj updated MAPREDUCE-2036:
---------------------------------------------

    Attachment: hdfs-raid.tar.gz
                MAPREDUCE-2036.patch

A prototype has been uploaded.



[jira] Commented: (MAPREDUCE-2036) Enable Erasure Code in Tool similar to Hadoop Archive

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921024#action_12921024 ] 

Scott Chen commented on MAPREDUCE-2036:
---------------------------------------

Hey Wittawat,
> 2) Before reducing the replication, I would like to make sure that no two blocks in the same group are located on the same datanode. I have been working on a tool (similar to the Balancer) to migrate blocks so that no two blocks in the same group are located on the same datanode.
I like this idea of migrating blocks. Would it be possible for you to implement this in the current RAID project? That would be really helpful.

> Plus, I am also working on porting RS codes from Jerasure (http://www.cs.utk.edu/~plank/plank/papers/CS-08-627.html), so that it can support more than 2 parities.
We have also implemented a Java version of the RS code in MAPREDUCE-1970. It has been deployed on our test cluster, which holds 300TB of data.
In that patch, we have an interface for general erasure codes.
Maybe you can make your patch implement the same interface, so that we can configure which codec to use.
I think encode/decode is more I/O-bound, because the parity length we are using is really small compared to the regular use cases of RS codes.



[jira] Commented: (MAPREDUCE-2036) Enable Erasure Code in Tool similar to Hadoop Archive

Posted by "Wittawat Tantisiriroj (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920285#action_12920285 ] 

Wittawat Tantisiriroj commented on MAPREDUCE-2036:
--------------------------------------------------

1) Yes, it is a great idea. Please let me know how I can help integrate it with the current RAID.
2) Before reducing the replication, I would like to make sure that no two blocks in the same group are located on the same datanode. I have been working on a tool (similar to the Balancer) to migrate blocks so that no two blocks in the same group are located on the same datanode.
3) I agree. However, would it make sense to store the parity files inside a HAR directory?

Plus, I am also working on porting RS codes from Jerasure (http://www.cs.utk.edu/~plank/plank/papers/CS-08-627.html), so that it can support more than 2 parities.
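
The placement condition in point 2 matters because, with a single parity, losing one datanode that holds two blocks of the same group removes more blocks than the code can rebuild. Below is a minimal sketch of the check a migration tool might run before replication is reduced. The class name, method name, and input shape are hypothetical, and the sketch assumes each block ends up with exactly one replica; it does not use any HDFS API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not part of the patch or of the HDFS Balancer. */
final class GroupPlacementCheck {
    private GroupPlacementCheck() {
    }

    /**
     * Finds datanodes that hold more than one block of a single coding group.
     *
     * @param blockToDatanode block id mapped to the datanode holding its one replica
     *                        (an assumption made for this sketch)
     * @return datanode mapped to the blocks of the group it holds, restricted to
     *         datanodes holding two or more; empty if the placement is safe
     */
    static Map<String, List<String>> findCollisions(Map<String, String> blockToDatanode) {
        Map<String, List<String>> byNode = new HashMap<>();
        for (Map.Entry<String, String> entry : blockToDatanode.entrySet()) {
            byNode.computeIfAbsent(entry.getValue(), node -> new ArrayList<>())
                  .add(entry.getKey());
        }
        // Keep only the datanodes that violate the one-block-per-group rule.
        byNode.values().removeIf(blocks -> blocks.size() < 2);
        return byNode;
    }
}
```

Any datanode returned by findCollisions would need one or more of its listed blocks moved elsewhere before the replication of that group is lowered.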



[jira] Updated: (MAPREDUCE-2036) Enable Erasure Code in Tool similar to Hadoop Archive

Posted by "Tsz Wo (Nicholas), SZE (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz Wo (Nicholas), SZE updated MAPREDUCE-2036:
----------------------------------------------

       Assignee: Wittawat Tantisiriroj
    Component/s: harchive



[jira] Commented: (MAPREDUCE-2036) Enable Erasure Code in Tool similar to Hadoop Archive

Posted by "Ramkumar Vadali (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904399#action_12904399 ] 

Ramkumar Vadali commented on MAPREDUCE-2036:
--------------------------------------------

Hi Wittawat, good work on generating this patch! The concept of RAIDing files in a directory is a good complement to the existing RAID, which requires larger files.
Some thoughts:
1. It would be really good to integrate this with the current RAID. Apart from the obvious code reuse in DistributedRaidFileSystem, the current RAID has some automation around generating parity files. I also have a lot of upcoming patches that automate repair of lost blocks.
2. I did not see code that reduces the replication for RAIDed files. Is that supposed to be done independently of this tool?
3. A usage-related question: I assume the source directory under consideration is older data such that users can tolerate some increase in read latency. If so, the source directory could be HAR'ed and the result files then RAIDed using the current RAID. Thoughts?

Looking forward to a good discussion!
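
Point 2 matters for the storage savings: parity blocks only add to the space used until the replication of the RAIDed files is lowered. The following sketch works through that arithmetic. The class and method names are hypothetical, and the group size and replication factors in the usage note are illustrative values, not figures from this issue.

```java
/** Hypothetical helper for estimating storage cost; not part of the patch. */
final class RaidStorageOverhead {
    private RaidStorageOverhead() {
    }

    /**
     * Returns the stored bytes per byte of user data for one coding group,
     * assuming all blocks in the group have the same size.
     */
    static double effectiveReplication(int dataBlocks, int parityBlocks,
                                       int dataReplication, int parityReplication) {
        int storedBlocks = dataBlocks * dataReplication + parityBlocks * parityReplication;
        return (double) storedBlocks / dataBlocks;
    }
}
```

For example, a group of 10 data blocks with 1 parity block costs (10 * 3 + 1 * 3) / 10 = 3.3 while everything stays at replication 3, which is more than the 3.0 of plain replication. Lowering both data and parity to replication 2 brings the cost to (10 * 2 + 1 * 2) / 10 = 2.2.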



[jira] Updated: (MAPREDUCE-2036) Enable Erasure Code in Tool similar to Hadoop Archive

Posted by "Wittawat Tantisiriroj (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wittawat Tantisiriroj updated MAPREDUCE-2036:
---------------------------------------------

    Attachment:     (was: RaidTool.docx)



[jira] Updated: (MAPREDUCE-2036) Enable Erasure Code in Tool similar to Hadoop Archive

Posted by "Wittawat Tantisiriroj (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wittawat Tantisiriroj updated MAPREDUCE-2036:
---------------------------------------------

    Attachment: RaidTool.pdf

A PDF version of the design document has been uploaded.



[jira] Updated: (MAPREDUCE-2036) Enable Erasure Code in Tool similar to Hadoop Archive

Posted by "Wittawat Tantisiriroj (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wittawat Tantisiriroj updated MAPREDUCE-2036:
---------------------------------------------

    Attachment: RaidTool.docx

Design Document
