Posted to mapreduce-issues@hadoop.apache.org by "Ramkumar Vadali (JIRA)" <ji...@apache.org> on 2010/08/31 01:02:56 UTC

[jira] Commented: (MAPREDUCE-2036) Enable Erasure Code in Tool similar to Hadoop Archive

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12904399#action_12904399 ] 

Ramkumar Vadali commented on MAPREDUCE-2036:
--------------------------------------------

Hi Wittawat, good work on generating this patch! The concept of RAIDing files in a directory is a good complement to the existing RAID, which requires larger files.
Some thoughts:
1. It would be really good to integrate this with the current RAID. Apart from the obvious code reuse in DistributedRaidFileSystem, the current RAID already has some automation around generating parity files. I also have a lot of upcoming patches that automate repair of lost blocks.
2. I did not see code that reduces the replication for RAIDed files. Is that supposed to be done independently of this tool?
3. A usage-related question: I assume the source directory under consideration holds older data, such that users can tolerate some increase in read latency. If so, the source directory could be HAR'ed and the resulting files then RAIDed using the current RAID. Thoughts?

Looking forward to a good discussion!

> Enable Erasure Code in Tool similar to Hadoop Archive
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2036
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2036
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid, harchive
>            Reporter: Wittawat Tantisiriroj
>            Assignee: Wittawat Tantisiriroj
>            Priority: Minor
>         Attachments: hdfs-raid.tar.gz, MAPREDUCE-2036.patch, RaidTool.pdf
>
>
> Features:
> 1) HAR-like Tool
> 2) RAID5/RAID6 & pluggable interface to implement additional coding
> 3) Ability to group blocks across files
> 4) Portable across clusters since all necessary metadata is embedded
> While it was developed separately from HAR or RAID due to time constraints, it would make sense to integrate with either of them.
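
As an illustration of feature 2 above, here is a minimal sketch of what a pluggable coding interface with a single-parity (RAID5-style) XOR implementation could look like. The names ErasureCode and XorCode are hypothetical and are not taken from the attached patch or from contrib/raid; the sketch also assumes all blocks have the same length and that at most one data block is lost.

```java
import java.util.Arrays;

/** Hypothetical plug-in point; NOT the interface from the attached patch. */
interface ErasureCode {
    /** Computes one parity block from equal-length data blocks. */
    byte[] encode(byte[][] dataBlocks);

    /** Rebuilds the data block at lostIndex from the surviving blocks and the parity. */
    byte[] recover(byte[][] dataBlocks, int lostIndex, byte[] parity);
}

/** Single-parity code: the parity is the bytewise XOR of all data blocks. */
final class XorCode implements ErasureCode {
    @Override
    public byte[] encode(byte[][] dataBlocks) {
        // Assumption: every block has the same length as the first one.
        byte[] parity = new byte[dataBlocks[0].length];
        for (byte[] block : dataBlocks) {
            xorInto(parity, block);
        }
        return parity;
    }

    @Override
    public byte[] recover(byte[][] dataBlocks, int lostIndex, byte[] parity) {
        // XOR of the parity with every surviving block yields the lost block.
        byte[] rebuilt = Arrays.copyOf(parity, parity.length);
        for (int i = 0; i < dataBlocks.length; i++) {
            if (i != lostIndex) {
                xorInto(rebuilt, dataBlocks[i]);
            }
        }
        return rebuilt;
    }

    /** XORs block into acc in place. */
    private static void xorInto(byte[] acc, byte[] block) {
        for (int i = 0; i < acc.length; i++) {
            acc[i] ^= block[i];
        }
    }
}
```

A RAID6-style code that tolerates two lost blocks would need a second, independent parity (for example Reed-Solomon) behind the same kind of interface; that is not shown here.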

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.