Posted to common-issues@hadoop.apache.org by "Kai Zheng (JIRA)" <ji...@apache.org> on 2015/05/04 16:01:07 UTC
[jira] [Commented] (HADOOP-11828) Implement the Hitchhiker erasure coding algorithm

    [ https://issues.apache.org/jira/browse/HADOOP-11828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526652#comment-14526652 ] 

Kai Zheng commented on HADOOP-11828:
------------------------------------

Hi Jack, the updated patch looks good overall. Some comments so far:
* Some code comments would benefit from being reorganized so they read better. Some are too long, and others could say more.
* Please note lines should not exceed 80 chars. You could set the width limit in your IDE.
* As both the XOR raw coder and the RS raw coder are common to the erasure coders for RS and HH, please extract the duplicated code for creating the XOR and RS raw coders into an abstract class.
* We may need abstract classes like {{HHErasureDecodingStep}} and {{HHErasureEncodingStep}} for the three variants of the HH algorithm. Classes like {{HHXORErasureDecodingStep}} can inherit from them.
* Please try to reuse code between the two versions of coding: the byte[] version and the ByteBuffer version. You may look at the patch in HADOOP-11847 for some ideas.
* We probably should not override {{testCoding}} and {{performCodingStep}} in {{TestHHErasureCoderBase}}. Is there anything specific to HH here? If we have to override them, that suggests a problem with using the coder, as it would not be usable in the general way.
* We need Javadocs for the public functions in {{HHUtil}}.
* Is it possible to avoid cloning the input data in {{getPiggyBacksFromInput}}?
* I don't think we need this test, as the configuration isn't specific to the coder.
{code}
+  @Test
+  public void testCodingDirectBufferWithConf_10x4() {
+    /**
+     * This tests if the two configuration items work or not.
+     */
+    Configuration conf = new Configuration();
+    conf.set(CommonConfigurationKeys.IO_ERASURECODE_CODEC_RS_RAWCODER_KEY,
+        RSRawErasureCoderFactory.class.getCanonicalName());
+    conf.setBoolean(
+        CommonConfigurationKeys.IO_ERASURECODE_CODEC_RS_USEXOR_KEY, false);
+    prepare(conf, 10, 4, null);
+    initHitchhiker();
+    testCoding(true);
+  }
{code}
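
To illustrate the point about reusing code between the byte[] and ByteBuffer versions, here is a rough sketch of one possible approach. It is only an assumption about how the sharing could look: it is not taken from the patch or from HADOOP-11847, and the class and method names ({{XorReuseSketch}}, {{xorInto}}) are hypothetical. The core logic is written once against ByteBuffer using absolute get/put, and the byte[] entry point just wraps its arrays and delegates.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/**
 * Hypothetical sketch, not code from the patch: one core XOR routine written
 * against ByteBuffer, with the byte[] entry point delegating to it.
 */
public class XorReuseSketch {

  /**
   * Core logic: XOR all inputs into the output, byte by byte.
   * Absolute get/put is used, so buffer positions are left unchanged.
   * Assumes every input has at least output.remaining() bytes remaining.
   */
  static void xorInto(ByteBuffer[] inputs, ByteBuffer output) {
    int len = output.remaining();
    int outPos = output.position();
    for (int i = 0; i < len; i++) {
      byte b = 0;
      for (ByteBuffer in : inputs) {
        b ^= in.get(in.position() + i);
      }
      output.put(outPos + i, b);
    }
  }

  /** byte[] variant: wrap the arrays and reuse the ByteBuffer logic. */
  static void xorInto(byte[][] inputs, byte[] output) {
    ByteBuffer[] wrapped = new ByteBuffer[inputs.length];
    for (int i = 0; i < inputs.length; i++) {
      wrapped[i] = ByteBuffer.wrap(inputs[i]);
    }
    // ByteBuffer.wrap shares the backing array, so output is filled in place.
    xorInto(wrapped, ByteBuffer.wrap(output));
  }

  public static void main(String[] args) {
    byte[][] inputs = {{1, 2, 3}, {4, 5, 6}};
    byte[] parity = new byte[3];
    xorInto(inputs, parity);
    System.out.println(Arrays.toString(parity)); // prints [5, 7, 5]
  }
}
```

Wrapping does not copy the arrays, so the byte[] path adds only small wrapper objects. Whether that is acceptable for the hot path, or whether a different way of sharing the logic fits better, is something to judge against the actual raw coder code.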

> Implement the Hitchhiker erasure coding algorithm
> -------------------------------------------------
>
>                 Key: HADOOP-11828
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11828
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Zhe Zhang
>            Assignee: jack liuquan
>         Attachments: 7715-hitchhikerXOR-v2-testcode.patch, 7715-hitchhikerXOR-v2.patch, HADOOP-11828-hitchhikerXOR-V3.patch, HADOOP-11828-hitchhikerXOR-V4.patch, HDFS-7715-hhxor-decoder.patch, HDFS-7715-hhxor-encoder.patch
>
>
> [Hitchhiker | http://www.eecs.berkeley.edu/~nihar/publications/Hitchhiker_SIGCOMM14.pdf] is a new erasure coding algorithm developed as a research project at UC Berkeley. It has been shown to reduce network traffic and disk I/O by 25%-45% during data reconstruction. This JIRA aims to introduce Hitchhiker to the HDFS-EC framework, as one of the pluggable codec algorithms.
> The existing implementation is based on HDFS-RAID. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)