Posted to common-issues@hadoop.apache.org by "Colin Patrick McCabe (JIRA)" <ji...@apache.org> on 2016/05/05 19:43:12 UTC

[jira] [Commented] (HADOOP-13010) Refactor raw erasure coders

    [ https://issues.apache.org/jira/browse/HADOOP-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272936#comment-15272936 ] 

Colin Patrick McCabe commented on HADOOP-13010:
-----------------------------------------------

Thanks, guys.  Sorry for the delay in reviewing.  We've been busy.

{{CodecUtil.java}}: there are a LOT of functions here for creating {{RawErasureEncoder}} objects.
We've got:
{code}
createRSRawEncoder(Configuration conf, int numDataUnits, int numParityUnits, String codec)
createRSRawEncoder(Configuration conf, int numDataUnit, int numParityUnit)
createRSRawEncoder(Configuration conf, String codec, ErasureCoderOptions coderOptions)
createRSRawEncoder(Configuration conf, ErasureCoderOptions coderOptions)
createXORRawEncoder(Configuration conf, ErasureCoderOptions coderOptions)
createXORRawEncoder(Configuration conf, int numDataUnits, int numParityUnits)
createRawEncoder(Configuration conf, String rawCoderFactoryKey, ErasureCoderOptions coderOptions)
{code}

Plus a similar number of functions for creating decoders.  Why do we need so many functions?  Surely the codec name, numDataUnits, numParityUnits, whether it is XOR or RS, and so on should all be included in {{ErasureCoderOptions}}.  Then we could have just one function:
{code}
createRawEncoder(Configuration conf, ErasureCoderOptions coderOptions)
{code}
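To make that concrete, here is a minimal, hypothetical sketch of folding the codec choice and unit counts into the options object so a single factory entry point suffices.  The field and method names here are assumptions for illustration, not the actual Hadoop API:

```java
// Hypothetical sketch: everything the factory needs travels inside the
// options object, so one method replaces the seven overloads above.
public class CoderOptionsSketch {
    static final class ErasureCoderOptions {
        final String codecName;     // e.g. "rs" or "xor" (assumed field)
        final int numDataUnits;
        final int numParityUnits;
        ErasureCoderOptions(String codecName, int numDataUnits, int numParityUnits) {
            this.codecName = codecName;
            this.numDataUnits = numDataUnits;
            this.numParityUnits = numParityUnits;
        }
    }

    // One entry point instead of per-codec overloads; returns a string
    // stand-in here rather than a real RawErasureEncoder.
    static String createRawEncoder(ErasureCoderOptions opts) {
        return opts.codecName + "-encoder(" + opts.numDataUnits
            + "," + opts.numParityUnits + ")";
    }

    public static void main(String[] args) {
        ErasureCoderOptions rs = new ErasureCoderOptions("rs", 6, 3);
        System.out.println(createRawEncoder(rs)); // rs-encoder(6,3)
    }
}
```

Callers that today pick an overload by codec would instead just build the options object once and pass it through.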

On a related note, why does each particular type of encoder need its own factory?  It seems like we just need a static function for each encoder type that takes a Configuration and ErasureCoderOptions, and we're good to go.  We can locate these static functions via reflection.
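A rough sketch of how the reflection lookup could work, assuming each coder class follows a static {{create}} convention.  The class and method names below are invented for illustration and are not Hadoop's actual coder classes:

```java
import java.lang.reflect.Method;

// Hypothetical sketch: locate a coder's static factory method via
// reflection instead of requiring a separate factory class per coder.
public class ReflectiveFactorySketch {
    // Stand-in coder class; by convention (assumed here) every coder
    // exposes a static create(int, int) factory method.
    public static final class RSRawEncoderStub {
        public static RSRawEncoderStub create(int numDataUnits, int numParityUnits) {
            return new RSRawEncoderStub();
        }
    }

    static Object createByReflection(String className, int numDataUnits,
                                     int numParityUnits) throws Exception {
        Class<?> clazz = Class.forName(className);
        Method factory = clazz.getMethod("create", int.class, int.class);
        return factory.invoke(null, numDataUnits, numParityUnits); // null: static
    }

    public static void main(String[] args) throws Exception {
        Object encoder = createByReflection(
            "ReflectiveFactorySketch$RSRawEncoderStub", 6, 3);
        System.out.println(encoder.getClass().getSimpleName());
    }
}
```

The class name itself could come from the configuration key, which keeps the lookup entirely convention-driven with no per-coder factory boilerplate.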

{code}
  protected void doDecode(DecodingState decodingState, byte[][] inputs,
                          int[] inputOffsets, int[] erasedIndexes,
                          byte[][] outputs, int[] outputOffsets) {
{code}
Can we just include the inputs, inputOffsets, erasedIndexes, outputs, outputOffsets in {{DecodingState}}?
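A minimal, hypothetical sketch of what that could look like, with the per-call buffers bundled into the state object so {{doDecode}} collapses to a single parameter.  The field names are assumptions, and the XOR reconstruction is a toy stand-in for the real decode logic:

```java
// Hypothetical sketch: DecodingState carries the per-call buffers,
// so doDecode takes one argument instead of six.
public class DecodingStateSketch {
    static final class DecodingState {
        byte[][] inputs;
        int[] inputOffsets;
        int[] erasedIndexes;
        byte[][] outputs;
        int[] outputOffsets;
    }

    // Toy decode: XOR-reconstruct a single erased unit from the others.
    static void doDecode(DecodingState state) {
        int erased = state.erasedIndexes[0];
        byte[] out = state.outputs[0];
        int outBase = state.outputOffsets[0];
        for (int i = 0; i < state.inputs.length; i++) {
            if (i == erased || state.inputs[i] == null) {
                continue; // skip the erased/missing unit
            }
            for (int j = 0; j + outBase < out.length; j++) {
                out[outBase + j] ^= state.inputs[i][state.inputOffsets[i] + j];
            }
        }
    }

    public static void main(String[] args) {
        DecodingState s = new DecodingState();
        s.inputs = new byte[][] { {1, 2}, null, {3, 4} };
        s.inputOffsets = new int[] {0, 0, 0};
        s.erasedIndexes = new int[] {1};
        s.outputs = new byte[][] { new byte[2] };
        s.outputOffsets = new int[] {0};
        doDecode(s);
        System.out.println(s.outputs[0][0] + "," + s.outputs[0][1]); // 2,6
    }
}
```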

> Refactor raw erasure coders
> ---------------------------
>
>                 Key: HADOOP-13010
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13010
>             Project: Hadoop Common
>          Issue Type: Sub-task
>            Reporter: Kai Zheng
>            Assignee: Kai Zheng
>         Attachments: HADOOP-13010-v1.patch, HADOOP-13010-v2.patch, HADOOP-13010-v3.patch, HADOOP-13010-v4.patch, HADOOP-13010-v5.patch
>
>
> This will refactor the raw erasure coders according to comments received so far.
> * As discussed in HADOOP-11540 and suggested by [~cmccabe], it is better not to rely on class inheritance to reuse the code; instead, the shared logic can be moved to a utility class.
> * As suggested by [~jingzhao] quite some time ago, it is better to have a state holder that keeps checking results for later reuse during an encode/decode call.
> This will not remove some of the inheritance levels, since how to do so isn't yet clear and doing it would have a large impact. I do hope the end result of this refactoring makes all the levels clearer and easier to follow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org