Posted to hdfs-dev@hadoop.apache.org by "László Bence Nagy (JIRA)" <ji...@apache.org> on 2017/03/17 14:44:41 UTC

[jira] [Created] (HDFS-11542) Fix RawErasureCoderBenchmark decoding operation

László Bence Nagy created HDFS-11542:
----------------------------------------

             Summary: Fix RawErasureCoderBenchmark decoding operation
                 Key: HDFS-11542
                 URL: https://issues.apache.org/jira/browse/HDFS-11542
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: erasure-coding
    Affects Versions: 3.0.0-alpha2
            Reporter: László Bence Nagy
            Priority: Minor


There are some issues with the decode operation in the *RawErasureCoderBenchmark.java* file. The decoding method is called like this: *decoder.decode(decodeInputs, ERASED_INDEXES, outputs);*. 

With the RS 6+3 configuration, a correct call would look like this: *decode([ d0, NULL, d2, d3, NULL, d5, p0, NULL, p2 ], [ 1, 4, 7 ], [ -, -, - ])*. Indexes 1, 4 and 7 are listed in the *ERASED_INDEXES* array, so in the *decodeInputs* array the values at those indexes are set to NULL, while all the other data and parity packets are present. The *outputs* array's length is 3; that is where the d1, d4 and p1 packets should be reconstructed. This would be the right solution.
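For illustration, here is a minimal sketch of that input layout (the class name, buffer type and chunk size are made up for the sketch; only the decode call shape quoted above is taken from the benchmark):

{code:java}
import java.nio.ByteBuffer;

public class DecodeInputLayoutSketch {
  public static void main(String[] args) {
    final int numData = 6;
    final int numParity = 3;
    final int chunkSize = 1024;
    final int[] erasedIndexes = {1, 4, 7};   // d1, d4 and p1 are missing

    // All 9 units (d0..d5, p0..p2) as they would exist after a previous encode round.
    ByteBuffer[] allUnits = new ByteBuffer[numData + numParity];
    for (int i = 0; i < allUnits.length; i++) {
      allUnits[i] = ByteBuffer.allocate(chunkSize);
    }

    // decodeInputs carries the surviving units and NULL at every erased index.
    ByteBuffer[] decodeInputs = allUnits.clone();
    for (int erased : erasedIndexes) {
      decodeInputs[erased] = null;
    }

    // One output buffer per erased unit, to be filled in by the decoder.
    ByteBuffer[] outputs = new ByteBuffer[erasedIndexes.length];
    for (int i = 0; i < outputs.length; i++) {
      outputs[i] = ByteBuffer.allocate(chunkSize);
    }

    // decoder.decode(decodeInputs, erasedIndexes, outputs);  // call shape as quoted above
    System.out.println("decodeInputs[1] is null: " + (decodeInputs[1] == null));
  }
}
{code}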

Right now this example is effectively called like this: *decode([ d0, d1, d2, d3, d4, d5, -, -, - ], [ 1, 4, 7 ], [ -, -, - ])*. This shows two main problems with the *decodeInputs* array. Firstly, the packets are not set to NULL at the positions given by the *ERASED_INDEXES* array. Secondly, the array does not contain any parity packets to decode from.

The first problem is easy to solve: the values at the erased indexes need to be set to NULL. The second one is a little harder, because right now multiple encode rounds are run back to back and then multiple decode rounds are run the same way. Instead, encode and decode should be called in pairs, so that the parity packets produced by encode can be passed to decode in the *decodeInputs* array. (Of course, their performance should still be measured separately.)
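A rough sketch of such a pairing, with hypothetical coder interfaces standing in for the real raw encoder/decoder (the interface and method names here are placeholders, not the benchmark's actual types); the point is only that decode consumes the parity just produced by encode and that the two are timed separately:

{code:java}
import java.nio.ByteBuffer;

public class EncodeDecodePairingSketch {
  // Hypothetical stand-ins for the real raw coder interfaces; only the call shapes matter here.
  interface Encoder { void encode(ByteBuffer[] dataInputs, ByteBuffer[] parityOutputs); }
  interface Decoder { void decode(ByteBuffer[] inputs, int[] erasedIndexes, ByteBuffer[] outputs); }

  static void runRound(Encoder encoder, Decoder decoder, ByteBuffer[] dataUnits,
                       int numParity, int[] erasedIndexes, int chunkSize) {
    ByteBuffer[] parityUnits = new ByteBuffer[numParity];
    for (int i = 0; i < numParity; i++) {
      parityUnits[i] = ByteBuffer.allocate(chunkSize);
    }

    // Encode first, timed on its own.
    long encodeStart = System.nanoTime();
    encoder.encode(dataUnits, parityUnits);
    long encodeNanos = System.nanoTime() - encodeStart;

    // Build decodeInputs from the data units plus the freshly encoded parity units,
    // then NULL out the erased positions.
    ByteBuffer[] decodeInputs = new ByteBuffer[dataUnits.length + numParity];
    System.arraycopy(dataUnits, 0, decodeInputs, 0, dataUnits.length);
    System.arraycopy(parityUnits, 0, decodeInputs, dataUnits.length, numParity);
    for (int erased : erasedIndexes) {
      decodeInputs[erased] = null;
    }
    ByteBuffer[] outputs = new ByteBuffer[erasedIndexes.length];
    for (int i = 0; i < outputs.length; i++) {
      outputs[i] = ByteBuffer.allocate(chunkSize);
    }

    // Decode second, timed separately.
    long decodeStart = System.nanoTime();
    decoder.decode(decodeInputs, erasedIndexes, outputs);
    long decodeNanos = System.nanoTime() - decodeStart;

    System.out.printf("encode: %d ns, decode: %d ns%n", encodeNanos, decodeNanos);
  }
}
{code}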

Moreover, there is one more problem in this file. Right now it works with RS 6+3 and the *ERASED_INDEXES* array is fixed to *[ 6, 7, 8 ]*, so only the three parity packets have to be reconstructed. This means that no real decode performance is measured, because no data packet needs to be reconstructed (even once the decode call itself is fixed); effectively only new parity packets are encoded again. The exact behaviour depends on the underlying erasure coding plugin, but the point is that data packets should also be erased to measure real decode performance.
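For example (an illustrative choice only, not a proposed final value), erasing two data units and one parity unit would force real data reconstruction in RS 6+3:

{code:java}
public class ErasedIndexesSketch {
  public static void main(String[] args) {
    final int numData = 6;

    // The fixed [ 6, 7, 8 ] erases only parity units and never exercises data reconstruction.
    // A mix of data and parity indexes forces the coder to rebuild real data packets:
    int[] erasedIndexes = {1, 4, numData + 1};   // d1, d4 and p1

    for (int idx : erasedIndexes) {
      String kind = idx < numData ? "data d" + idx : "parity p" + (idx - numData);
      System.out.println("erased unit " + idx + " -> " + kind);
    }
  }
}
{code}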

In addition, more RS configurations (not just 6+3) could be benchmarked as well, so that they can be compared.
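Something as simple as a hypothetical list of schemes to iterate over would do, for example:

{code:java}
public class RsSchemeListSketch {
  public static void main(String[] args) {
    // Hypothetical list of {dataUnits, parityUnits} schemes to benchmark side by side.
    int[][] schemes = { {3, 2}, {6, 3}, {10, 4} };
    for (int[] scheme : schemes) {
      System.out.printf("benchmark RS %d+%d%n", scheme[0], scheme[1]);
    }
  }
}
{code}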


