You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@orc.apache.org by "Yiqun Zhang (Jira)" <ji...@apache.org> on 2021/08/10 09:24:00 UTC

[jira] [Created] (ORC-931) Optimize RunLengthIntegerWriterV2 code for better readability

Yiqun Zhang created ORC-931:
-------------------------------

             Summary: Optimize RunLengthIntegerWriterV2 code for better readability
                 Key: ORC-931
                 URL: https://issues.apache.org/jira/browse/ORC-931
             Project: ORC
          Issue Type: Improvement
            Reporter: Yiqun Zhang


RunLengthIntegerWriterV2.java
512-546 line
{code:java}
  if (diffBitsLH > 1) {
      for (int i = 0; i < numLiterals; i++) {
        baseRedLiterals[i] = literals[i] - min;
      }
      brBits95p = utils.percentileBits(baseRedLiterals, 0, numLiterals, 0.95);
      brBits100p = utils.percentileBits(baseRedLiterals, 0, numLiterals, 1.0);
      if ((brBits100p - brBits95p) != 0 && Math.abs(min) < BASE_VALUE_LIMIT) {
        encoding = EncodingType.PATCHED_BASE;
        preparePatchedBlob();
        return;
      } else {
        encoding = EncodingType.DIRECT;
        return;
      }
    } else {
      // if difference in bits between 95th percentile and 100th percentile is
      // 0, then patch length will become 0. Hence we will fallback to direct
      encoding = EncodingType.DIRECT;
      return;
    }
{code}
All three conditional branch logics have been completed and the return statement is redundant.

691-704 line
{code:java}
      if (fixedRunLength < MIN_REPEAT) {
          variableRunLength = fixedRunLength;
          fixedRunLength = 0;
          determineEncoding();
          writeValues();
        } else if (fixedRunLength >= MIN_REPEAT
            && fixedRunLength <= MAX_SHORT_REPEAT_LENGTH) {
          encoding = EncodingType.SHORT_REPEAT;
          writeValues();
        } else {
          encoding = EncodingType.DELTA;
          isFixedDelta = true;
          writeValues();
        }
{code}
fixedRunLength >= MIN_REPEAT is redundant, the previous condition already ensures this.  Extract the writeValues() method to the end. It seems better for conditional judgements to deal only with encoding and state.

772-781 line
{code:java}
          if (fixedRunLength >= MIN_REPEAT) {
            if (fixedRunLength <= MAX_SHORT_REPEAT_LENGTH) {
              encoding = EncodingType.SHORT_REPEAT;
              writeValues();
            } else {
              encoding = EncodingType.DELTA;
              isFixedDelta = true;
              writeValues();
            }
          }
{code}
Ditto



--
This message was sent by Atlassian Jira
(v8.3.4#803005)