Posted to dev@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2016/02/26 01:36:18 UTC

[jira] [Created] (HIVE-13161) ORC: Always do sloppy overlaps for DiskRanges

Gopal V created HIVE-13161:
------------------------------

             Summary: ORC: Always do sloppy overlaps for DiskRanges
                 Key: HIVE-13161
                 URL: https://issues.apache.org/jira/browse/HIVE-13161
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.3.0, 2.1.0
            Reporter: Gopal V
            Assignee: Prasanth Jayachandran


The selected columns are sometimes only a few bytes apart on disk (particularly for nulls, which compress tightly), yet the reads for them aren't merged.

The WORST_UNCOMPRESSED_SLOP is applied only in the PPD case, and it is applied more for safety than to reduce the total number of round-trip calls to the filesystem.

{code}
  /**
   * Update the disk ranges to collapse adjacent or overlapping ranges. It
   * assumes that the ranges are sorted.
   * @param ranges the list of disk ranges to merge
   */
  static void mergeDiskRanges(List<DiskRange> ranges) {
    DiskRange prev = null;
    for(int i=0; i < ranges.size(); ++i) {
      DiskRange current = ranges.get(i);
      if (prev != null && overlap(prev.offset, prev.end,
          current.offset, current.end)) {
        prev.offset = Math.min(prev.offset, current.offset);
        prev.end = Math.max(prev.end, current.end);
        ranges.remove(i);
        i -= 1;
      } else {
        prev = current;
      }
    }
  }
...
  private static boolean overlap(long leftA, long rightA, long leftB, long rightB) {
    if (leftA <= leftB) {
      return rightA >= leftB;
    }
    return rightB >= leftA;
  }

{code}
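
As a rough illustration of what "sloppy" merging could look like, here is a minimal sketch that treats two sorted ranges as mergeable when the gap between them is at most a caller-supplied number of bytes. The class SloppyMergeSketch, its stand-in DiskRange type, and the slop parameter are all hypothetical and are not taken from the Hive code base; the real change may well be structured differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Hive implementation.
class SloppyMergeSketch {

  // Minimal stand-in for ORC's DiskRange; only the two fields used above.
  static final class DiskRange {
    long offset; // start of the range
    long end;    // end of the range

    DiskRange(long offset, long end) {
      this.offset = offset;
      this.end = end;
    }
  }

  /**
   * Collapse ranges that overlap, touch, or sit within slop bytes of each
   * other. Assumes the ranges are sorted by offset, as the original does.
   * @param ranges the list of disk ranges to merge in place
   * @param slop   assumed tuning knob: largest gap (in bytes) to read through
   */
  static void mergeDiskRanges(List<DiskRange> ranges, long slop) {
    DiskRange prev = null;
    for (int i = 0; i < ranges.size(); ++i) {
      DiskRange current = ranges.get(i);
      // With slop == 0 this matches the strict overlap test for sorted input.
      if (prev != null && current.offset <= prev.end + slop) {
        prev.offset = Math.min(prev.offset, current.offset);
        prev.end = Math.max(prev.end, current.end);
        ranges.remove(i);
        i -= 1;
      } else {
        prev = current;
      }
    }
  }

  public static void main(String[] args) {
    List<DiskRange> ranges = new ArrayList<>();
    ranges.add(new DiskRange(0, 100));
    ranges.add(new DiskRange(104, 200));   // 4 bytes after the first range
    ranges.add(new DiskRange(5000, 5100)); // far away, stays separate
    mergeDiskRanges(ranges, 16);
    for (DiskRange r : ranges) {
      System.out.println(r.offset + ".." + r.end);
    }
  }
}
```

The trade-off in this sketch is that a larger slop reads and discards a few extra bytes per merged gap in exchange for fewer separate read calls; what value is appropriate is left open here.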



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)