Posted to dev@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2016/02/26 01:36:18 UTC
[jira] [Created] (HIVE-13161) ORC: Always do sloppy overlaps for DiskRanges
Gopal V created HIVE-13161:
------------------------------
Summary: ORC: Always do sloppy overlaps for DiskRanges
Key: HIVE-13161
URL: https://issues.apache.org/jira/browse/HIVE-13161
Project: Hive
Issue Type: Bug
Affects Versions: 1.3.0, 2.1.0
Reporter: Gopal V
Assignee: Prasanth Jayachandran
The selected columns are sometimes only a few bytes apart (particularly for nulls, which compress tightly), and the reads aren't merged.
The WORST_UNCOMPRESSED_SLOP is only applied in the PPD case, and is applied more for safety than to reduce the total number of round-trip calls to the filesystem.
{code}
  /**
   * Update the disk ranges to collapse adjacent or overlapping ranges. It
   * assumes that the ranges are sorted.
   * @param ranges the list of disk ranges to merge
   */
  static void mergeDiskRanges(List<DiskRange> ranges) {
    DiskRange prev = null;
    for(int i=0; i < ranges.size(); ++i) {
      DiskRange current = ranges.get(i);
      if (prev != null && overlap(prev.offset, prev.end,
          current.offset, current.end)) {
        prev.offset = Math.min(prev.offset, current.offset);
        prev.end = Math.max(prev.end, current.end);
        ranges.remove(i);
        i -= 1;
      } else {
        prev = current;
      }
    }
  }
...
  private static boolean overlap(long leftA, long rightA, long leftB, long rightB) {
    if (leftA <= leftB) {
      return rightA >= leftB;
    }
    return rightB >= leftA;
  }
{code}
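For illustration only, a "sloppy" overlap could look roughly like the sketch below: two sorted ranges are merged when the gap between them is at most some slop, so reads a few bytes apart collapse into one. This is not the actual patch. The class `SloppyMergeSketch`, the `Range` type (a stand-in for DiskRange), the `slop` parameter, and the value 16 used in `main` are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class SloppyMergeSketch {
    /** Hypothetical stand-in for DiskRange: a byte range from offset to end. */
    static final class Range {
        long offset;
        long end;
        Range(long offset, long end) { this.offset = offset; this.end = end; }
    }

    /** True when the ranges overlap, touch, or are separated by at most slop bytes. */
    static boolean overlapWithSlop(long leftA, long rightA,
                                   long leftB, long rightB, long slop) {
        if (leftA <= leftB) {
            return rightA + slop >= leftB;
        }
        return rightB + slop >= leftA;
    }

    /** Merges sorted ranges whose gap is at most slop bytes; returns a new list. */
    static List<Range> mergeWithSlop(List<Range> sorted, long slop) {
        List<Range> merged = new ArrayList<>();
        Range prev = null;
        for (Range current : sorted) {
            if (prev != null && overlapWithSlop(prev.offset, prev.end,
                    current.offset, current.end, slop)) {
                // Extend the previous range to cover the current one and the gap.
                prev.offset = Math.min(prev.offset, current.offset);
                prev.end = Math.max(prev.end, current.end);
            } else {
                // Copy so the caller's ranges are left unmodified.
                prev = new Range(current.offset, current.end);
                merged.add(prev);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Range> ranges = new ArrayList<>();
        ranges.add(new Range(0, 100));
        ranges.add(new Range(104, 200));   // only 4 bytes after the previous range
        ranges.add(new Range(5000, 6000)); // far away, should stay separate
        System.out.println(mergeWithSlop(ranges, 0).size());  // prints 3
        System.out.println(mergeWithSlop(ranges, 16).size()); // prints 2
    }
}
```

With a slop of 0 this behaves like the exact overlap() check quoted above. With a non-zero slop the first two ranges become a single read, at the cost of fetching the few gap bytes between them.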