Posted to issues@orc.apache.org by "Owen O'Malley (Jira)" <ji...@apache.org> on 2020/04/24 00:00:00 UTC

[jira] [Resolved] (ORC-616) In Patched Base encoding, the value of headerThirdByte goes beyond the range of byte

     [ https://issues.apache.org/jira/browse/ORC-616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved ORC-616.
-------------------------------
    Fix Version/s: 1.6.3
                   1.5.10
                   1.4.6
       Resolution: Fixed

I just committed this. Thanks, Ruochen!

> In Patched Base encoding, the value of headerThirdByte goes beyond the range of byte
> ------------------------------------------------------------------------------------
>
>                 Key: ORC-616
>                 URL: https://issues.apache.org/jira/browse/ORC-616
>             Project: ORC
>          Issue Type: Bug
>          Components: Java
>    Affects Versions: master
>            Reporter: Ruochen Zou
>            Assignee: Ruochen Zou
>            Priority: Critical
>              Labels: RLE
>             Fix For: 1.4.6, 1.5.10, 1.6.3
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In Patched Base encoding, the first three bits of headerThirdByte represent the base value width. If Math.abs(min) is greater than or equal to 1L << 56, the value of baseBytes is 9, so bb = (9 - 1) << 5 = 256, which is beyond the range of a byte.
> {code:java}
> final boolean isNegative = min < 0 ? true : false;
> if (isNegative) {
>   min = -min;
> }
> // find the number of bytes required for base and shift it by 5 bits
> // to accommodate patch width. The additional bit is used to store the sign
> // of the base value.
> final int baseWidth = utils.findClosestNumBits(min) + 1;
> final int baseBytes = baseWidth % 8 == 0 ? baseWidth / 8 : (baseWidth / 8) + 1;
> final int bb = (baseBytes - 1) << 5;
> // if the base value is negative then set MSB to 1
> if (isNegative) {
>   min |= (1L << ((baseBytes * 8) - 1));
> }
> // third byte contains 3 bits for number of bytes occupied by base
> // and 5 bits for patchWidth
> final int headerThirdByte = bb | utils.encodeBitWidth(patchWidth);
> {code}
> Only the eight low-order bits of headerThirdByte are written, so the base width read by RunLengthIntegerReaderV2 is incorrect and the data decoded for the column is wrong.
> {code:java}
> // extract the number of bytes occupied by base
> int thirdByte = input.read();
> int bw = (thirdByte >>> 5) & 0x07;
> // base width is one off
> bw += 1;
> {code}
> In some cases, RunLengthIntegerReaderV2 fails with an EOFException.
> {code:java}
> Caused by: java.io.EOFException: Read past end of RLE integer from compressed stream Stream for column 2 kind DATA position: 3213835 length: 3213835 range: 0 offset: 3217373 limit: 3217373 range 0 = 0 to 3213835 uncompressed: 184478 to 184478
>         at org.apache.orc.impl.RunLengthIntegerReaderV2.readValues(RunLengthIntegerReaderV2.java:61)
>         at org.apache.orc.impl.RunLengthIntegerReaderV2.next(RunLengthIntegerReaderV2.java:323)
>         at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
>         at org.apache.orc.impl.TreeReaderFactory$LongTreeReader.nextVector(TreeReaderFactory.java:587)
>         at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1815)
>         at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1184)
>         ... 20 more
> {code}
> For example, consider the following sequence:
> {code:java}
> long data[] = {-9007199254740992l,-8725724278030337l,-1125762467889153l,-1l,-9007199254740992l,-9007199254740992l,-497l,127l,-1l,-72057594037927936l,-4194304l,-9007199254740992l,-4503599593816065l,-4194304l,-8936830510563329l,-9007199254740992l, -1l, -70334384439312l,-4063233l, -6755399441973249l};
> {code}
> The min value is -72057594037927936 (-1L << 56). RLEv2 writes this sequence with Patched Base encoding, and the data read back by RunLengthIntegerReaderV2 is:
> {code:java}
> [281474976710656, 36275087623585792, 247390116249599, 72053196528287743, 72057594037927935, 72022409665839104, 246290604621824, -71776119061217282, 4222124650659840, 36028797018963967, 71776119061217280, 281474976694272, 246290604621824, 263882790797311, 72057594037911552, 246565482528767, 72022409665839104, 281474976710655, 72057319294238719, 67835469387252223]
> {code}
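
The header arithmetic in the quoted description can be checked in isolation. The following is a minimal sketch, not the ORC source: the class and method names are hypothetical, the encoded patch width is assumed to be 0 for simplicity, and the masking with 0xff stands in for the low-order-byte truncation described above.

```java
// Illustrative sketch only; class and method names are hypothetical, not ORC APIs.
class PatchedBaseHeaderDemo {

  // Mirrors the writer arithmetic quoted above:
  // (baseBytes - 1) is shifted into the top 3 bits, the patch width fills the low 5 bits.
  static int headerThirdByte(int baseBytes, int encodedPatchWidth) {
    return ((baseBytes - 1) << 5) | encodedPatchWidth;
  }

  // Mirrors the reader arithmetic quoted above, applied to the single byte
  // that actually reaches the stream (only the eight low-order bits survive).
  static int decodedBaseBytes(int headerThirdByte) {
    int written = headerThirdByte & 0xff;
    return ((written >>> 5) & 0x07) + 1;
  }

  public static void main(String[] args) {
    // Assumption: encoded patch width of 0, chosen only to keep the numbers simple.
    for (int baseBytes = 8; baseBytes <= 9; baseBytes++) {
      int header = headerThirdByte(baseBytes, 0);
      System.out.println("baseBytes=" + baseBytes
          + " header=" + header
          + " readerSees=" + decodedBaseBytes(header));
    }
  }
}
```

With baseBytes = 8 the header value is 224 and the reader recovers a base width of 8 bytes. With baseBytes = 9 the header value is 256, the low-order byte is 0, and the reader recovers a base width of 1 byte, so every subsequent field in the run is read from the wrong offset.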


