Posted to issues@orc.apache.org by "lichaoyong (Jira)" <ji...@apache.org> on 2020/12/22 04:12:00 UTC

[jira] [Updated] (ORC-703) RLE encoding bug on large negative integer

     [ https://issues.apache.org/jira/browse/ORC-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lichaoyong updated ORC-703:
---------------------------
    Description: 
ORC uses run-length encoding (RLE) to encode and decode integers.
The RLE algorithm comprises four sub-encodings:
Short Repeat : used for short repeating integer sequences.
Direct : used for integer sequences whose values have a relatively constant bit width.
Patched Base : used for integer sequences whose bit widths vary a lot.
Delta : used for monotonically increasing or decreasing sequences.

This bug occurs in the Patched Base encoding for large negative numbers.
In Patched Base, the base value is stored in 1 to 8 bytes, and that width is encoded as 0 to 7.
If the base value occupies 8 bytes, the encoded base width should be 7.
Currently it is encoded as 8, which is outside the valid range.
Because of this incorrect encoding, the data read back is inconsistent with the data that was written.
In extreme cases, the process dumps core because of an illegal memory access.
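
The width mapping described above can be sketched as follows. This is only an illustrative sketch, not the ORC writer code: the function names encode_base_width and decode_base_width are hypothetical, and the only rule taken from the description is that a base width of 1 to 8 bytes is encoded as 0 to 7.

```python
def encode_base_width(num_bytes: int) -> int:
    """Map a base-value width of 1..8 bytes to its encoded form 0..7.

    Hypothetical helper for illustration; not part of the ORC API.
    """
    if not 1 <= num_bytes <= 8:
        # A width outside 1..8 bytes has no valid encoding.
        raise ValueError(f"base width must be 1..8 bytes, got {num_bytes}")
    return num_bytes - 1


def decode_base_width(encoded: int) -> int:
    """Inverse mapping: encoded value 0..7 back to a width of 1..8 bytes."""
    if not 0 <= encoded <= 7:
        # An encoded value of 8, as produced by the bug, is rejected here.
        raise ValueError(f"encoded base width must be 0..7, got {encoded}")
    return encoded + 1
```

Under these assumptions, encode_base_width(8) yields 7, and a range check like the one in decode_base_width would reject the out-of-range value 8 rather than letting a reader interpret it.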

> RLE encoding bug on large negative integer
> ------------------------------------------
>
>                 Key: ORC-703
>                 URL: https://issues.apache.org/jira/browse/ORC-703
>             Project: ORC
>          Issue Type: Bug
>            Reporter: lichaoyong
>            Priority: Major
>
> ORC uses run-length encoding (RLE) to encode and decode integers.
> The RLE algorithm comprises four sub-encodings:
> Short Repeat : used for short repeating integer sequences.
> Direct : used for integer sequences whose values have a relatively constant bit width.
> Patched Base : used for integer sequences whose bit widths vary a lot.
> Delta : used for monotonically increasing or decreasing sequences.
> This bug occurs in the Patched Base encoding for large negative numbers.
> In Patched Base, the base value is stored in 1 to 8 bytes, and that width is encoded as 0 to 7.
> If the base value occupies 8 bytes, the encoded base width should be 7.
> Currently it is encoded as 8, which is outside the valid range.
> Because of this incorrect encoding, the data read back is inconsistent with the data that was written.
> In extreme cases, the process dumps core because of an illegal memory access.


