Posted to issues@orc.apache.org by "lichaoyong (Jira)" <ji...@apache.org> on 2020/12/22 04:12:00 UTC

[jira] [Commented] (ORC-703) RLE encoding bug on large negative integer

    [ https://issues.apache.org/jira/browse/ORC-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253264#comment-17253264 ] 

lichaoyong commented on ORC-703:
--------------------------------

The following data can be used to reproduce the problem.
```
-17887939293638656
-15605417571528704
-15605417571528704
-13322895849418752
-13322895849418752
-84742859065569280
-15605417571528704
-13322895849418752
-13322895849418752
-15605417571528704
-13322895849418752
-13322895849418752
-15605417571528704
-15605417571528704
-13322895849418752
-13322895849418752
-15605417571528704
-15605417571528704
-13322895849418752
-13322895849418752
-11040374127308800
-15605417571528704
-13322895849418752
-13322895849418752
-15605417571528704
-15605417571528704
-13322895849418752
-13322895849418752
-15605417571528704
-13322895849418752
```

> RLE encoding bug on large negative integer
> ------------------------------------------
>
>                 Key: ORC-703
>                 URL: https://issues.apache.org/jira/browse/ORC-703
>             Project: ORC
>          Issue Type: Bug
>            Reporter: lichaoyong
>            Priority: Major
>
> ORC uses run-length encoding (RLE) to encode and decode integers.
> The RLE algorithm comprises four encoding types:
> Short Repeat : used for short repeating integer sequences.
> Direct : used for integer sequences whose values have a relatively constant bit width.
> Patched Base : used for integer sequences whose bit widths vary a lot.
> Delta : used for monotonically increasing or decreasing sequences.
> This bug occurs in the Patched Base type for large negative numbers.
> In Patched Base, the base value is stored in 1 to 8 bytes, and that byte count is encoded as 0 to 7.
> If the base value takes 8 bytes, the encoded base width should be 7,
> but it is currently encoded as 8, which is the problem.
> Because of the incorrect encoding, the data read back is inconsistent with the data that was written.
> In extreme cases, the process crashes with a core dump because of an illegal memory access.
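
The base-width mapping described above can be sketched as follows. This is a minimal illustration, not the ORC implementation: the function names are hypothetical, the field is assumed to be 3 bits wide (which is what a 0 to 7 range implies), and an out-of-range value is assumed to be truncated to its low bits. It also assumes the smallest value in the reproduction data is the one used as the base.

```python
BASE_WIDTH_FIELD_BITS = 3  # assumption: the field holds 0..7, i.e. 1..8 bytes


def encode_base_width(num_bytes: int) -> int:
    """Map a base-value size of 1..8 bytes to the header value 0..7."""
    if not 1 <= num_bytes <= 8:
        raise ValueError(f"base value must occupy 1 to 8 bytes, got {num_bytes}")
    return num_bytes - 1


def pack_unchecked(num_bytes: int) -> int:
    """Hypothetical unchecked writer: only the low 3 bits of the value survive."""
    return (num_bytes - 1) & ((1 << BASE_WIDTH_FIELD_BITS) - 1)


def decode_base_width(field: int) -> int:
    """Recover the byte count a reader would use from the header value."""
    return field + 1


# The smallest value in the reproduction data needs 57 magnitude bits,
# so magnitude plus one sign bit (58 bits) still fits in 8 bytes.
assert abs(-84742859065569280).bit_length() == 57

# A correct writer stores 7 for an 8-byte base, and the reader gets 8 back.
assert decode_base_width(encode_base_width(8)) == 8

# If the writer instead arrives at the value 8 (as if the base took 9 bytes),
# it does not fit in the field: the low 3 bits are 0, so the reader expects 1 byte.
assert decode_base_width(pack_unchecked(9)) == 1
```

Under these assumptions, a reader that trusts the truncated field would read the base value from the wrong number of bytes, which is consistent with the data mismatch described in the issue.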



--
This message was sent by Atlassian Jira
(v8.3.4#803005)