Posted to issues@orc.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2021/10/08 02:13:00 UTC

[jira] [Commented] (ORC-1020) Improve orc::RleDecoderV2::nextDirect

    [ https://issues.apache.org/jira/browse/ORC-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425914#comment-17425914 ] 

Dongjoon Hyun commented on ORC-1020:
------------------------------------

Thank you for filing a JIRA and sharing the result, [~stigahuang]. Go for it~

> Improve orc::RleDecoderV2::nextDirect
> -------------------------------------
>
>                 Key: ORC-1020
>                 URL: https://issues.apache.org/jira/browse/ORC-1020
>             Project: ORC
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>         Attachments: orc-scan-release-lineitem-random-bigint-snappy.svg
>
>
> [~drorke] found that orc::RleDecoderV2::nextDirect takes the majority of the time when scanning bigint columns. I reproduced the issue by using the orc-scan tool to read the random bigint columns of a TPCH lineitem table. In the attached flame graph, 91.89% of the time is spent in orc::RleDecoderV2::nextDirect, and only a small portion is spent in snappy decompression.
> Note that orc::RleDecoderV2::nextDirect is also used for other column types, e.g. dictionary-encoded string columns, so improving it can boost performance in many scenarios.
> We should consider unrolling the loop in orc::RleDecoderV2::readLongs. There is already a TODO: [https://github.com/apache/orc/blob/93af6b076c210b0c3b77e5af3d6fbef1bd1150a1/c%2B%2B/src/RLEv2.hh#L186]
> [~csringhofer] also points out that we can borrow some ideas from Impala's bit unpacking: [https://github.com/apache/impala/blob/b28da054f3595bb92873433211438306fc22fbc7/be/src/util/bit-packing.inline.h#L60]
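
To make the unrolling idea in the description concrete, here is a minimal sketch of dispatching on the bit width so that common byte-aligned widths get a loop with a constant trip count. It is not the ORC or Impala code. The names unpackGeneric, unpackByteAligned and unpackDirect are hypothetical, the input is assumed to be big-endian (MSB-first) bit-packed unsigned values, and sign/zigzag decoding and buffer refills are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Generic path (hypothetical helper): assembles each value from MSB-first
// bit groups, crossing byte boundaries as needed. Works for widths 1..64.
inline void unpackGeneric(const uint8_t* in, size_t count, unsigned bitWidth,
                          int64_t* out) {
  uint64_t bitPos = 0;  // absolute bit position in the input
  for (size_t i = 0; i < count; ++i) {
    uint64_t value = 0;
    unsigned bitsLeft = bitWidth;
    while (bitsLeft > 0) {
      unsigned avail = 8 - static_cast<unsigned>(bitPos & 7);  // unread bits in this byte
      unsigned take = bitsLeft < avail ? bitsLeft : avail;
      uint8_t byte = in[bitPos >> 3];
      uint64_t chunk = (byte >> (avail - take)) & ((1u << take) - 1u);
      value = (value << take) | chunk;
      bitPos += take;
      bitsLeft -= take;
    }
    out[i] = static_cast<int64_t>(value);
  }
}

// Fast path (hypothetical helper) for widths that are whole bytes. The inner
// loop has a compile-time trip count, so the compiler is free to unroll it.
template <unsigned kBytes>
inline void unpackByteAligned(const uint8_t* in, size_t count, int64_t* out) {
  for (size_t i = 0; i < count; ++i) {
    uint64_t value = 0;
    for (unsigned b = 0; b < kBytes; ++b) {
      value = (value << 8) | in[i * kBytes + b];
    }
    out[i] = static_cast<int64_t>(value);
  }
}

// Dispatcher (hypothetical): picks a specialized loop when one exists and
// falls back to the generic bit-by-bit path otherwise.
inline void unpackDirect(const uint8_t* in, size_t count, unsigned bitWidth,
                         int64_t* out) {
  assert(bitWidth >= 1 && bitWidth <= 64);
  switch (bitWidth) {
    case 8:  unpackByteAligned<1>(in, count, out); break;
    case 16: unpackByteAligned<2>(in, count, out); break;
    case 24: unpackByteAligned<3>(in, count, out); break;
    case 32: unpackByteAligned<4>(in, count, out); break;
    case 64: unpackByteAligned<8>(in, count, out); break;
    default: unpackGeneric(in, count, bitWidth, out); break;
  }
}
```

Whether this helps in practice would have to be measured, for example with the same orc-scan run; specialized routines for the non-byte-aligned widths would be a further step along the same lines.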



--
This message was sent by Atlassian Jira
(v8.3.4#803005)