Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/03/05 13:49:00 UTC

[jira] [Commented] (IMPALA-6506) Codegen in ORC scanner

    [ https://issues.apache.org/jira/browse/IMPALA-6506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052154#comment-17052154 ] 

ASF subversion and git services commented on IMPALA-6506:
---------------------------------------------------------

Commit 0b081aef3fc681b40c9cc45e0387bf7dd84358a9 in impala's branch refs/heads/master from Gabor Kaszab
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0b081ae ]

IMPALA-6506: Codegen in ORC scanner for primitives and struct

IMPALA-9228 introduced scratch batch handling for struct and
primitive types in the ORC scanner, and the existing scratch batch
logic already supports Codegen for the ProcessScratchBatch() function.
This change turns on that Codegen logic for primitive types and
structs in the ORC scanner.

Note: if the query involves collection types, ProcessScratchBatch()
is still codegen'd, but the codegen'd function isn't used because
the regular row-by-row approach is followed in that case, without
a scratch batch.
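
The dispatch described above can be pictured with a minimal C++ sketch.
Everything in it is a hypothetical stand-in rather than Impala code: Row,
ScratchBatch, Conjunct, SketchScanner and both batch functions are invented
for illustration, and the "specialized" function only imitates what a
codegen'd ProcessScratchBatch() might look like by baking in one assumed
predicate (value > 10).

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-ins; none of these are Impala's real types.
struct Row { int64_t id; int64_t value; };
using Conjunct = std::function<bool(const Row&)>;
struct ScratchBatch { std::vector<Row> rows; };

// Signature shared by the interpreted loop and its compiled replacement.
using ProcessBatchFn = int (*)(const ScratchBatch&,
                               const std::vector<Conjunct>&,
                               std::vector<Row>*);

// Generic conjunct evaluation: one indirect call per conjunct per row.
bool EvalConjuncts(const Row& row, const std::vector<Conjunct>& conjuncts) {
  for (const Conjunct& c : conjuncts) {
    if (!c(row)) return false;
  }
  return true;
}

// Interpreted batch loop: works for any list of conjuncts.
int ProcessScratchBatchInterpreted(const ScratchBatch& batch,
                                   const std::vector<Conjunct>& conjuncts,
                                   std::vector<Row>* out) {
  int committed = 0;
  for (const Row& row : batch.rows) {
    if (EvalConjuncts(row, conjuncts)) {
      out->push_back(row);
      ++committed;
    }
  }
  return committed;
}

// Stand-in for a codegen'd loop: the assumed predicate (value > 10) is
// inlined, so the generic conjunct list is ignored.
int ProcessScratchBatchSpecialized(const ScratchBatch& batch,
                                   const std::vector<Conjunct>& /*unused*/,
                                   std::vector<Row>* out) {
  int committed = 0;
  for (const Row& row : batch.rows) {
    if (row.value > 10) {
      out->push_back(row);
      ++committed;
    }
  }
  return committed;
}

struct SketchScanner {
  ProcessBatchFn codegend_fn = nullptr;  // set when code generation succeeded
  bool has_collections = false;          // query touches collection types

  int Scan(const ScratchBatch& batch, const std::vector<Conjunct>& conjuncts,
           std::vector<Row>* out) const {
    if (has_collections) {
      // Row-by-row path: the compiled batch function exists but is bypassed.
      int committed = 0;
      for (const Row& row : batch.rows) {
        if (EvalConjuncts(row, conjuncts)) {
          out->push_back(row);
          ++committed;
        }
      }
      return committed;
    }
    // Batch path: prefer the compiled function, else fall back.
    ProcessBatchFn fn =
        codegend_fn != nullptr ? codegend_fn : &ProcessScratchBatchInterpreted;
    return fn(batch, conjuncts, out);
  }
};
```

In this sketch all three paths return the same rows for the same predicate;
the only difference is which loop runs, which matches the observation that
correctness is unaffected when the codegen'd function goes unused.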

Testing:
  - Re-ran the whole test suite to check for regressions.
  - Checked the performance on a scale 25 TPCH workload in ORC format
    using single_node_perf_run.py. Comparing the query runtimes,
    codegen appears to bring a 1-21% improvement for most of the
    queries. There is a slight slowdown in 3 queries that are not
    scan-heavy, where codegen doesn't help with scanning. However,
    these are short queries and the degradation is under a second,
    so I'd say the decrease is negligible.
  - Did a manual check on a table that contains both Parquet and ORC
    partitions. Verified that in this case ProcessScratchBatch() is
    codegen'd for both formats and the query results are as expected.

Change-Id: I2352d0c8fc75ff722e931bc8c866b3e43d3636f4
Reviewed-on: http://gerrit.cloudera.org:8080/15350
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Codegen in ORC scanner
> ----------------------
>
>                 Key: IMPALA-6506
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6506
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>            Reporter: Quanlong Huang
>            Assignee: Gabor Kaszab
>            Priority: Major
>
> Currently, the orc-scanner materializes tuples from the orc-reader (ORC lib). We need a Codegen implementation in TransferScratchTuples for the runtime filter + conjunct evaluation loop.


