Posted to reviews@impala.apache.org by "Gabor Kaszab (Code Review)" <ge...@cloudera.org> on 2020/02/27 19:12:00 UTC

[Impala-ASF-CR] IMPALA-9228: ORC scanner reads rows into scratch batch

Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/15104

to look at the new patch set (#4).

Change subject: IMPALA-9228: ORC scanner reads rows into scratch batch
......................................................................

IMPALA-9228: ORC scanner reads rows into scratch batch

Because of performance considerations, this change enhances the ORC
scanner to populate a scratch batch in a column-by-column manner
using data from the column readers. Once the scratch batch is filled,
code from the Parquet scanner is reused to apply runtime filters and
conjuncts and to populate the outgoing row batch.
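The two-step flow described above can be sketched in C++ under a much-simplified model. Each column reader fills one column of the scratch batch per call. Afterwards, every scratch row is checked against a list of predicates, and only surviving rows are copied to the outgoing batch. All names here (`ScratchBatch`, `ColumnReader`, `FillColumn`, `PopulateScratchBatch`, `TransferSurvivors`, `Conjunct`) are hypothetical and are not the classes in `scratch-tuple-batch.h` or `orc-column-readers.h`. Runtime filters and conjuncts are both modeled as plain predicates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified scratch batch: one value vector per column,
// so row r consists of columns[0][r], columns[1][r], ...
struct ScratchBatch {
  std::vector<std::vector<int64_t>> columns;
  int num_rows = 0;
};

// Hypothetical column reader interface (not Impala's real API).
class ColumnReader {
 public:
  virtual ~ColumnReader() = default;
  // Writes 'n' values starting at row 'start' into 'out'.
  virtual void FillColumn(int64_t start, int n, std::vector<int64_t>* out) = 0;
};

// Toy reader serving values from an in-memory vector.
class VectorColumnReader : public ColumnReader {
 public:
  explicit VectorColumnReader(std::vector<int64_t> data) : data_(std::move(data)) {}
  void FillColumn(int64_t start, int n, std::vector<int64_t>* out) override {
    assert(start >= 0 && n >= 0 &&
           start + n <= static_cast<int64_t>(data_.size()));
    out->assign(data_.begin() + start, data_.begin() + start + n);
  }

 private:
  std::vector<int64_t> data_;
};

// Step 1: fill the scratch batch one column at a time
// (one virtual call per column per batch).
void PopulateScratchBatch(
    const std::vector<std::unique_ptr<ColumnReader>>& readers,
    int64_t start, int n, ScratchBatch* batch) {
  batch->columns.resize(readers.size());
  for (std::size_t c = 0; c < readers.size(); ++c) {
    readers[c]->FillColumn(start, n, &batch->columns[c]);
  }
  batch->num_rows = n;
}

// A filter (runtime filter or conjunct) evaluated on row 'r' of the batch.
using Conjunct = std::function<bool(const ScratchBatch&, int)>;
using Row = std::vector<int64_t>;

// Step 2: evaluate all filters on each scratch row and copy only the
// surviving rows into the outgoing batch. Returns the number copied.
int TransferSurvivors(const ScratchBatch& batch,
                      const std::vector<Conjunct>& conjuncts,
                      std::vector<Row>* output) {
  int added = 0;
  for (int r = 0; r < batch.num_rows; ++r) {
    bool keep = true;
    for (const Conjunct& pred : conjuncts) {
      if (!pred(batch, r)) {
        keep = false;
        break;
      }
    }
    if (!keep) continue;
    Row row;
    row.reserve(batch.columns.size());
    for (const auto& column : batch.columns) row.push_back(column[r]);
    output->push_back(std::move(row));
    ++added;
  }
  return added;
}
```

Separating the steps means the reading loop only touches column data, while the filtering loop works on fully materialized rows. That separation is what lets the same filtering code serve more than one file format.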

This approach reduces the number of virtual function calls and takes
advantage of the columnar orientation of the data to improve scan
performance. Additionally, introducing the scratch batch concept
opens the door to codegen for runtime filtering and conjunct evaluation.
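The claim about fewer virtual function calls can be made concrete with a toy model that counts dispatches under both strategies. The classes and functions (`CountingReader`, `IotaReader`, `RowWiseCalls`, `ColumnWiseCalls`) are hypothetical and do not reflect the actual ORC column reader interface. The model ignores everything except the number of virtual calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reader exposing both access patterns and counting how
// many times its virtual methods are invoked. Not Impala's real API.
class CountingReader {
 public:
  virtual ~CountingReader() = default;
  // Row-at-a-time access: one virtual call per value.
  virtual int64_t ReadOne(int64_t row) = 0;
  // Batch access: one virtual call fills 'n' consecutive values.
  virtual void ReadMany(int64_t start, int n, int64_t* out) = 0;
  int64_t calls() const { return calls_; }

 protected:
  int64_t calls_ = 0;
};

// Toy column whose value at row r is r.
class IotaReader : public CountingReader {
 public:
  int64_t ReadOne(int64_t row) override {
    ++calls_;
    return row;
  }
  void ReadMany(int64_t start, int n, int64_t* out) override {
    assert(n >= 0);
    ++calls_;
    for (int i = 0; i < n; ++i) out[i] = start + i;
  }
};

int64_t TotalCalls(const std::vector<IotaReader>& readers) {
  int64_t total = 0;
  for (const IotaReader& reader : readers) total += reader.calls();
  return total;
}

// Row-wise materialization: every row visits every column reader.
int64_t RowWiseCalls(int num_cols, int64_t num_rows) {
  std::vector<IotaReader> readers(num_cols);
  std::vector<int64_t> tuple(num_cols);
  for (int64_t r = 0; r < num_rows; ++r) {
    for (int c = 0; c < num_cols; ++c) {
      CountingReader& reader = readers[c];
      tuple[c] = reader.ReadOne(r);
    }
  }
  return TotalCalls(readers);
}

// Column-wise materialization: each column reader fills up to
// 'batch_size' values per call, so calls scale with the batch count.
int64_t ColumnWiseCalls(int num_cols, int64_t num_rows, int batch_size) {
  std::vector<IotaReader> readers(num_cols);
  std::vector<std::vector<int64_t>> columns(
      num_cols, std::vector<int64_t>(batch_size));
  for (int64_t start = 0; start < num_rows; start += batch_size) {
    int n = static_cast<int>(std::min<int64_t>(batch_size, num_rows - start));
    for (int c = 0; c < num_cols; ++c) {
      CountingReader& reader = readers[c];
      reader.ReadMany(start, n, columns[c].data());
    }
  }
  return TotalCalls(readers);
}
```

With 3 columns and 1,024 rows, the row-wise loop makes 3,072 virtual calls, while a single 1,024-row batch needs only 3. Real savings depend on much more than call counts, such as inlining, cache behavior, and per-value decoding work, so this only illustrates the direction of the effect.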

Testing:
  - Re-ran the full test suite to verify that no regression is
    introduced.
  - Checked the performance impact by running the TPC-H workload on a
    scale factor 25 database using single_node_perf_run.py. Total query
    runtime decreased by 0-20%, depending on how scan-heavy the
    particular query was: the more scan-heavy the query, the larger
    the performance gain I observed.

Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
R be/src/exec/hdfs-columnar-scanner-ir.cc
A be/src/exec/hdfs-columnar-scanner.cc
A be/src/exec/hdfs-columnar-scanner.h
M be/src/exec/hdfs-orc-scanner.cc
M be/src/exec/hdfs-orc-scanner.h
M be/src/exec/hdfs-scanner.h
M be/src/exec/orc-column-readers.cc
M be/src/exec/orc-column-readers.h
M be/src/exec/parquet/CMakeLists.txt
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
R be/src/exec/scratch-tuple-batch.h
15 files changed, 464 insertions(+), 140 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/04/15104/4
-- 
To view, visit http://gerrit.cloudera.org:8080/15104

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I56db0325dee283d73742ebbae412d19693fac0ca
Gerrit-Change-Number: 15104
Gerrit-PatchSet: 4
Gerrit-Owner: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>