Posted to dev@impala.apache.org by "Alex Behm (Code Review)" <ge...@cloudera.org> on 2016/04/13 17:42:10 UTC
[Impala-CR](cdh5-trunk) PREVIEW: Basic column-wise slot materialization in Parquet scanner.
Alex Behm has uploaded a new change for review.
http://gerrit.cloudera.org:8080/2779
Change subject: PREVIEW: Basic column-wise slot materialization in Parquet scanner.
......................................................................
PREVIEW: Basic column-wise slot materialization in Parquet scanner.
This change is a first step towards a more efficient Parquet scanner.
The focus is on presenting the new code flow that materializes
the table-level slots in a column-wise fashion, without going deep
into actually improving scan efficiency.
After these changes there are several obvious places that should
be optimized to realize efficiency gains.
Summary of changes
- the table-level tuples are materialized in a column-wise fashion
with new ColumnReader::ReadValueBatch() functions
- this is done by materializing a 'scratch' batch, and transferring
scratch tuples that survive filters/conjuncts to the output batch
- the tuples of nested collections are still materialized in
a row-wise fashion using the ColumnReader::ReadValue() function,
just as before
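The column-wise flow summarized above can be sketched roughly as follows. This is a simplified illustration, not the code in this change: the Tuple struct, the templated ColumnReader, TransferScratchTuples(), ScanColumnWise() and the std::function conjunct are all hypothetical stand-ins, and only the ReadValueBatch() name is taken from the summary. The real scanner operates on untyped tuple memory and row batches rather than std::vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical two-slot tuple used only for this illustration.
struct Tuple {
  int32_t l_linenumber = 0;
  int64_t l_orderkey = 0;
};

// Hypothetical stand-in for a column reader. It fills one slot of many
// scratch tuples per call instead of one slot of one tuple.
template <typename T>
class ColumnReader {
 public:
  ColumnReader(std::vector<T> values, T Tuple::*slot)
      : values_(std::move(values)), slot_(slot) {}

  // Writes up to max_values values of this column into scratch[0..n) and
  // returns n, the number of values actually read.
  size_t ReadValueBatch(size_t max_values, std::vector<Tuple>* scratch) {
    assert(scratch->size() >= max_values);
    size_t n = std::min(max_values, values_.size() - pos_);
    for (size_t i = 0; i < n; ++i) (*scratch)[i].*slot_ = values_[pos_ + i];
    pos_ += n;
    return n;
  }

 private:
  std::vector<T> values_;  // decoded column data, assumed already in memory
  size_t pos_ = 0;         // next value to hand out
  T Tuple::*slot_;         // tuple field that this column materializes
};

using Conjunct = std::function<bool(const Tuple&)>;

// Copies the scratch tuples that pass the conjunct into the output batch and
// returns how many were transferred.
size_t TransferScratchTuples(const std::vector<Tuple>& scratch, size_t num_rows,
                             const Conjunct& conjunct,
                             std::vector<Tuple>* output) {
  size_t num_transferred = 0;
  for (size_t i = 0; i < num_rows; ++i) {
    if (conjunct(scratch[i])) {
      output->push_back(scratch[i]);
      ++num_transferred;
    }
  }
  return num_transferred;
}

// Materializes one scratch batch column by column, filters it, and repeats
// until the columns are exhausted.
std::vector<Tuple> ScanColumnWise(ColumnReader<int32_t>* linenumber_reader,
                                  ColumnReader<int64_t>* orderkey_reader,
                                  size_t scratch_capacity,
                                  const Conjunct& conjunct) {
  std::vector<Tuple> scratch(scratch_capacity);
  std::vector<Tuple> output;
  while (true) {
    size_t n = linenumber_reader->ReadValueBatch(scratch_capacity, &scratch);
    size_t m = orderkey_reader->ReadValueBatch(scratch_capacity, &scratch);
    assert(n == m);  // all columns must supply the same number of rows
    (void)m;
    if (n == 0) break;
    TransferScratchTuples(scratch, n, conjunct, &output);
  }
  return output;
}
```

In this sketch each reader runs a tight loop over a single column before the next column is touched, and filtering happens only after the whole scratch batch is filled. That mirrors the structure described above without claiming anything about where the actual efficiency gains come from.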
Mini benchmark
I ran the following queries on a single impalad before and after my
change using a synthetic 'huge_lineitem' table.
I modified hdfs-scan-node.cc to set the number of rows of any row
batch to 0 to focus the measurement on the scan time.
Query options:
set num_scanner_threads=1;
set disable_codegen=true;
set num_nodes=1;
select * from huge_lineitem;
Before: 22.39s
After: 18.50s
select * from huge_lineitem where l_linenumber < 0;
Before: 25.11s
After: 20.56s
select * from huge_lineitem where l_linenumber % 2 = 0;
Before: 26.32s
After: 21.82s
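For reference, the relative reduction in scan time implied by these timings can be computed as (before - after) / before. The helper below is a hypothetical illustration of that arithmetic and is not part of the change. Applied to the three measurements above it gives roughly 17 to 18 percent in each case.

```cpp
#include <cassert>

// Fraction of the original runtime that was saved, e.g. 0.25 means the
// query now takes 25% less time. Both arguments are in seconds.
double RelativeReduction(double before_s, double after_s) {
  assert(before_s > 0.0);
  return (before_s - after_s) / before_s;
}
```

For example, RelativeReduction(22.39, 18.50) corresponds to the unfiltered select * query.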
Change-Id: I72a613fa805c542e39df20588fb25c57b5f139aa
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
2 files changed, 373 insertions(+), 144 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/79/2779/1
--
To view, visit http://gerrit.cloudera.org:8080/2779
Gerrit-MessageType: newchange
Gerrit-Change-Id: I72a613fa805c542e39df20588fb25c57b5f139aa
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Alex Behm <al...@cloudera.com>