You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2017/10/18 21:31:14 UTC

[Impala-ASF-CR] IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding

Hello Lars Volker, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8267

to look at the new patch set (#8).

Change subject: IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding
......................................................................

IMPALA-4177,IMPALA-6039: batched bit reading and rle decoding

Switch the decoders to using more batch-oriented interfaces. As an
intermediate step this doesn't make the interfaces of LevelDecoder
or DictDecoder batch-oriented, only the lower-level utility classes.

The next step would be to change those interfaces to be batch-oriented
and make according optimisations in parquet. This could deliver much
larger perf improvements than the current patch.

The high-level changes are.
* BitReader -> BatchedBitReader, which is built to unpack runs of 32
  bit-packed values efficiently.
* RleDecoder -> RleBatchDecoder, which exposes the repeated and literal
  runs to the caller and uses BatchedBitReader to unpack literal runs
  efficiently.
* Dict decoding uses RleBatchDecoder to decode repeated runs efficiently
  and uses the BitPacking utilities to unpack and encode in a single
  step.

Also removes an older benchmark that isn't too interesting (since
the batch-oriented approach to encoding and decoding is so much
faster than the value-by-value approach).

Testing:
* Ran core tests.
* Updated unit tests to exercise new code.
* Added test coverage for the deprecated bit-packed level encoding to
  that it still works (there was no coverage previously).

Perf:
Single-node benchmarks showed a few % performance gain. 16 node cluster
benchmarks only showed a gain for TPC-H nested.

Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6
---
M be/src/benchmarks/CMakeLists.txt
M be/src/benchmarks/bit-packing-benchmark.cc
D be/src/benchmarks/rle-benchmark.cc
M be/src/exec/parquet-column-readers.cc
M be/src/exec/parquet-column-readers.h
D be/src/experiments/bit-stream-utils.8byte.h
D be/src/experiments/bit-stream-utils.8byte.inline.h
M be/src/util/bit-packing.h
M be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/dict-encoding.h
M be/src/util/dict-test.cc
M be/src/util/parquet-reader.cc
M be/src/util/rle-encoding.h
M be/src/util/rle-test.cc
M testdata/data/README
A testdata/data/alltypes_agg_bitpacked_def_levels.parquet
M tests/query_test/test_scanners.py
19 files changed, 1,149 insertions(+), 946 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/8267/8
-- 
To view, visit http://gerrit.cloudera.org:8080/8267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I35de0cf80c86f501c4a39270afc8fb8111552ac6
Gerrit-Change-Number: 8267
Gerrit-PatchSet: 8
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Lars Volker <lv...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>