Posted to reviews@impala.apache.org by "Daniel Becker (Code Review)" <ge...@cloudera.org> on 2019/04/24 13:12:24 UTC

[Impala-ASF-CR] IMPALA-8381: Optimize ParquetPlainEncoder::DecodeBatch() for simple types

Daniel Becker has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/12985 )

Change subject: IMPALA-8381: Optimize ParquetPlainEncoder::DecodeBatch() for simple types
......................................................................

IMPALA-8381: Optimize ParquetPlainEncoder::DecodeBatch() for simple types

Refactored the ParquetPlainEncoder::Decode() and
ParquetPlainEncoder::DecodeBatch() methods to increase performance in
batch decoding.

The `Decode` and `DecodeBatch` methods retain their behaviour and
outward interface, but the internal structure changes.

We change how the `Decode` template specialisations are split up. The
generic, unspecialised template now handles the numerical Parquet types
(INT32, INT64, INT96, FLOAT and DOUBLE), and explicit specialisations
handle BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY.
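
To illustrate the split, here is a minimal sketch, not the code in
parquet-common.h. `DecodeValue` and `ByteArrayRef` are hypothetical names
invented for this example. It assumes a little-endian host and relies on
the Parquet PLAIN encoding of BYTE_ARRAY, which stores a 4-byte length
followed by that many bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical view of a variable-length value; not an Impala type.
struct ByteArrayRef {
  const uint8_t* ptr;
  int32_t len;
};

// Generic template: any fixed-size numerical type is a plain byte copy.
// Returns the number of bytes consumed, or -1 if the buffer is too short.
template <typename T>
int DecodeValue(const uint8_t* buf, const uint8_t* end, T* v) {
  assert(v != nullptr);
  if (end - buf < static_cast<int64_t>(sizeof(T))) return -1;
  std::memcpy(v, buf, sizeof(T));
  return sizeof(T);
}

// Specialisation: a length-prefixed value needs two checks, one for the
// 4-byte length and one for the payload that the length announces.
template <>
int DecodeValue<ByteArrayRef>(
    const uint8_t* buf, const uint8_t* end, ByteArrayRef* v) {
  assert(v != nullptr);
  if (end - buf < 4) return -1;
  int32_t len;
  std::memcpy(&len, buf, 4);  // Assumes a little-endian host.
  if (len < 0 || end - buf - 4 < len) return -1;
  v->ptr = buf + 4;
  v->len = len;
  return 4 + len;
}
```

Because the size of the second kind of value is only known after reading
its prefix, it cannot share the fixed-size code path.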

We add a new method template, DecodeNoCheck, which does the actual
decoding without bounds checking. The generic Decode method template
calls it internally after its own bounds check. For all Parquet types
except BYTE_ARRAY, DecodeBatch performs a single bounds check for the
whole batch and then calls DecodeNoCheck for each value, so we save the
cost of a bounds check per decoded value. For BYTE_ARRAY this cannot be
done, because each value carries its own length, so we still have to
check every value.
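
The single-check idea can be sketched as follows. This is an
illustration under assumptions, not the patch itself: the `*Sketch`
names and signatures are hypothetical, values are assumed to be densely
packed in the input, and `stride` is assumed to be the distance in bytes
between consecutive output slots.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for DecodeNoCheck: copies one fixed-size value
// and trusts the caller to have verified the buffer size.
template <typename T>
inline int DecodeNoCheckSketch(const uint8_t* buffer, uint8_t* out) {
  std::memcpy(out, buffer, sizeof(T));
  return sizeof(T);
}

// Batch decode with one bounds check for the whole batch.
// Returns the number of bytes consumed, or -1 if the buffer is too short.
template <typename T>
int64_t DecodeBatchSketch(const uint8_t* buffer, const uint8_t* buffer_end,
    int64_t num_values, int64_t stride, T* out) {
  assert(stride >= static_cast<int64_t>(sizeof(T)));
  const int64_t total_bytes = num_values * static_cast<int64_t>(sizeof(T));
  if (buffer_end - buffer < total_bytes) return -1;  // The only check.

  uint8_t* out_bytes = reinterpret_cast<uint8_t*>(out);
  for (int64_t i = 0; i < num_values; ++i) {
    buffer += DecodeNoCheckSketch<T>(buffer, out_bytes);
    out_bytes += stride;
  }
  return total_bytes;
}
```

The check can be hoisted out of the loop only because the total input
size, `num_values * sizeof(T)`, is known before any value is read.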

In the non-BYTE_ARRAY version of DecodeBatch, we explicitly unroll the
loop in batches of 8 to increase performance.
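
A rough sketch of what manual unrolling by 8 can look like is given
below. The function names are hypothetical and the actual loop in the
patch may be structured differently. The caller is assumed to have
already checked that the input holds `num_values * sizeof(T)` bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies one value of type T, no bounds check.
template <typename T>
inline void CopyOneSketch(const uint8_t* src, uint8_t* dst) {
  std::memcpy(dst, src, sizeof(T));
}

// Copies num_values densely packed values from src to dst, placing
// consecutive outputs `stride` bytes apart.
template <typename T>
void DecodeUnrolledSketch(const uint8_t* src, int64_t num_values,
    int64_t stride, uint8_t* dst) {
  assert(stride >= static_cast<int64_t>(sizeof(T)));
  constexpr int64_t kUnroll = 8;

  // Main loop: eight independent copies per iteration.
  const int64_t full_groups = num_values / kUnroll;
  for (int64_t g = 0; g < full_groups; ++g) {
    CopyOneSketch<T>(src + 0 * sizeof(T), dst + 0 * stride);
    CopyOneSketch<T>(src + 1 * sizeof(T), dst + 1 * stride);
    CopyOneSketch<T>(src + 2 * sizeof(T), dst + 2 * stride);
    CopyOneSketch<T>(src + 3 * sizeof(T), dst + 3 * stride);
    CopyOneSketch<T>(src + 4 * sizeof(T), dst + 4 * stride);
    CopyOneSketch<T>(src + 5 * sizeof(T), dst + 5 * stride);
    CopyOneSketch<T>(src + 6 * sizeof(T), dst + 6 * stride);
    CopyOneSketch<T>(src + 7 * sizeof(T), dst + 7 * stride);
    src += kUnroll * sizeof(T);
    dst += kUnroll * stride;
  }

  // Tail loop: the remaining num_values % 8 values, one at a time.
  const int64_t remainder = num_values % kUnroll;
  for (int64_t i = 0; i < remainder; ++i) {
    CopyOneSketch<T>(src + i * sizeof(T), dst + i * stride);
  }
}
```

Unrolling like this is only straightforward once the per-value bounds
check is gone, since the loop body then has no early exit.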

The overall speedup is up to 2x for small strides (8 bytes, INT32). It
shrinks as the stride grows and disappears at strides of around 40
bytes.

Testing:
  Added tests to parquet-plain-test.cc that exercise the `Decode` and
  `DecodeBatch` methods, covering both single-value and batch
  decoding.

Change-Id: I57b7d2573bb6dfd038e581acb3bd8ea1565aa20d
---
M be/src/exec/parquet/parquet-common.h
M be/src/exec/parquet/parquet-plain-test.cc
A be/src/testutil/random-vector-generators.h
3 files changed, 425 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/85/12985/9
-- 
To view, visit http://gerrit.cloudera.org:8080/12985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I57b7d2573bb6dfd038e581acb3bd8ea1565aa20d
Gerrit-Change-Number: 12985
Gerrit-PatchSet: 9
Gerrit-Owner: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <da...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>