Posted to dev@impala.apache.org by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org> on 2016/09/01 19:44:47 UTC

[Impala-CR] IMPALA-3376: Extra definition level when writing Parquet files

Hello Matthew Jacobs, Internal Jenkins,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/4207

to review the following change.

Change subject: IMPALA-3376: Extra definition level when writing Parquet files
......................................................................

IMPALA-3376: Extra definition level when writing Parquet files

Currently, when writing a new value to a Parquet file, we write
the definition level before checking whether there is enough
space on the current page for the value. If there isn't, we create
a new page and write the definition level again to it, but the
level already written to the old page is left in place, so the old
page ends up with an extra definition level.

To fix this, the writer now checks that the current page has
enough space for both the definition level and the value before
writing either.
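The ordering problem can be illustrated with a toy model. This is a minimal, hypothetical simulation, not Impala's HdfsParquetTableWriter: it assumes each definition level costs 1 byte and each INT32 value costs 4 bytes of a fixed page budget (real pages RLE-encode the levels), and the names `Page`, `ColumnWriter`, `append_buggy` and `append_fixed` are invented for illustration. Every value written is non-null, so each page should hold exactly as many definition levels as values.

```python
from dataclasses import dataclass, field
from typing import List

# Assumed byte costs for this toy model only; real Parquet pages
# RLE/bit-pack definition levels, so a level is not a whole byte.
DEF_LEVEL_BYTES = 1
VALUE_BYTES = 4


@dataclass
class Page:
    capacity: int                      # total bytes available on the page
    used: int = 0                      # bytes consumed so far
    def_levels: List[int] = field(default_factory=list)
    values: List[int] = field(default_factory=list)

    def has_room(self, nbytes: int) -> bool:
        return self.used + nbytes <= self.capacity


class ColumnWriter:
    """Hypothetical toy writer; not Impala's actual writer code."""

    def __init__(self, page_capacity: int) -> None:
        self.page_capacity = page_capacity
        self.pages = [Page(page_capacity)]

    def _write_def_level(self, page: Page, level: int) -> None:
        page.def_levels.append(level)
        page.used += DEF_LEVEL_BYTES

    def _write_value(self, page: Page, value: int) -> None:
        page.values.append(value)
        page.used += VALUE_BYTES

    def append_buggy(self, value: int) -> None:
        # The level is written before the space check, so when the value
        # does not fit, a stray level stays behind on the old page.
        page = self.pages[-1]
        self._write_def_level(page, 1)
        if not page.has_room(VALUE_BYTES):
            page = Page(self.page_capacity)
            self.pages.append(page)
            self._write_def_level(page, 1)  # written again to the new page
        self._write_value(page, value)

    def append_fixed(self, value: int) -> None:
        # Check room for the level AND the value before writing either.
        if not self.pages[-1].has_room(DEF_LEVEL_BYTES + VALUE_BYTES):
            self.pages.append(Page(self.page_capacity))
        page = self.pages[-1]
        self._write_def_level(page, 1)
        self._write_value(page, value)
```

With `page_capacity=12` and three values, `append_buggy` leaves three definition levels but only two values on the first page, while `append_fixed` leaves two of each. Checking the combined size up front also avoids having to undo a partially written entry.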

This patch also extends the parquet-reader tool, which reads
Parquet files and performs minimal sanity checks on their
metadata, so that it detects extra definition levels, and adds a
test that runs the tool automatically.
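A check of the kind this parquet-reader change describes could look roughly like the following. This is a hypothetical sketch rather than the tool's actual code: it assumes a flat (non-nested) column whose definition levels have already been decoded into a list, and `DecodedPage` and `check_def_levels` are invented names. A page passes only if its level count matches the header's `num_values` and the number of levels equal to the maximum definition level matches the number of stored values.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DecodedPage:
    """Hypothetical decoded page; not parquet-reader's real data structure."""
    num_values: int        # value count from the page header (nulls included)
    def_levels: List[int]  # definition levels decoded from the page
    num_non_null: int      # values actually stored in the value section


def check_def_levels(pages: List[DecodedPage], max_def_level: int) -> List[str]:
    """Return one message per inconsistency; an empty list means no problems."""
    errors = []
    for i, page in enumerate(pages):
        # An extra (or missing) level makes the count disagree with the header.
        if len(page.def_levels) != page.num_values:
            errors.append(
                f"page {i}: {len(page.def_levels)} definition levels "
                f"but header says {page.num_values} values")
        # In a flat column, only levels at the max mark a stored value.
        defined = sum(1 for lvl in page.def_levels if lvl == max_def_level)
        if defined != page.num_non_null:
            errors.append(
                f"page {i}: {defined} levels mark non-null values "
                f"but {page.num_non_null} values are stored")
    return errors
```

For a first page with two stored values but three levels, as the buggy ordering produces, both checks fire; a page mixing nulls and non-nulls, such as levels `[1, 0, 1]` with `num_values=3` and two stored values, passes.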

Change-Id: I20f25a90aa1ef74b4f00f38f832bc1c1853342c6
Reviewed-on: http://gerrit.cloudera.org:8080/3835
Reviewed-by: Thomas Tauber-Marshall <tm...@cloudera.com>
Tested-by: Internal Jenkins
Reviewed-by: Matthew Jacobs <mj...@cloudera.com>
(cherry picked from commit 45d059855b4afc2d5730c5e7ecaed7141993f354)
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/util/parquet-reader.cc
M be/src/util/rle-encoding.h
M tests/query_test/test_insert_parquet.py
4 files changed, 137 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/07/4207/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4207
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I20f25a90aa1ef74b4f00f38f832bc1c1853342c6
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Matthew Jacobs <mj...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-CR] IMPALA-3376: Extra definition level when writing Parquet files

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has abandoned this change.

Change subject: IMPALA-3376: Extra definition level when writing Parquet files
......................................................................


Abandoned

Gerrit-MessageType: abandon