You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@impala.apache.org by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org> on 2016/09/01 19:44:54 UTC

[Impala-CR] IMPALA-3964: Fix crash when a count(*) is performed on a nested collection.

Hello Michael Ho, Internal Jenkins, Dan Hecht,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/4237

to review the following change.

Change subject: IMPALA-3964: Fix crash when a count(*) is performed on a nested collection.
......................................................................

IMPALA-3964: Fix crash when a count(*) is performed on a nested collection.

The Bug: Prior to this patch, a DCHECK was used to verify that the
underlying memory pool for the scratch batch was empty in a count based
scenario. For IMPALA-3964 (where a count(*) is performed on a nested
collection), if a Parquet column chunk is compressed, upon reading each
new data page it would be decompressed and eventually placed in to the
underlying scratch batch memory pool causing the aforementioned DCHECK
to fail. This was not picked up in the test suite as the TPCH nested
Parquet data is not compressed.

The Fix: Removed the erroneous DCHECK. Added logic to determine if any
memory in the scratch batch needs to be freed (due to the transfer that
occurs from the decompressed data pool), if so, it will be done.
Augmented the load_nested.py script to snappy compress each of the
tables within the 'tpch_nested_parquet' database. This is consistent with
how the flat TPCH Parquet data set is stored. Regarding test coverage,
there are already a number of tests that will perform nested collection
counts against the tables in the 'tpch_nested_parquet' database. For
uncompressed nested Parquet, the 'test_nested_types.py' test suite
leverages the 'ComplexTypesTbl' table to provide good coverage.

Change-Id: Id0955c85d18dfba4bd29a35ec95d0355da050607
Reviewed-on: http://gerrit.cloudera.org:8080/3940
Reviewed-by: Michael Ho <kw...@cloudera.com>
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Tested-by: Internal Jenkins
(cherry picked from commit 90a6b3206e0b522650bfe21c5754c15d009f708c)
---
M be/src/exec/hdfs-parquet-scanner.cc
M testdata/bin/load_nested.py
2 files changed, 7 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/37/4237/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Id0955c85d18dfba4bd29a35ec95d0355da050607
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Christopher Channing <cc...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>

[Impala-CR] IMPALA-3964: Fix crash when a count(*) is performed on a nested collection.

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has abandoned this change.

Change subject: IMPALA-3964: Fix crash when a count(*) is performed on a nested collection.
......................................................................


Abandoned

-- 
To view, visit http://gerrit.cloudera.org:8080/4237
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: Id0955c85d18dfba4bd29a35ec95d0355da050607
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Christopher Channing <cc...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>