You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Quanlong Huang (Code Review)" <ge...@cloudera.org> on 2019/03/13 04:46:49 UTC
[Impala-ASF-CR](2.x) IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
Hello Impala Public Jenkins,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/12741
to review the following change.
Change subject: IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
......................................................................
IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
This change ensures that the planner computes parquet conjuncts
only when for scans containing parquet files. Additionally, it
also handles PARQUET_DICTIONARY_FILTERING and
PARQUET_READ_STATISTICS query options in the planner.
Testing was carried out independently on parquet and non-parquet
scans:
1. Parquet scans were tested via the existing parquet-filtering
planner test. Additionally, a new test
[parquet-filtering-disabled] was added to ensure that the
explain plan generated skips parquet predicates based on the
query options.
2. Non-parquet scans were tested manually to ensure that the
functions to compute parquet conjucts were not invoked.
Additional test cases were added to the parquet-filtering
planner test to scan non parquet tables and ensure that the
plans do not contain conjuncts based on parquet statistics.
3. A parquet partition was added to the alltypesmixedformat
table in the functional database. Planner tests were added
to ensure that Parquet conjuncts are constructed only when
the Parquet partition is included in the query.
Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Reviewed-on: http://gerrit.cloudera.org:8080/10704
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/bin/create-load-data.sh
M testdata/bin/load-dependent-tables.sql
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
A testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering-disabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/mixed-format.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
M tests/query_test/test_rows_availability.py
14 files changed, 482 insertions(+), 34 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/12741/1
--
To view, visit http://gerrit.cloudera.org:8080/12741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Gerrit-Change-Number: 12741
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
[Impala-ASF-CR](2.x) IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12741 )
Change subject: IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
......................................................................
Patch Set 1:
(2 comments)
http://gerrit.cloudera.org:8080/#/c/12741/1/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:
http://gerrit.cloudera.org:8080/#/c/12741/1/testdata/bin/create-load-data.sh@313
PS1, Line 313: hadoop fs -cp /test-warehouse/alltypes_parquet/year=2009/month=4/ /tmp/alltypes_parquet/year=2009
line too long (99 > 90)
http://gerrit.cloudera.org:8080/#/c/12741/1/tests/query_test/test_rows_availability.py
File tests/query_test/test_rows_availability.py:
http://gerrit.cloudera.org:8080/#/c/12741/1/tests/query_test/test_rows_availability.py@109
PS1, Line 109: \
flake8: E502 the backslash is redundant between brackets
--
To view, visit http://gerrit.cloudera.org:8080/12741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Gerrit-Change-Number: 12741
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Mar 2019 04:47:45 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR](2.x) IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12741 )
Change subject: IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
......................................................................
Patch Set 1:
Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3904/ DRY_RUN=false
--
To view, visit http://gerrit.cloudera.org:8080/12741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Gerrit-Change-Number: 12741
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Mar 2019 06:28:19 +0000
Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12741 )
Change subject: IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
......................................................................
Patch Set 1: Verified+1
--
To view, visit http://gerrit.cloudera.org:8080/12741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Gerrit-Change-Number: 12741
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Mar 2019 10:17:31 +0000
Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12741 )
Change subject: IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
......................................................................
Patch Set 1:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/2422/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
To view, visit http://gerrit.cloudera.org:8080/12741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Gerrit-Change-Number: 12741
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Mar 2019 05:25:43 +0000
Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/12741 )
Change subject: IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
......................................................................
Patch Set 1:
Conflicts:
testdata/bin/create-load-data.sh
302 function copy-and-load-dependent-tables {
303 # COPY
304 # TODO: The multi-format table will move these files. So we need to copy them to a
305 # temporary location for that table to use. Should find a better way to handle this.
306 echo COPYING AND LOADING DATA FOR DEPENDENT TABLES
307 hadoop fs -rm -r -f /test-warehouse/alltypesmixedformat \
308 /tmp/alltypes_rc /tmp/alltypes_seq /tmp/alltypes_parquet
309 hadoop fs -mkdir -p /tmp/alltypes_seq/year=2009 \
310 <<<<<<< HEAD
311 /tmp/alltypes_rc/year=2009
312 =======
313 /tmp/alltypes_rc/year=2009 /tmp/alltypes_parquet/year=2009
314
315 # The file written by hive to /test-warehouse will be strangely replicated rather than
316 # erasure coded if EC is not set in /tmp
317 if [[ -n "${HDFS_ERASURECODE_POLICY:-}" ]]; then
318 hdfs ec -setPolicy -policy "${HDFS_ERASURECODE_POLICY}" -path "/tmp/alltypes_rc"
319 hdfs ec -setPolicy -policy "${HDFS_ERASURECODE_POLICY}" -path "/tmp/alltypes_seq"
320 hdfs ec -setPolicy -policy "${HDFS_ERASURECODE_POLICY}" -path "/tmp/alltypes_parquet"
321 fi
322
323 >>>>>>> c6f9b61... IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
324 hadoop fs -cp /test-warehouse/alltypes_seq/year=2009/month=2/ /tmp/alltypes_seq/year=2009
325 hadoop fs -cp /test-warehouse/alltypes_rc/year=2009/month=3/ /tmp/alltypes_rc/year=2009
326 hadoop fs -cp /test-warehouse/alltypes_parquet/year=2009/month=4/ /tmp/alltypes_parquet/year=2009
--
To view, visit http://gerrit.cloudera.org:8080/12741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Gerrit-Change-Number: 12741
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Mar 2019 04:56:19 +0000
Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12741 )
Change subject: IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
......................................................................
Patch Set 1: Code-Review+2
--
To view, visit http://gerrit.cloudera.org:8080/12741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Gerrit-Change-Number: 12741
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Mar 2019 17:59:32 +0000
Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12741 )
Change subject: IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
......................................................................
IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
This change ensures that the planner computes parquet conjuncts
only when for scans containing parquet files. Additionally, it
also handles PARQUET_DICTIONARY_FILTERING and
PARQUET_READ_STATISTICS query options in the planner.
Testing was carried out independently on parquet and non-parquet
scans:
1. Parquet scans were tested via the existing parquet-filtering
planner test. Additionally, a new test
[parquet-filtering-disabled] was added to ensure that the
explain plan generated skips parquet predicates based on the
query options.
2. Non-parquet scans were tested manually to ensure that the
functions to compute parquet conjucts were not invoked.
Additional test cases were added to the parquet-filtering
planner test to scan non parquet tables and ensure that the
plans do not contain conjuncts based on parquet statistics.
3. A parquet partition was added to the alltypesmixedformat
table in the functional database. Planner tests were added
to ensure that Parquet conjuncts are constructed only when
the Parquet partition is included in the query.
Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Reviewed-on: http://gerrit.cloudera.org:8080/10704
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/12741
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/bin/create-load-data.sh
M testdata/bin/load-dependent-tables.sql
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
A testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering-disabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/mixed-format.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
M tests/query_test/test_rows_availability.py
14 files changed, 482 insertions(+), 34 deletions(-)
Approvals:
Impala Public Jenkins: Verified
Tim Armstrong: Looks good to me, approved
--
To view, visit http://gerrit.cloudera.org:8080/12741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: merged
Gerrit-Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Gerrit-Change-Number: 12741
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>