You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Quanlong Huang (Code Review)" <ge...@cloudera.org> on 2019/03/13 04:46:49 UTC

[Impala-ASF-CR](2.x) IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans

Hello Impala Public Jenkins,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/12741

to review the following change.


Change subject: IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
......................................................................

IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans

This change ensures that the planner computes parquet conjuncts
only when for scans containing parquet files. Additionally, it
also handles PARQUET_DICTIONARY_FILTERING and
PARQUET_READ_STATISTICS query options in the planner.

Testing was carried out independently on parquet and non-parquet
scans:
  1. Parquet scans were tested via the existing parquet-filtering
     planner test. Additionally, a new test
     [parquet-filtering-disabled] was added to ensure that the
     explain plan generated skips parquet predicates based on the
     query options.
  2. Non-parquet scans were tested manually to ensure that the
     functions to compute parquet conjucts were not invoked.
     Additional test cases were added to the parquet-filtering
     planner test to scan non parquet tables and ensure that the
     plans do not contain conjuncts based on parquet statistics.
  3. A parquet partition was added to the alltypesmixedformat
     table in the functional database. Planner tests were added
     to ensure that Parquet conjuncts are constructed only when
     the Parquet partition is included in the query.

Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Reviewed-on: http://gerrit.cloudera.org:8080/10704
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/bin/create-load-data.sh
M testdata/bin/load-dependent-tables.sql
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
A testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering-disabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/mixed-format.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
M tests/query_test/test_rows_availability.py
14 files changed, 482 insertions(+), 34 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/41/12741/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Gerrit-Change-Number: 12741
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>

[Impala-ASF-CR](2.x) IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12741 )

Change subject: IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12741/1/testdata/bin/create-load-data.sh
File testdata/bin/create-load-data.sh:

http://gerrit.cloudera.org:8080/#/c/12741/1/testdata/bin/create-load-data.sh@313
PS1, Line 313:   hadoop fs -cp /test-warehouse/alltypes_parquet/year=2009/month=4/ /tmp/alltypes_parquet/year=2009
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/12741/1/tests/query_test/test_rows_availability.py
File tests/query_test/test_rows_availability.py:

http://gerrit.cloudera.org:8080/#/c/12741/1/tests/query_test/test_rows_availability.py@109
PS1, Line 109: \
flake8: E502 the backslash is redundant between brackets



-- 
To view, visit http://gerrit.cloudera.org:8080/12741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Gerrit-Change-Number: 12741
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Mar 2019 04:47:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR](2.x) IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12741 )

Change subject: IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
......................................................................


Patch Set 1:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3904/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/12741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Gerrit-Change-Number: 12741
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Mar 2019 06:28:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12741 )

Change subject: IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
......................................................................


Patch Set 1: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/12741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Gerrit-Change-Number: 12741
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Mar 2019 10:17:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/12741 )

Change subject: IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/2422/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/12741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Gerrit-Change-Number: 12741
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Mar 2019 05:25:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/12741 )

Change subject: IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
......................................................................


Patch Set 1:

Conflicts:
        testdata/bin/create-load-data.sh

302 function copy-and-load-dependent-tables {
303   # COPY
304   # TODO: The multi-format table will move these files. So we need to copy them to a
305   # temporary location for that table to use. Should find a better way to handle this.
306   echo COPYING AND LOADING DATA FOR DEPENDENT TABLES
307   hadoop fs -rm -r -f /test-warehouse/alltypesmixedformat \
308     /tmp/alltypes_rc /tmp/alltypes_seq /tmp/alltypes_parquet
309   hadoop fs -mkdir -p /tmp/alltypes_seq/year=2009 \
310 <<<<<<< HEAD
311     /tmp/alltypes_rc/year=2009
312 =======
313     /tmp/alltypes_rc/year=2009 /tmp/alltypes_parquet/year=2009
314
315   # The file written by hive to /test-warehouse will be strangely replicated rather than
316   # erasure coded if EC is not set in /tmp
317   if [[ -n "${HDFS_ERASURECODE_POLICY:-}" ]]; then
318     hdfs ec -setPolicy -policy "${HDFS_ERASURECODE_POLICY}" -path "/tmp/alltypes_rc"
319     hdfs ec -setPolicy -policy "${HDFS_ERASURECODE_POLICY}" -path "/tmp/alltypes_seq"
320     hdfs ec -setPolicy -policy "${HDFS_ERASURECODE_POLICY}" -path "/tmp/alltypes_parquet"
321   fi
322
323 >>>>>>> c6f9b61... IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
324   hadoop fs -cp /test-warehouse/alltypes_seq/year=2009/month=2/ /tmp/alltypes_seq/year=2009
325   hadoop fs -cp /test-warehouse/alltypes_rc/year=2009/month=3/ /tmp/alltypes_rc/year=2009
326   hadoop fs -cp /test-warehouse/alltypes_parquet/year=2009/month=4/ /tmp/alltypes_parquet/year=2009


-- 
To view, visit http://gerrit.cloudera.org:8080/12741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Gerrit-Change-Number: 12741
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Mar 2019 04:56:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12741 )

Change subject: IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
......................................................................


Patch Set 1: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/12741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Gerrit-Change-Number: 12741
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Mar 2019 17:59:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12741 )

Change subject: IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans
......................................................................

IMPALA-6625: Skip computing parquet conjuncts for non-Parquet scans

This change ensures that the planner computes parquet conjuncts
only when for scans containing parquet files. Additionally, it
also handles PARQUET_DICTIONARY_FILTERING and
PARQUET_READ_STATISTICS query options in the planner.

Testing was carried out independently on parquet and non-parquet
scans:
  1. Parquet scans were tested via the existing parquet-filtering
     planner test. Additionally, a new test
     [parquet-filtering-disabled] was added to ensure that the
     explain plan generated skips parquet predicates based on the
     query options.
  2. Non-parquet scans were tested manually to ensure that the
     functions to compute parquet conjucts were not invoked.
     Additional test cases were added to the parquet-filtering
     planner test to scan non parquet tables and ensure that the
     plans do not contain conjuncts based on parquet statistics.
  3. A parquet partition was added to the alltypesmixedformat
     table in the functional database. Planner tests were added
     to ensure that Parquet conjuncts are constructed only when
     the Parquet partition is included in the query.

Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Reviewed-on: http://gerrit.cloudera.org:8080/10704
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/12741
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/bin/create-load-data.sh
M testdata/bin/load-dependent-tables.sql
M testdata/workloads/functional-planner/queries/PlannerTest/constant-folding.test
M testdata/workloads/functional-planner/queries/PlannerTest/fk-pk-join-detection.test
A testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering-disabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/parquet-filtering.test
M testdata/workloads/functional-planner/queries/PlannerTest/resource-requirements.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/mixed-format.test
M testdata/workloads/functional-query/queries/QueryTest/show-stats.test
M tests/query_test/test_rows_availability.py
14 files changed, 482 insertions(+), 34 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Tim Armstrong: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/12741
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: merged
Gerrit-Change-Id: I9d6c26d42db090c8a15c602f6419ad6399c329e7
Gerrit-Change-Number: 12741
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <po...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>