You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Quanlong Huang (Code Review)" <ge...@cloudera.org> on 2020/05/19 14:14:21 UTC

[Impala-ASF-CR] IMPALA-8834: Short-circuit partition key scan

Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/13993 )

Change subject: IMPALA-8834: Short-circuit partition key scan
......................................................................


Patch Set 16:

(5 comments)

Great work! Have some questions about tests.

http://gerrit.cloudera.org:8080/#/c/13993/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13993/16//COMMIT_MSG@17
PS16, Line 17: THe
nit: The


http://gerrit.cloudera.org:8080/#/c/13993/16/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/13993/16/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1135
PS16, Line 1135: 100
nit: should this magic number be replaced with analyzer.getQueryOptions().exec_single_node_rows_threshold?


http://gerrit.cloudera.org:8080/#/c/13993/16/testdata/workloads/functional-planner/queries/PlannerTest/distinct.test
File testdata/workloads/functional-planner/queries/PlannerTest/distinct.test:

http://gerrit.cloudera.org:8080/#/c/13993/16/testdata/workloads/functional-planner/queries/PlannerTest/distinct.test@526
PS16, Line 526:    partition key scan
              :    row-size=0B cardinality=24
hmm... Did you run the test with some extract options?

If I explain this query with OPTIMIZE_PARTITION_KEY_SCANS=false, the plan is the old one (cardinality=7.30K). If run with OPTIMIZE_PARTITION_KEY_SCANS=true, the plan is optimized to

PLAN-ROOT SINK
|
02:AGGREGATE [FINALIZE]
|  output: count(1)
|  having: count(1) IS NOT NULL
|  row-size=8B cardinality=1
|
01:AGGREGATE
|  group by: 1
|  row-size=1B cardinality=1
|
00:UNION
   constant-operands=1
   row-size=0B cardinality=1


http://gerrit.cloudera.org:8080/#/c/13993/16/testdata/workloads/functional-query/queries/QueryTest/partition-key-scans.test
File testdata/workloads/functional-query/queries/QueryTest/partition-key-scans.test:

http://gerrit.cloudera.org:8080/#/c/13993/16/testdata/workloads/functional-query/queries/QueryTest/partition-key-scans.test@13
PS16, Line 13: aggregation(SUM, RowsReturned): 48
Are we checking 48 == 2 * NumFiles? If so, how does the factor (2) come? Should we check the "RowsRead" instead just like line 25?


http://gerrit.cloudera.org:8080/#/c/13993/16/testdata/workloads/functional-query/queries/QueryTest/partition-key-scans.test@30
PS16, Line 30: order by year, month
nit: I think we don't need the ORDER BY clause. The test framework will sort the results: https://github.com/apache/impala/blob/76e4a17fb379bb232618dccb4ad3504dbe5c945c/tests/common/test_result_verifier.py#L58

With the ORDER BY clause, this test is identical with the one at line 77.



-- 
To view, visit http://gerrit.cloudera.org:8080/13993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I26c87525a4f75ffeb654267b89948653b2e1ff8c
Gerrit-Change-Number: 13993
Gerrit-PatchSet: 16
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 19 May 2020 14:14:21 +0000
Gerrit-HasComments: Yes