You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Qifan Chen (Code Review)" <ge...@cloudera.org> on 2020/12/01 16:54:46 UTC

[Impala-ASF-CR] IMPALA-10360: Allow simple limit to be treated as sampling hint

Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/16792 )

Change subject: IMPALA-10360: Allow simple limit to be treated as sampling hint
......................................................................


Patch Set 2:

(6 comments)

Looks good to me!

http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
File fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java:

http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java@79
PS2, Line 79:  if (Expr.IS_ALWAYS_TRUE_PREDICATE.apply(e1) &&
            :           Expr.IS_ALWAYS_TRUE_PREDICATE.apply(e2)) {
            :         setHasAlwaysTrueHint(true);
Maybe better case on the op here. For OR, when one branch is always true, we can set setHasAlwaysTrueHint(true).


http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/Expr.java@520
PS2, Line 520: if (conjuncts.size() > 1) {
Seems we should do so even when for a single predicate case.


http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@223
PS2, Line 223: if (getTableRefs().size() == 1)
It seems to me that we should check hasConvertLimitToSampleHint() on a single table case.

SELECT * FROM T WHERE ...

vs. 

SELECT * FROM T [convert_limit_to_sample] WHERE ...


http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
File fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java:

http://gerrit.cloudera.org:8080/#/c/16792/2/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java@209
PS2, Line 209: estimatedTotalRows
When the #rows are indeed estimated due to missing or corrupt stats, it may be useful to inflate the sampling rate, say by 3 times to reduce the chance of under-sampling, or to discard the table for consideration completely.

https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java#L1219


http://gerrit.cloudera.org:8080/#/c/16792/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test
File testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test:

http://gerrit.cloudera.org:8080/#/c/16792/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test@259
PS2, Line 259: # Query against a view that has WHERE clause hint.
             : # Both the num partitions and files are pruned.
             : select * from functional.alltypes_dp_2_view_1 limit 10;
             : ---- PLAN
Nice test queries!


http://gerrit.cloudera.org:8080/#/c/16792/2/testdata/workloads/functional-planner/queries/PlannerTest/optimize-simple-limit.test@275
PS2, Line 275: select * from functional.alltypes_dp_2_view_2 limit 10;
May consider add one or two queries against tables directly (i.e., no views).



-- 
To view, visit http://gerrit.cloudera.org:8080/16792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife05a5343c913006f7659949b327b63d3f10c04b
Gerrit-Change-Number: 16792
Gerrit-PatchSet: 2
Gerrit-Owner: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Shant Hovsepian <sh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Tue, 01 Dec 2020 16:54:46 +0000
Gerrit-HasComments: Yes