You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "wangsheng (Code Review)" <ge...@cloudera.org> on 2022/08/10 07:57:18 UTC

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

wangsheng has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18829


Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................

IMPALA-7942 (part 1): Add query hints for table cardinalities

Currently, We need execute 'COMPUTE STATS' manually to compute
table stats info. Stats is very useful for query plan generate.
Without these stats, query plan maybe worse. In order to solve
this probelm, this patch adds a new query hint: 'TABLE_NUM_ROWS',
we can use this hint to set table rows, even without table stats
info. We can use this new hint after a hdfs or kudu table in
query like this:

  * select col from t /* +TABLE_NUM_ROWS(1000) */;

If set, Impala will use this value as table scanned rows, even if
table has no stats.

Testing:
- Added new fe test in 'PlannerTest'
- Added new fe test in 'AnalyzeStmtsTest' for negative cases

Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
---
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test
8 files changed, 134 insertions(+), 10 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/18829/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 1
Gerrit-Owner: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java@176
PS5, Line 176: tableNumRowsHint_
> Thanks for suggestion, Qifan. 
Yeah, the logic to clean up the hash table could be a little bit complicated. I wonder if the timing to do can be after the analyze(), here.

https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java#L472

In addition, hash table lookup and throw exception can be done inside method analyzeHint().
https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java#L70

Hash table population can be done in the constructor for TableRef.



-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 9
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Tue, 20 Sep 2022 14:57:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 10:

> Hi Quanlong, I submited pre-review-test, but failed in these tests:
 > 
 > authorization.test_ranger.TestRanger.test_show_grant_hive_privilege
 > generate_junitxml.buildall.run-custom-cluster-tests
 > 
 > Seems unrelated to this patch.

Here is the task url: https://jenkins.impala.io/job/pre-review-test/1490/console


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 10
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qf...@hotmail.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 26 Dec 2022 07:01:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 12: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 12
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qf...@hotmail.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 04 Jan 2023 08:57:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 9:

> (1 comment)

Hi Qifan, sorry for taking so long to reply. I consider about the different row hint for same table, I think this quite complicated.
We get row hint of each table in TableRef.analyze(). If table ref A without #row hint, and table ref B with #row hint, and they are same physical table in fact. This means, we need to update TableRef_A, after we get row hint from TableRef_B. In this way, after we analyzed all tables, we need to check each table row hints again.
This feature been going on for a long, so I recommand that we ignore the different row hint for same table, and keeping basic function of table row hint. And consider this in another patch. How do you think?


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 9
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qf...@hotmail.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Tue, 20 Dec 2022 13:17:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 5:

(4 comments)

> (5 comments)
 > 
 > Looks good!
 > 
 > Complete the code section and have a few comments.
 > 
 > I have not got a chance to review the test section. If it is
 > similar to the one in the parent JIRA of this JIRA, would it be
 > possible to address any concerns there in this JIRA?

Hi Qifan, thanks for continued interest in this feature. This patch is split from https://gerrit.cloudera.org/#/c/18023. We will focus on table cardinality hint in this patch, and reduce previous patch to focus on selectivity hint.

I've already minimize test cases in this patch. Remove unnecessary plan to keeping test file small. Such as PARALLEL and DISTRIBUTED plan, not exists in test file. Only 61 lines currently. If you are free, you can read it quickly and simply.

http://gerrit.cloudera.org:8080/#/c/18829/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18829/5//COMMIT_MSG@9
PS5, Line 9: Currently, We need execute 'COMPUTE STATS' manually to compute
           : table stats info. Stats is very useful for query planning.
           : Without these stats, query plan maybe worse. In order to solve
           : this probelm, this patch adds a new query hint: 'TABLE_NUM_ROWS',
> nit. Rewording.
Done
By the way, 'valid' -> 'invalid'?


http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java@176
PS5, Line 176: tableNumRowsHint_
> This new data member in class TableRef implies that two table ref objects m
Do you mean different 'table_num_row' value for same table in sql will cause different cardinality is a problem?
We should raise error for this situation?
Such as: 
select count(1) from functional_parquet.alltypes t1/* +TABLE_NUM_ROWS(1000) */,functional_parquet.alltypes t2/* +TABLE_NUM_ROWS(2000) */ where t1.id = t2.id;


http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1547
PS5, Line 1547:  * Currently, we provide a table hint 'TABLE_NUM_ROWS' to set table rows manually if
              :    * table has no stats or has corrupt stats. If the table has valid stats, this hint
              :    * will be ignored.
> In my past experience with very large data warehouse, that the table num ro
Thanks for adivce, Qifan. I agree that we can extend for valid stats later on if necessary.


http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@379
PS5, Line 379: no
> nit. has no stats
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 5
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Tue, 13 Sep 2022 13:00:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................

IMPALA-7942 (part 1): Add query hints for table cardinalities

Currently, we run 'COMPUTE STATS' command to compute table stats
which is very useful for query planning. Without these stats, a
query plan may not be optimal. However, these stats may not be
available, up to date, or valid. To workaround this problem,
this patch adds a new query hint: 'TABLE_NUM_ROWS', We can use
this new hint after a hdfs or kudu table in query like this:

  * select col from t /* +TABLE_NUM_ROWS(1000) */;

If set, Impala will use this value as table scanned rows when
table no stats or has corrput stats. This hint value will not
valid if table stats is normal.

Testing:
- Added new fe test in 'PlannerTest'
- Added new fe test in 'AnalyzeStmtsTest' for negative cases

Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
---
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test
8 files changed, 164 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/18829/7
-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 7
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java@176
PS5, Line 176: tableNumRowsHint_
> If user set different hint value for same table in sql, this will indeed le
IMHO, that the table references are in different scope should not matter, as they all refer to the same physical table which can't have different row numbers at the same time. 

One possible implementation would be to allocate a static hash table at TableRef class. The hash key is the fully qualified table name and the hash value is the number of row hint value for the table. When a new number of row hint is seen, make sure it is the same in the hash table if present.


http://gerrit.cloudera.org:8080/#/c/18829/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/18829/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5046
PS6, Line 5046: Syntax error in line 1
> Thanks for testing this, it seem that we can ignore this problem now?
Sure. Your current code and the test are good.



-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 8
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 16 Sep 2022 12:56:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................

IMPALA-7942 (part 1): Add query hints for table cardinalities

Currently, we run 'COMPUTE STATS' command to compute table stats
which is very useful for query planning. Without these stats, a
query plan may not be optimal. However, these stats may not be
available, up to date, or valid. To workaround this problem,
this patch adds a new query hint: 'TABLE_NUM_ROWS', We can use
this new hint after a hdfs or kudu table in query like this:

  * select col from t /* +TABLE_NUM_ROWS(1000) */;

If set, Impala will use this value as table scanned rows when
table no stats or has corrput stats. This hint value will not
valid if table stats is normal.

Testing:
- Added new fe test in 'PlannerTest'
- Added new fe test in 'AnalyzeStmtsTest' for negative cases

Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
---
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test
8 files changed, 164 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/18829/8
-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 8
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 8:

(2 comments)

I found that when executing test, the cardinality of functional_parquet.alltypes seems different in each jenkins test.
Refer to: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/17575/testReport/junit/org.apache.impala.planner/PlannerTest/testTableCardinalityHint/
Besides, I found that cardinality of functional_parquet.alltypes are different in test file, such as: mt-dop-validation.test, parquet-filtering.test, parquet-stats-agg.test. Since these test cases not use VALIDATE_CARDINALITY.
But this seems not happen to functional.alltypes, why?

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java@176
PS5, Line 176: tableNumRowsHint_
> IMHO, that the table references are in different scope should not matter, a
Thanks for suggestion, Qifan. 
I try this may, but find a problem. We can add a static map to reserve each involved table and related row hint. Bug when do we clean this map?
If we do not clean this map, when submit same query second time, this map already contains involved table in first submit and not been clean. Then second submit will invalid rows hint.
We need to clean this map after sql parse complete, but it seem that we don't know when does sql parse end in TableRef.java


http://gerrit.cloudera.org:8080/#/c/18829/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/18829/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5046
PS6, Line 5046: Syntax error in line 1
> Sure. Your current code and the test are good.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 8
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Tue, 20 Sep 2022 08:26:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 9: Code-Review+1

Hi Wangsheng, 

Sure. To separate the integrity check into another JIRA sounds good to me.


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 9
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qf...@hotmail.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Tue, 20 Dec 2022 14:55:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java@176
PS5, Line 176: tableNumRowsHint_
> Yeah, the logic to clean up the hash table could be a little bit complicate
Thanks for suggestion, Qifan. This seems reasonable.
I suddenly thought of another question, if multiple queries are submitted concurrently, there maybe race condition for this map. One sql is parse sql, another sql parse complete, and then clean map, sql on parsing phase would lost the table and hint info.

It seem that put a static map in 'TableRef.java' is not suitable. It should belong to AnalysisResult or Analyzer? I'm not sure. What do you think?



-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 9
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 23 Sep 2022 12:25:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................

IMPALA-7942 (part 1): Add query hints for table cardinalities

Currently, We need execute 'COMPUTE STATS' manually to compute
table stats info. Stats is very useful for query plan generate.
Without these stats, query plan maybe worse. In order to solve
this probelm, this patch adds a new query hint: 'TABLE_NUM_ROWS',
we can use this hint to set table rows, even without table stats
info. We can use this new hint after a hdfs or kudu table in
query like this:

  * select col from t /* +TABLE_NUM_ROWS(1000) */;

If set, Impala will use this value as table scanned rows, even if
table has no stats.

Testing:
- Added new fe test in 'PlannerTest'
- Added new fe test in 'AnalyzeStmtsTest' for negative cases

Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
---
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test
8 files changed, 135 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/18829/3
-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 3
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11364/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 8
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 16 Sep 2022 03:06:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11191/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 4
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 19 Aug 2022 13:23:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................

IMPALA-7942 (part 1): Add query hints for table cardinalities

Currently, We need execute 'COMPUTE STATS' manually to compute
table stats info. Stats is very useful for query planning.
Without these stats, query plan maybe worse. In order to solve
this probelm, this patch adds a new query hint: 'TABLE_NUM_ROWS',
We can use this new hint after a hdfs or kudu table in query
like this:

  * select col from t /* +TABLE_NUM_ROWS(1000) */;

If set, Impala will use this value as table scanned rows when
table no stats or has corrput stats. This hint value will not
valid if table stats is normal.

Testing:
- Added new fe test in 'PlannerTest'
- Added new fe test in 'AnalyzeStmtsTest' for negative cases

Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
---
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test
8 files changed, 160 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/18829/4
-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 4
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11145/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 3
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Thu, 11 Aug 2022 16:41:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11128/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 1
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Wed, 10 Aug 2022 08:18:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 12:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8943/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 12
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qf...@hotmail.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 04 Jan 2023 03:44:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11358/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 7
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 15 Sep 2022 07:52:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................

IMPALA-7942 (part 1): Add query hints for table cardinalities

Currently, we run 'COMPUTE STATS' command to compute table stats
which is very useful for query planning. Without these stats, a
query plan may not be optimal. However, these stats may not be
available, up to date, or valid. To workaround this problem,
this patch adds a new query hint: 'TABLE_NUM_ROWS', We can use
this new hint after a hdfs or kudu table in query like this:

  * select col from t /* +TABLE_NUM_ROWS(1000) */;

If set, Impala will use this value as table scanned rows when
table no stats or has corrput stats. This hint value will not
valid if table stats is normal.

Testing:
- Added new fe test in 'PlannerTest'
- Added new fe test in 'AnalyzeStmtsTest' for negative cases

Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
---
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test
8 files changed, 146 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/18829/9
-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 9
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11313/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 5
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 08 Sep 2022 13:04:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18829/5/testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test
File testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test:

http://gerrit.cloudera.org:8080/#/c/18829/5/testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test@11
PS5, Line 11: 10000
Maybe in the future we could allow short-hand notation such as 100K, 10G etc.

Copied from https://gerrit.cloudera.org/#/c/18023/, by Qifan Chen.



-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 5
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Tue, 13 Sep 2022 13:02:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................

IMPALA-7942 (part 1): Add query hints for table cardinalities

Currently, we run 'COMPUTE STATS' command to compute table stats
which is very useful for query planning. Without these stats, a
query plan may not be optimal. However, these stats may not be
available, up to date, or valid. To workaround this problem,
this patch adds a new query hint: 'TABLE_NUM_ROWS', We can use
this new hint after a hdfs or kudu table in query like this:

  * select col from t /* +TABLE_NUM_ROWS(1000) */;

If set, Impala will use this value as table scanned rows when
table no stats or has corrput stats. This hint value will not
valid if table stats is normal.

Testing:
- Added new fe test in 'PlannerTest'
- Added new fe test in 'AnalyzeStmtsTest' for negative cases

Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Reviewed-on: http://gerrit.cloudera.org:8080/18829
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test
8 files changed, 146 insertions(+), 10 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 13
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qf...@hotmail.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 3:

(13 comments)

Thanks for spliting the patch! It's much easier to review and iterate.

http://gerrit.cloudera.org:8080/#/c/18829/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18829/3//COMMIT_MSG@10
PS3, Line 10: query plan generate
nit: generation? or "query planning", "query optimization"


http://gerrit.cloudera.org:8080/#/c/18829/3//COMMIT_MSG@19
PS3, Line 19: , even if
            : table has no stats.
nit: regardless the existense of the stats.


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@173
PS3, Line 173:   // Reserve 'TABLE_NUM_ROWS' hint value from query, we will use this value when construct
             :   // HdfsScanNode if table does not have stats, or corrupt stats. Otherwise, Impala will
             :   // use table original stats.
nit: might be better to reword to

Value of query hint 'TABLE_NUM_ROWS' on this table. Used in constructing HdfsScanNode if the table does not have stats, or has correct stats. -1 indicates no hint.


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@510
PS3, Line 510: isTableHintSupport
nit: isTableHintSupported


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@514
PS3, Line 514: Table hints supported for Hdfs/Kudu tables now
nit: reword to

Table hints only supported for Hdfs/Kudu tables.


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@518
PS3, Line 518:       if (hint.is("SCHEDULE_CACHE_LOCAL")) {
Does this mean we support such hints for Kudu tables now? I think the SCHEDULE_* hints only apply on Hdfs tables.

I think what we expect is only the TABLE_NUM_ROWS hint being used on both hdfs and kudu tables. The other hints should remain to be only used on hdfs tables. Although I think we should have scheduler hints for kudu tables as well, we can support them in another JIRA.


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@555
PS3, Line 555: // This hint is valid for Hdfs/Kudu table now
nit: can we remove this comment? It seems no need to explain the following line.


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@556
PS3, Line 556: Long.valueOf
nit: can use Long.parseLong() directly, which is used internally in Long.valueOf().


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@564
PS3, Line 564: Support table row hint for hdfs and kudu table now.
nit: reword to 

Returns whether the table supports table hint. Currently, only hdfs and kudu tables support it.


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1465
PS3, Line 1465:       cardinality_ = extrapolatedNumRows_;
Should we overwrite this as well if the hint exists?


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1542
PS3, Line 1542:    * partitions with corrupt stats.
Could you please mention the hint in this comment?


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1573
PS3, Line 1573:       return tableNumRowsHint_ >= 0 ? tableNumRowsHint_ : partitionNumRows_;
I thought we only use the hint when missing stats. This always overwrites the original stats which is ok but conflict with the commit message and some comments. Please update them appropriately.

BTW, should we update partitionNumRows_ to be tableNumRowsHint_ if it's valid? Not sue if inconsistency in these two values will cause troubles.

UPDATE: I see partitionNumRows_ is only used in getTableStatsExplainString() outside this method. It's shown in the query plan. It'd be better to update it as tableNumRowsHint_ to avoid confusion.


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/ScanNode.java@81
PS3, Line 81:   // Reserve 'TABLE_NUM_ROWS' hint value from query, we will use this value when construct
            :   // ScanNode if table does not have stats, or corrupt stats. Otherwise, Impala will
            :   // use table original stats.
            :   // Only valid for hdfs table currently.
nit: maybe we can just refer to the comment of TableRef.tableNumRowsHint_



-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 3
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Aug 2022 08:01:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 10:

Hi Quanlong, I submited pre-review-test, but failed in these tests:

authorization.test_ranger.TestRanger.test_show_grant_hive_privilege
generate_junitxml.buildall.run-custom-cluster-tests

Seems unrelated to this patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 10
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qf...@hotmail.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 26 Dec 2022 07:00:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 10: Code-Review+2

Thanks!


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 10
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qf...@hotmail.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 23 Dec 2022 15:32:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 10:

> Thanks!
Ok, Qifan. Thanks for continued interest in this feature.


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 10
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qf...@hotmail.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Sun, 25 Dec 2022 02:49:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 12: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 12
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qf...@hotmail.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Wed, 04 Jan 2023 03:44:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................

IMPALA-7942 (part 1): Add query hints for table cardinalities

Currently, We need execute 'COMPUTE STATS' manually to compute
table stats info. Stats is very useful for query planning.
Without these stats, query plan maybe worse. In order to solve
this probelm, this patch adds a new query hint: 'TABLE_NUM_ROWS',
We can use this new hint after a hdfs or kudu table in query
like this:

  * select col from t /* +TABLE_NUM_ROWS(1000) */;

If set, Impala will use this value as table scanned rows when
table no stats or has corrput stats. This hint value will not
valid if table stats is normal.

Testing:
- Added new fe test in 'PlannerTest'
- Added new fe test in 'AnalyzeStmtsTest' for negative cases

Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
---
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/ScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test
8 files changed, 160 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/29/18829/5
-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 5
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 5:

(3 comments)

Thanks for reivew, Quanlong. For other engine:
1. Presto/Trino does not support hint syntax;
2. Spark does not support table row related hint: https://spark.apache.org/docs/latest/sql-ref-syntax-qry-select-hints.html
3. Mysql does not support table row related hint yet: https://dev.mysql.com/doc/refman/8.0/en/optimizer-hints.html

It seem that we cannot refer to other engine implement. Maybe we can invite other people to discuss this.

http://gerrit.cloudera.org:8080/#/c/18829/4/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/18829/4/fe/src/main/java/org/apache/impala/analysis/TableRef.java@174
PS4, Line 174: corrupt
> nit: corrupt
Done


http://gerrit.cloudera.org:8080/#/c/18829/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18829/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1548
PS4, Line 1548: ts. If the table has valid stats, this hint
> nit: "If the table has valid stats, this hint will be ignored".
Done


http://gerrit.cloudera.org:8080/#/c/18829/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1548
PS4, Line 1548: has no s
> nit: "has no stats"
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 5
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 08 Sep 2022 12:57:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 7:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/18829/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18829/5//COMMIT_MSG@9
PS5, Line 9: Currently, we run 'COMPUTE STATS' command to compute table stats
           : which is very useful for query planning. Without these stats, a
           : query plan may not be optimal. However, these stats may not be
           : available, up to date, or valid. To workaround this problem,
> nit. I guess 'valid' expresses the idea of 'invalid', due to the negation i
Done


http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java@176
PS5, Line 176: tableNumRowsHint_
> yes.
Yes, this situation would not throw exception or error according my test.


http://gerrit.cloudera.org:8080/#/c/18829/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/18829/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5046
PS6, Line 5046: Syntax error in line 1
> Is it possible to include the invalid hint value in error message. In this 
This is a good idea, but not easy to achieve. Here are two mainly exception:
1. For '/* +TABLE_NUM_ROWS(aa) */', thrown NumberFormatException exception in 'TableRef.java', which means syntax parse success, but check failed in 'TableRef.java';
2. For '/* +TABLE_NUM_ROWS(-1) */' or '/* +TABLE_NUM_ROWS(1.0) */', we get ParseException, which means syntax parse failed in 'sql-parser.cup'.
These two test cases failed in different phase, main reason is that:

plan_hint ::=
  KW_STRAIGHT_JOIN
  {: RESULT = new PlanHint("straight_join"); :}
  | IDENT:name
  {: RESULT = new PlanHint(name); :}
  | IDENT:name LPAREN ident_list:args RPAREN
  {: RESULT = new PlanHint(name, args); :}
  | IDENT:name LPAREN INTEGER_LITERAL:value RPAREN
  {:
    if (value != null) {
      RESULT = new PlanHint(name,
        new ArrayList(Arrays.asList(value.toString())));
    } else {
      RESULT = new PlanHint(name);
    }
  :}
  | /* empty */
  {: RESULT = null; :}
  ;

 ident_list ::=
  ident_or_default:ident
  {:
    List<String> list = new ArrayList<>();
    list.add(ident);
    RESULT = list;
  :}
  | ident_list:list COMMA ident_or_default:ident
  {:
    list.add(ident);
    RESULT = list;
  :}
  ;

 As you can see, 'sql-parser.cup' define hint wrtie format. We can use string, string list or INTEGER_LITERAL as hint args after hint name. If we want to include the invalid hint value in error message, only two ways:
 1. Implement error message in 'sql-parser.cup', this maybe not suitable, since we need to check each hint format in 'sql-parser.cup';
 2. Allow all kinds of args in 'sql-parser.cup', to ensure sql parse success, and then add hint format check in corresponding class, for example, 'TABLE_NUM_ROWS' check in 'TableRef.java'. In this way, we may need to modify 'ident_list' to allowed all kinds of type, currently is only  'ident_or_default' or 'INTEGER_LITERAL', I'm not sure this is good, since this change will influence other hint behavior.
 Do you have any suggestion?


http://gerrit.cloudera.org:8080/#/c/18829/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5056
PS6, Line 5056:   // Cannot set cardinality hint without parameter
              :     AnalyzesOk("select * from functional.alltypes /* +TABLE_NUM_ROWS */",
> It does not appear these two belong to this section. Move them to positive 
Done


http://gerrit.cloudera.org:8080/#/c/18829/6/testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test
File testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test:

http://gerrit.cloudera.org:8080/#/c/18829/6/testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test@1
PS6, Line 1: 
> you meant to say "and no hint is applied"?
Done


http://gerrit.cloudera.org:8080/#/c/18829/6/testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test@10
PS6, Line 10: d the 
> nit.  "and the hint will be taken".
Done


http://gerrit.cloudera.org:8080/#/c/18829/6/testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test@28
PS6, Line 28: 
> nit. the hint value will be ignored.
Done


http://gerrit.cloudera.org:8080/#/c/18829/6/testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test@37
PS6, Line 37: the hint value will be 
> nit. the hint value will be ignored.
Done


http://gerrit.cloudera.org:8080/#/c/18829/6/testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test@46
PS6, Line 46: lue will be 
> nit. the hint value will be ignored.
Done


http://gerrit.cloudera.org:8080/#/c/18829/6/testdata/workloads/functional-planner/queries/PlannerTest/table-cardinality-hint.test@54
PS6, Line 54: the hint value will be 
> nit. the hint value will be ignored.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 7
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 15 Sep 2022 07:31:30 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 8:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java@176
PS5, Line 176: tableNumRowsHint_
> IMHO this can lead to inconsistency in the plan, depending on where the num
If user set different hint value for same table in sql, this will indeed lead to inconsistency in the plan, but this depend on user behavior, and this problem also exists for selectivity hint.

Each table get hint value in 'TableRef.java' separately. If we need to add this ensurence, we need to put all involved tables into a list, and then find same table, and check hint value. But this may not easy to implements, especially for multiple nesting sql. Do you have any suggestion?

Or we can just add some comments to explain that this hint is work on each single table, even we use different hint value for same table in a sql. Besides, what if the user just wants to do this? Maybe we can keep this setting for more flexibility.


http://gerrit.cloudera.org:8080/#/c/18829/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/18829/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5046
PS6, Line 5046: Syntax error in line 1
> Yeah, I agree with you in that in this case it may be hard to figure out th
Thanks for testing this, it seem that we can ignore this problem now?


http://gerrit.cloudera.org:8080/#/c/18829/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5048
PS6, Line 5048: Syntax error in line 1
> same comment as above.
Done


http://gerrit.cloudera.org:8080/#/c/18829/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5051
PS6, Line 5051: "Syntax error in line 1
> same comment as above.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 8
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 16 Sep 2022 02:45:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java@176
PS5, Line 176: tableNumRowsHint_
> Thanks for suggestion, Qifan. This seems reasonable.
Sure, that arrangement should work too and clean.  With the static attachment approach, you may need to include the query id in the hash key. 

Another data point is that we may need to consider a table ref A without #row hint to have the same # of rows specified in another table ref B with #row hint, assuming both refer to the same table. This is to avoid the requirement to redundant specification of the same #row hint in the query.



-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 9
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 26 Sep 2022 12:57:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 11: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 11
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qf...@hotmail.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Tue, 03 Jan 2023 06:41:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Fang-Yu Rao (Code Review)" <ge...@cloudera.org>.
Fang-Yu Rao has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 10:

> Patch Set 10:
> 
> > Hi Quanlong, I submited pre-review-test, but failed in these tests:
>  > 
>  > authorization.test_ranger.TestRanger.test_show_grant_hive_privilege
>  > generate_junitxml.buildall.run-custom-cluster-tests
>  > 
>  > Seems unrelated to this patch.
> 
> Here is the task url: https://jenkins.impala.io/job/pre-review-test/1490/console

Thanks Sheng!

test_show_grant_hive_privilege is broken by IMPALA-10986 since the number of added Ranger policies in tests reaches a certain number after IMPALA-10986 in the pre-review-test. I think all (or most of) the authorization-related tests are run with a single instance of the Ranger server in the pre-review-test.

We do not observe it in the gerrit-verify-dryrun-external probably because those authorization-related tests are run in different shards each having its own instance of Ranger server and the limit is not reached yet.  So the failure was not caused by your patch. I have a patch at https://gerrit.cloudera.org/c/19373 that will be approved soon.


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 10
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qf...@hotmail.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 26 Dec 2022 23:10:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 11:

> > Patch Set 10:
 > >
 > > > Hi Quanlong, I submited pre-review-test, but failed in these
 > tests:
 > >  >
 > >  > authorization.test_ranger.TestRanger.test_show_grant_hive_privilege
 > >  > generate_junitxml.buildall.run-custom-cluster-tests
 > >  >
 > >  > Seems unrelated to this patch.
 > >
 > > Here is the task url: https://jenkins.impala.io/job/pre-review-test/1490/console
 > 
 > Thanks Sheng!
 > 
 > test_show_grant_hive_privilege is broken by IMPALA-10986 since the
 > number of added Ranger policies in tests reaches a certain number
 > after IMPALA-10986 in the pre-review-test. I think all (or most of)
 > the authorization-related tests are run with a single instance of
 > the Ranger server in the pre-review-test.
 > 
 > We do not observe it in the gerrit-verify-dryrun-external probably
 > because those authorization-related tests are run in different
 > shards each having its own instance of Ranger server and the limit
 > is not reached yet.  So the failure was not caused by your patch. I
 > have a patch at https://gerrit.cloudera.org/c/19373 that will be
 > approved soon.

Ok thanks for replay. I've already rebase this patch. Anyone could help to start a gerrit-verify-dryrun task to verify this patch?


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 11
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qf...@hotmail.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Tue, 03 Jan 2023 06:46:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 5:

(5 comments)

Looks good!

Complete the code section and have a few comments.

I have not got a chance to review the test section. If it is similar to the one in the parent JIRA of this JIRA, would it be possible to address any concerns there in this JIRA?

http://gerrit.cloudera.org:8080/#/c/18829/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18829/5//COMMIT_MSG@9
PS5, Line 9: Currently, We need execute 'COMPUTE STATS' manually to compute
           : table stats info. Stats is very useful for query planning.
           : Without these stats, query plan maybe worse. In order to solve
           : this probelm, this patch adds a new query hint: 'TABLE_NUM_ROWS',
nit. Rewording.

Currently, we run 'COMPUTE STATS' command to compute table stats which is very useful for query planning. Without these stats, a query plan may not be optimal. However, these stats may not be available, up to date, or valid. To workaround this problem, this patch adds a new query hint: ......


http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java@176
PS5, Line 176: tableNumRowsHint_
This new data member in class TableRef implies that two table ref objects may be assigned with different num rows, even when both refer to the same SQL table. 

select * from T /* table_num_row(100) */, T /* table_num_row(1000) */;

It sounds like we need to have a step to verify this and raise an error.


http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java@569
PS5, Line 569: supportTableHint
It is a good idea to specify the exact hint in question: supportTableRowHint().


http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1547
PS5, Line 1547:  * Currently, we provide a table hint 'TABLE_NUM_ROWS' to set table rows manually if
              :    * table has no stats or has corrupt stats. If the table has valid stats, this hint
              :    * will be ignored.
In my past experience with very large data warehouse, that the table num row hint can be applied to even a table with valid stats is very useful to judge plans under different table growth conditions. 

For this patch, I feel it is okey to support it only for missing or non valid stats first, and extend it for valid stats later on.


http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
File fe/src/main/java/org/apache/impala/planner/KuduScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/planner/KuduScanNode.java@379
PS5, Line 379: no
nit. has no stats



-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 5
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 08 Sep 2022 17:41:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "wangsheng (Code Review)" <ge...@cloudera.org>.
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 4:

(13 comments)

Hi Quanlong, thanks for advice. I think you are right, use hint value to replace original table stats may cause consistency when use explain. So I modify the code, table hint is valid when no stats or has corrupt stats. Here is a problem, I use 'functional.alltypes' for hdfs table with stats, 'functional_parquet.alltypes' for hdfs table without stats, 'functional_kudu.alltypes' for kudu table with stats.
But I did not figure out the way to test kudu table without stats.

http://gerrit.cloudera.org:8080/#/c/18829/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/18829/3//COMMIT_MSG@10
PS3, Line 10: query planning.
> nit: generation? or "query planning", "query optimization"
Done


http://gerrit.cloudera.org:8080/#/c/18829/3//COMMIT_MSG@19
PS3, Line 19: l not
            : valid if table stat
> nit: regardless the existense of the stats.
Done


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@173
PS3, Line 173:   // Value of query hint 'TABLE_NUM_ROWS' on this table. Used in constructing ScanNode if
             :   // the table does not have stats, or has correct stats. -1 indicates no hint. Currently,
             :   // this hint is valid for hd
> nit: might be better to reword to
Done


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@510
PS3, Line 510: 
> nit: isTableHintSupported
Done


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@514
PS3, Line 514: estTable() != null &&
> nit: reword to
Done


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@518
PS3, Line 518:     for (PlanHint hint: tableHints_) {
> Does this mean we support such hints for Kudu tables now? I think the SCHED
Done


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@555
PS3, Line 555: analyzer.setHasPlanHints();
> nit: can we remove this comment? It seems no need to explain the following 
Done


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@556
PS3, Line 556: Long.parseLo
> nit: can use Long.parseLong() directly, which is used internally in Long.va
Done


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/analysis/TableRef.java@564
PS3, Line 564: Returns whether the table supports hint. Currently,
> nit: reword to 
Done


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1465
PS3, Line 1465:       cardinality_ = extrapolatedNumRows_;
> Should we overwrite this as well if the hint exists?
Done


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1542
PS3, Line 1542:    * partitions with corrupt stats.
> Could you please mention the hint in this comment?
Done


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1573
PS3, Line 1573:     // by each of the partitions, as the row count for the table.
> I thought we only use the hint when missing stats. This always overwrites t
Done


http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/ScanNode.java
File fe/src/main/java/org/apache/impala/planner/ScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18829/3/fe/src/main/java/org/apache/impala/planner/ScanNode.java@81
PS3, Line 81:   // Refer to the comment of 'TableRef.tableNumRowsHint_'
            :   protected long tableNumRowsHint_ = -1;
            : 
            :   public ScanNode(PlanNodeId id, TupleDes
> nit: maybe we can just refer to the comment of TableRef.tableNumRowsHint_
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 4
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Fri, 19 Aug 2022 13:06:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/18829/5/fe/src/main/java/org/apache/impala/analysis/TableRef.java@176
PS5, Line 176: tableNumRowsHint_
> Yes, this situation would not throw exception or error according my test.
IMHO this can lead to inconsistency in the plan, depending on where the num row hints are specified in the query and how they are processed by the code.


http://gerrit.cloudera.org:8080/#/c/18829/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java:

http://gerrit.cloudera.org:8080/#/c/18829/6/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java@5046
PS6, Line 5046: Syntax error in line 1
> This is a good idea, but not easy to achieve. Here are two mainly exception
Yeah, I agree with you in that in this case it may be hard to figure out the exact location directly in the parser. 

I ran a simple query from cli as follows. It looks like the user can easily spot the problem. Seems what you have is okay.  

Query: select * from foo  /* +TABLE_NUM_ROWS(1.0) */
Query submitted at: 2022-09-15 14:42:09 (Coordinator: http://qifan-10229:25000)
ERROR: ParseException: Syntax error in line 1:
select * from foo  /* +TABLE_NUM_ROWS(1.0) */
                                      ^
Encountered: DECIMAL LITERAL
Expected: DEFAULT, IDENTIFIER

CAUSED BY: Exception: Syntax error

Could not execute command: select * from foo  /* +TABLE_NUM_ROWS(1.0) */



-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 7
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Thu, 15 Sep 2022 18:50:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11386/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 9
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Tue, 20 Sep 2022 08:54:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7942 (part 1): Add query hints for table cardinalities

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18829 )

Change subject: IMPALA-7942 (part 1): Add query hints for table cardinalities
......................................................................


Patch Set 4:

(4 comments)

> Patch Set 4:
> 
> (13 comments)
> 
> Hi Quanlong, thanks for advice. I think you are right, use hint value to replace original table stats may cause consistency when use explain. So I modify the code, table hint is valid when no stats or has corrupt stats. Here is a problem, I use 'functional.alltypes' for hdfs table with stats, 'functional_parquet.alltypes' for hdfs table without stats, 'functional_kudu.alltypes' for kudu table with stats.
> But I did not figure out the way to test kudu table without stats.

Thanks for updating the patch!

I'm not sure whether we should allow the hint to overwrite valid table stats and fix the explain result instead. This worths a broader discussion. Not sure how other systems design.

http://gerrit.cloudera.org:8080/#/c/18829/4/fe/src/main/java/org/apache/impala/analysis/TableRef.java
File fe/src/main/java/org/apache/impala/analysis/TableRef.java:

http://gerrit.cloudera.org:8080/#/c/18829/4/fe/src/main/java/org/apache/impala/analysis/TableRef.java@174
PS4, Line 174: correct
nit: corrupt


http://gerrit.cloudera.org:8080/#/c/18829/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18829/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1548
PS4, Line 1548: If table already has stats, this hint will not
nit: "If the table has valid stats, this hint will be ignored".


http://gerrit.cloudera.org:8080/#/c/18829/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1548
PS4, Line 1548: no stats
nit: "has no stats"


http://gerrit.cloudera.org:8080/#/c/18829/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1624
PS4, Line 1624: if table no stats
nit: "if the table has no stats"



-- 
To view, visit http://gerrit.cloudera.org:8080/18829
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9f0c773f4e67782a1428db64062f68afbd257af7
Gerrit-Change-Number: 18829
Gerrit-PatchSet: 4
Gerrit-Owner: wangsheng <sk...@163.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: wangsheng <sk...@163.com>
Gerrit-Comment-Date: Mon, 05 Sep 2022 02:01:53 +0000
Gerrit-HasComments: Yes