Posted to reviews@impala.apache.org by "fifteencai (Code Review)" <ge...@cloudera.org> on 2021/04/13 10:02:47 UTC
[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option
fifteencai has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/17306 )
Change subject: IMPALA-10445: Adjust NDV's scale with query option
......................................................................
IMPALA-10445: Adjust NDV's scale with query option
This patch adds a new way to control the scale of NDV.
Since IMPALA-2658, we can trade memory for a more accurate
estimation by passing a larger `scale` to `ndv()`. However, that scale
is chosen by whoever writes the SQL, which makes it hard for cluster
admins to adopt larger scales in a controlled way, for two reasons:
- First, SQL writers are reluctant to lower the scale. They tend
to max it out, which can make the cluster unstable, especially
when a `GROUP BY` has high cardinality, since every group keeps its
own NDV state. So it is wiser to let the cluster admin, rather than
the SQL writer, choose an appropriate scale.
- Second, in some applications queries are stored in databases.
In a BI system, for example, rewriting thousands of stored SQL
statements is risky.
This commit introduces a new query option, `DEFAULT_NDV_SCALE`.
Because it is a query option, cluster admins can either tune a
single query, or influence all subsequent queries by setting it as a
default query option of a dynamic resource pool.
We also refactored `SelectStmt.analyze()` so that `APPX_COUNT_DISTINCT`
works together with this query option. As a result, cluster admins can
lower the service level, trading exact results for less memory and time,
by having `count(distinct id)` rewritten to `ndv(id, scale)`.
Implementation details:
- The default value of DEFAULT_NDV_SCALE is 2, the same as the
default scale of `ndv()`, so the default NDV behavior is unchanged.
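- We moved the `CountDistinctToNdv` transformation out of
`SelectStmt.analyze()` into an `ExprRewriter` rule,
`CountDistinctToNdvRule`, so that it composes with further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`, so `ndv()` calls produced from
`count(distinct ...)` also pick up the default scale.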
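The rule ordering described above can be illustrated with a deliberately simplified sketch. It is not Impala's code: `Call`, `Rule`, `NdvRewriteSketch` and the string-based arguments are hypothetical stand-ins for Impala's expression and rewrite-rule classes. The sketch also assumes that an explicitly given scale, such as `ndv(id, 5)`, is left untouched, and that a default of 2 adds no argument; the real `DefaultNdvScaleRule` may behave differently.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical driver: applies the sketched rules to a couple of calls.
class NdvRewriteSketch {
  // Built-in scale used by ndv(x) when no scale argument is given.
  static final int BUILTIN_DEFAULT_SCALE = 2;

  // Idea of CountDistinctToNdvRule (sketch only): when APPX_COUNT_DISTINCT
  // is on, turn a single-argument count(distinct x) into ndv(x).
  static Rule countDistinctToNdv(boolean appxCountDistinct) {
    return c -> (appxCountDistinct && c.fn.equals("count") && c.distinct
            && c.args.size() == 1)
        ? new Call("ndv", false, c.args.get(0))
        : c;
  }

  // Idea of DefaultNdvScaleRule (sketch only): append DEFAULT_NDV_SCALE to
  // single-argument ndv(x) calls. Assumption: explicit scales are kept and
  // the built-in default 2 adds nothing.
  static Rule defaultNdvScale(int defaultNdvScale) {
    return c -> (defaultNdvScale != BUILTIN_DEFAULT_SCALE && c.fn.equals("ndv")
            && c.args.size() == 1)
        ? new Call("ndv", false, c.args.get(0), Integer.toString(defaultNdvScale))
        : c;
  }

  // Single pass over an ordered rule list: a call produced by an earlier
  // rule is what the later rules see.
  static Call rewrite(Call call, List<Rule> rules) {
    Call result = call;
    for (Rule rule : rules) {
      result = rule.apply(result);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Rule> rules = Arrays.asList(countDistinctToNdv(true), defaultNdvScale(10));
    System.out.println(rewrite(new Call("count", true, "id"), rules)); // ndv(id, 10)
    System.out.println(rewrite(new Call("ndv", false, "id", "5"), rules)); // ndv(id, 5)
  }
}

// Hypothetical, simplified function-call node (Impala's real Expr classes differ).
class Call {
  final String fn;          // lower-case function name, e.g. "count" or "ndv"
  final boolean distinct;   // true for count(DISTINCT ...)
  final List<String> args;  // arguments, kept as plain strings

  Call(String fn, boolean distinct, String... args) {
    this.fn = fn;
    this.distinct = distinct;
    this.args = Arrays.asList(args);
  }

  @Override
  public String toString() {
    return fn + "(" + (distinct ? "distinct " : "") + String.join(", ", args) + ")";
  }
}

// Hypothetical rule interface: returns the rewritten call, or the input as-is.
interface Rule {
  Call apply(Call call);
}
```

In this single-pass sketch the order is what matters: if `defaultNdvScale` ran first, it would see `count(distinct id)` unchanged, and the `ndv(id)` produced afterwards would never receive the default scale.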
Usage:
To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10; -- ranges from 1 to 10, both inclusive.
```
To unset (restore the default):
```
SET DEFAULT_NDV_SCALE = 2;
```
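The documented range, and the fact that 2 means "unset", can be captured in a small validation helper. This is a hypothetical Java sketch: the actual option parsing for this patch is done in C++ in `be/src/service/query-options.cc`, and the class name, method names, error messages, and the choice to reject out-of-range values with an error are illustrative assumptions.

```java
// Hypothetical helper mirroring the documented DEFAULT_NDV_SCALE range.
// Not Impala's implementation; names and messages are illustrative only.
class DefaultNdvScaleOption {
  static final int MIN_SCALE = 1;
  static final int MAX_SCALE = 10;
  static final int DEFAULT_SCALE = 2; // same as ndv()'s built-in default

  // Parses a SET value and rejects anything outside [1, 10] (assumption).
  static int parse(String value) {
    int scale;
    try {
      scale = Integer.parseInt(value.trim());
    } catch (NumberFormatException e) {
      throw new IllegalArgumentException(
          "DEFAULT_NDV_SCALE must be an integer: " + value, e);
    }
    if (scale < MIN_SCALE || scale > MAX_SCALE) {
      throw new IllegalArgumentException("DEFAULT_NDV_SCALE must be between "
          + MIN_SCALE + " and " + MAX_SCALE + ", got " + scale);
    }
    return scale;
  }

  // A value equal to the built-in default leaves ndv() behavior unchanged.
  static boolean isUnset(int scale) {
    return scale == DEFAULT_SCALE;
  }

  public static void main(String[] args) {
    System.out.println(parse("10"));           // 10
    System.out.println(isUnset(parse(" 2 ")));  // true
    try {
      parse("11");
    } catch (IllegalArgumentException e) {
      // DEFAULT_NDV_SCALE must be between 1 and 10, got 11
      System.out.println(e.getMessage());
    }
  }
}
```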
Here are test results of a typical workload (cardinality=40,090,650):
+==============+================+=============+=============+==============+
| Metric       | Count Distinct | NDV scale 2 | NDV scale 5 | NDV scale 10 |
+--------------+----------------+-------------+-------------+--------------+
| Memory (GB)  |           3.83 |        1.84 |        1.85 |         1.89 |
| Duration (s) |         182.89 |       30.22 |       29.72 |        29.24 |
| Error rate   |             0% |        1.8% |       1.17% |        0.06% |
+==============+================+=============+=============+==============+
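Relative to exact `count(distinct)`, the ratios below are computed directly from the measured values in the table (duration and memory for scale 2 and scale 10):

```latex
\frac{182.89\,\mathrm{s}}{30.22\,\mathrm{s}} \approx 6.05, \qquad
\frac{182.89\,\mathrm{s}}{29.24\,\mathrm{s}} \approx 6.25, \qquad
\frac{3.83\,\mathrm{GB}}{1.84\,\mathrm{GB}} \approx 2.08, \qquad
\frac{3.83\,\mathrm{GB}}{1.89\,\mathrm{GB}} \approx 2.03
```

So on this workload NDV runs roughly six times faster with about half the memory, and going from scale 2 to scale 10 costs about 0.05 GB more memory while reducing the error rate from 1.8% to 0.06%.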
Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit tests; all passed.
4) Added a new query option test in `query-options-test.cc`.
Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 251 insertions(+), 34 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/9
--
To view, visit http://gerrit.cloudera.org:8080/17306
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 9
Gerrit-Owner: fifteencai <fi...@tencent.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>