Posted to reviews@impala.apache.org by "fifteencai (Code Review)" <ge...@cloudera.org> on 2021/04/13 10:02:47 UTC

[Impala-ASF-CR] IMPALA-10445: Adjust NDV's scale with query option

fifteencai has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/17306 )

Change subject: IMPALA-10445: Adjust NDV's scale with query option
......................................................................

IMPALA-10445: Adjust NDV's scale with query option

This change adds a new way to control NDV's scale.

Since IMPALA-2658, we can trade memory for more accurate
estimation by setting a larger `scale`. That scale is chosen by SQL
writers, which makes it hard for cluster admins to allow
larger scales. There are two reasons:

- Firstly, SQL writers are reluctant to lower the scale. They tend
to max out the scale, which can make the cluster unstable, especially
when there are `group by`s with high cardinalities. So it is wiser to
let the cluster admin, rather than the SQL writer, choose an
appropriate scale.

- Secondly, in some application scenarios, queries are stored in DBs.
In a BI system, for example, rewriting thousands of SQL statements is
risky.

This commit introduces a new query option, `DEFAULT_NDV_SCALE`.
Because it is a query option, cluster admins can either tune
a single query or influence upcoming queries by placing a default
query option in a dynamic resource pool.

We also refactored method `Analyze` to make sure APPX_COUNT_DISTINCT
works with this query option. With this change, cluster admins can
degrade the service level by transforming `count(distinct id)` to
`ndv(id, scale)`.

Implementation details:

- The default value of DEFAULT_NDV_SCALE is 2, so the default ndv
behavior is unchanged.
- We ported the `CountDistinctToNdv` transform logic from
`SelectStmt.analyze()` to `ExprRewriter`, making it compatible with
further rewrite rules.
- The newly added rewrite rule `DefaultNdvScaleRule` is applied
after `CountDistinctToNdvRule`.
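
The ordering of the two rules can be illustrated with a small,
self-contained sketch. It is a toy model, not the code in this patch:
the `NdvRewriteSketch` and `Call` names and the method signatures are
hypothetical, and it assumes that an explicit `ndv(x, s)` is left
untouched and that a scale of 2 leaves `ndv(x)` as is.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the two rewrite steps; not Impala's actual classes or API. */
class NdvRewriteSketch {

  /** Hypothetical stand-in for a function-call expression. */
  static final class Call {
    final String fn;
    final boolean distinct;
    final List<String> args;

    Call(String fn, boolean distinct, String... args) {
      this.fn = fn;
      this.distinct = distinct;
      this.args = Arrays.asList(args);
    }

    @Override
    public String toString() {
      return fn + "(" + (distinct ? "distinct " : "") + String.join(", ", args) + ")";
    }
  }

  // Step 1: with APPX_COUNT_DISTINCT enabled, count(distinct x) becomes ndv(x).
  static Call countDistinctToNdv(Call e, boolean appxCountDistinct) {
    boolean isCountDistinct = e.fn.equals("count") && e.distinct && e.args.size() == 1;
    return (appxCountDistinct && isCountDistinct) ? new Call("ndv", false, e.args.get(0)) : e;
  }

  // Step 2: a one-argument ndv(x) picks up the default scale.
  // Assumption: an explicit ndv(x, s) is left alone, and scale 2 keeps ndv(x) as is.
  static Call defaultNdvScale(Call e, int scale) {
    if (scale < 1 || scale > 10) {
      throw new IllegalArgumentException("DEFAULT_NDV_SCALE must be in [1, 10]: " + scale);
    }
    boolean isPlainNdv = e.fn.equals("ndv") && !e.distinct && e.args.size() == 1;
    return (isPlainNdv && scale != 2)
        ? new Call("ndv", false, e.args.get(0), Integer.toString(scale))
        : e;
  }

  // The order matters: step 2 must see the ndv(x) produced by step 1.
  static Call rewrite(Call e, boolean appxCountDistinct, int scale) {
    return defaultNdvScale(countDistinctToNdv(e, appxCountDistinct), scale);
  }

  public static void main(String[] args) {
    System.out.println(rewrite(new Call("count", true, "id"), true, 10));
    System.out.println(rewrite(new Call("ndv", false, "id", "5"), true, 10));
    System.out.println(rewrite(new Call("count", true, "id"), false, 10));
  }
}
```

In this sketch, `count(distinct id)` with APPX_COUNT_DISTINCT enabled and
a scale of 10 ends up as `ndv(id, 10)`. If the rules ran in the opposite
order, the scale step would never see the `ndv(id)` call.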

Usage:

To set a default ndv scale:
```
SET DEFAULT_NDV_SCALE = 10;  -- ranges from 1 to 10, both inclusive.
```

To unset (restore the default of 2):
```
SET DEFAULT_NDV_SCALE = 2;
```

Here are test results of a typical workload (cardinality=40,090,650):
+================================================================================+
| Metric       | Count Distinct | NDV (scale 2) | NDV (scale 5) | NDV (scale 10) |
+--------------------------------------------------------------------------------+
| Memory (GB)  | 3.83           | 1.84          | 1.85          | 1.89           |
| Duration (s) | 182.89         | 30.22         | 29.72         | 29.24          |
| Error rate   | 0%             | 1.8%          | 1.17%         | 0.06%          |
+================================================================================+

Testing:
1) Added 3 unit test cases in `ExprRewriteRulesTest`.
2) Added 5 unit test cases in `ExprRewriterTest`.
3) Ran all front-end unit tests; all passed.
4) Added a new query-option test.

Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
A fe/src/main/java/org/apache/impala/rewrite/CountDistinctToNdvRule.java
A fe/src/main/java/org/apache/impala/rewrite/DefaultNdvScaleRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
11 files changed, 251 insertions(+), 34 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17306/9
-- 
To view, visit http://gerrit.cloudera.org:8080/17306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1669858a6e8252e167b464586e8d0b6cb0d0bd50
Gerrit-Change-Number: 17306
Gerrit-PatchSet: 9
Gerrit-Owner: fifteencai <fi...@tencent.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>