Posted to reviews@impala.apache.org by "Fucun Chu (Code Review)" <ge...@cloudera.org> on 2021/08/01 01:19:50 UTC
[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17744 )
Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
......................................................................
IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
This patch addresses the current limitation of the DS_HLL_SKETCH function by
extending the function to optionally take a second argument
called precision.
DS_HLL_SKETCH(expression [, precision])
The precision value must be between 4 and 21, specified as an integer
literal. The default is 12.
Here are test results of a typical workload in tpch25.lineitem (#1):
+==================================================================+
| Metric      | Count Distinct | DS_HLL-12 | DS_HLL-16 | DS_HLL-21 |
+------------------------------------------------------------------+
| Memory(MB)  | 725.43         | 124.87    | 123.19    | 121.85    |
| Duration(s) | 5.64           | 1.03      | 1.13      | 1.64      |
| ErrorRate   | 0%             | 1.26%     | 0.22%     | 0.05%     |
+==================================================================+
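As a rough illustration of how the precision argument relates to sketch size and expected error, here is a minimal Python sketch. It is not Impala or DataSketches code. The function names are hypothetical, the range is assumed to be inclusive, and the error formula is the textbook HyperLogLog approximation 1.04 / sqrt(m) with m = 2^precision registers, which may differ from the estimator the DataSketches library actually uses.

```python
import math

MIN_LG_K = 4        # lower bound stated in the commit message
MAX_LG_K = 21       # upper bound stated in the commit message
DEFAULT_LG_K = 12   # default stated in the commit message


def resolve_precision(precision=None):
    """Apply the default and the range check (hypothetical helper).

    Assumes the 4..21 range is inclusive on both ends.
    """
    if precision is None:
        return DEFAULT_LG_K
    if isinstance(precision, bool) or not isinstance(precision, int):
        raise TypeError("precision must be an integer literal")
    if not MIN_LG_K <= precision <= MAX_LG_K:
        raise ValueError(
            f"precision must be between {MIN_LG_K} and {MAX_LG_K}")
    return precision


def register_count(precision):
    """Number of HLL registers for a given precision: m = 2^precision."""
    return 1 << precision


def approx_relative_std_error(precision):
    """Textbook HyperLogLog approximation, not the library's exact bound."""
    return 1.04 / math.sqrt(register_count(precision))


if __name__ == "__main__":
    for p in (12, 16, 21):
        err = approx_relative_std_error(p)
        print(f"precision={p} registers={register_count(p)} "
              f"approx_error={err:.4%}")
```

The approximation only indicates the trend: each step up in precision doubles the register count and shrinks the expected error by a factor of about sqrt(2). The error rates in the table above come from a single measured workload, so they need not match these theoretical values.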
Testing:
1. Ran unit tests against table lineitem in TPC-DS in both serial and
parallel plan settings;
2. Ran "core" tests.
Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-common.h
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
6 files changed, 155 insertions(+), 3 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17744/1
--
To view, visit http://gerrit.cloudera.org:8080/17744
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17744 )
Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
......................................................................
IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
This patch addresses the current limitation of the DS_HLL_SKETCH function by
extending the function to optionally take a second argument
called precision.
DS_HLL_SKETCH(expression [, precision])
The precision value must be between 4 and 21, specified as an integer
literal. The default is 12.
Here are test results of a typical workload in tpch25.lineitem (#1):
+==================================================================+
| Metric      | Count Distinct | DS_HLL-12 | DS_HLL-16 | DS_HLL-21 |
+------------------------------------------------------------------+
| Memory(MB)  | 725.43         | 124.87    | 123.19    | 121.85    |
| Duration(s) | 5.64           | 1.03      | 1.13      | 1.64      |
| ErrorRate   | 0%             | 1.26%     | 0.22%     | 0.05%     |
+==================================================================+
Testing:
1. Ran unit tests against table lineitem in TPC-DS in both serial and
parallel plan settings;
2. Ran "core" tests.
Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-common.h
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
6 files changed, 155 insertions(+), 3 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17744/2
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Alexander Saydakov <al...@apache.org>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
Posted by "Alexander Saydakov (Code Review)" <ge...@cloudera.org>.
Alexander Saydakov has posted comments on this change. ( http://gerrit.cloudera.org:8080/17744 )
Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
......................................................................
Patch Set 1:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:
http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc@1826
PS1, Line 1826: datasketches
why max here, not the specified precision?
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Alexander Saydakov <al...@apache.org>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 24 Sep 2021 21:38:14 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17744 )
Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
......................................................................
Patch Set 2:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/9499/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Alexander Saydakov <al...@apache.org>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 26 Sep 2021 11:03:21 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17744 )
Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
......................................................................
Patch Set 2:
(2 comments)
http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:
http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc@1722
PS1, Line 1722: nctionCon
> precision is not the best name. I would suggest following the datasketches
Done
http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc@1826
PS1, Line 1826: _cast<datask
> why max here, not the specified precision?
The resulting accuracy of a sketch returned at the end of the unioning process will be a function of the smallest of lg_max_k and lg_config_k that the union operator has seen.
see: https://github.com/apache/datasketches-cpp/blob/master/hll/include/hll.hpp#L404-L407
So that the union operation does not reduce the precision of a high-precision ds_hll_sketch result, lg_max_k is set to the maximum value. If necessary, a precision parameter will be added to ds_hll_union in a new Jira.
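The reasoning above can be sketched in a few lines of Python. This is only a model of the rule quoted from the datasketches documentation (the result follows the smallest of lg_max_k and the lg_config_k values the union has seen), not the library's implementation. The function name effective_lg_k is hypothetical, and MAX_LG_K = 21 is taken from the range stated in the commit message.

```python
MAX_LG_K = 21  # largest precision DS_HLL_SKETCH accepts per the commit message


def effective_lg_k(lg_max_k, input_lg_config_ks):
    """Model of the quoted rule (hypothetical helper).

    The accuracy of the union result is governed by the smallest of
    lg_max_k and every lg_config_k among the input sketches.
    """
    return min([lg_max_k, *input_lg_config_ks])


if __name__ == "__main__":
    inputs = [16, 21]
    # With lg_max_k at the maximum, only the inputs limit the result.
    print(effective_lg_k(MAX_LG_K, inputs))
    # With lg_max_k at the default of 12, the union itself would cap
    # the result below what the input sketches could support.
    print(effective_lg_k(12, inputs))
```

Under this model, using the maximum for lg_max_k means the union never becomes the limiting factor; the result is bounded only by the least precise input sketch.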
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Alexander Saydakov <al...@apache.org>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 26 Sep 2021 10:42:09 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
Posted by "Alexander Saydakov (Code Review)" <ge...@cloudera.org>.
Alexander Saydakov has posted comments on this change. ( http://gerrit.cloudera.org:8080/17744 )
Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
......................................................................
Patch Set 1:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:
http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc@1722
PS1, Line 1722: precision
precision is not the best name. I would suggest following the datasketches library and calling it lg_k.
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Alexander Saydakov <al...@apache.org>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 24 Sep 2021 22:24:56 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17744 )
Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
......................................................................
Patch Set 1:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/9218/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 01 Aug 2021 01:42:24 +0000
Gerrit-HasComments: No