Posted to reviews@impala.apache.org by "Fucun Chu (Code Review)" <ge...@cloudera.org> on 2021/08/01 01:19:50 UTC

[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision

Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17744 )


Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
......................................................................

IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision

This patch addresses a current limitation of the DS_HLL_SKETCH function
by extending it to optionally take a second argument called precision.

   DS_HLL_SKETCH(expression [, precision])

The precision value must be between 4 and 21, specified as an integer
literal. The default is 12.
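
As an illustration of the rule stated above, the range check and default could be sketched as follows. This is not Impala's implementation: the function and constant names are hypothetical, and the bounds are assumed to be inclusive.

```python
from typing import Optional

# Bounds and default come from the commit message; the names are hypothetical.
MIN_PRECISION = 4
MAX_PRECISION = 21
DEFAULT_PRECISION = 12


def resolve_precision(precision: Optional[int] = None) -> int:
    """Return the precision to use, applying the default and the range check."""
    if precision is None:
        # The second argument is optional, so fall back to the default.
        return DEFAULT_PRECISION
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(precision, bool) or not isinstance(precision, int):
        raise TypeError("precision must be an integer literal")
    # Assumption: both ends of the range are allowed.
    if not MIN_PRECISION <= precision <= MAX_PRECISION:
        raise ValueError(
            f"precision must be between {MIN_PRECISION} and {MAX_PRECISION}, "
            f"got {precision}"
        )
    return precision


if __name__ == "__main__":
    print(resolve_precision())    # default applied
    print(resolve_precision(16))  # explicit value accepted
```

In this sketch an out-of-range value such as 3 or 22 raises an error rather than being clamped; whether Impala rejects or clamps such values is not stated in this message.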

Here are test results of a typical workload in tpch25.lineitem (#1):
+====================================================================+
|   Metric    | Count Distinct | DS_HLL-12  | DS_HLL-16  | DS_HLL-21 |
+--------------------------------------------------------------------+
|  Memory(MB) |     725.43     |   124.87   |    123.19  |    121.85 |
| Duration(s) |      5.64      |   1.03     |    1.13    |     1.64  |
|  ErrorRate  |       0%       |   1.26%    |    0.22%   |     0.05% |
+====================================================================+
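
For context on the ErrorRate row, a rough cross-check is possible with the classic HyperLogLog standard-error estimate of 1.04 / sqrt(k), where k = 2^precision is the number of registers. This is only a sketch under that assumption: the DataSketches HLL estimators differ from classic HyperLogLog, and the figures in the table are single measurements, so only the order of magnitude is comparable.

```python
import math


def classic_hll_relative_error(lg_k: int) -> float:
    """Classic HyperLogLog standard error, 1.04 / sqrt(2**lg_k).

    Assumption: this is the textbook bound, not the DataSketches estimator.
    """
    return 1.04 / math.sqrt(2 ** lg_k)


if __name__ == "__main__":
    # The three precisions measured in the table above.
    for lg_k in (12, 16, 21):
        registers = 2 ** lg_k
        error = classic_hll_relative_error(lg_k)
        print(f"precision={lg_k}: {registers} registers, ~{error:.3%} standard error")
```

The estimate comes out at roughly 1.6%, 0.41% and 0.07% for precisions 12, 16 and 21, which is the same order of magnitude as the measured 1.26%, 0.22% and 0.05%.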

Testing:
1. Ran unit tests against table lineitem in TPC-H in both serial and
   parallel plan settings;
2. Ran "core" tests.

Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-common.h
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
6 files changed, 155 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17744/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17744
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>

[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17744 )

Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
......................................................................

IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision

This patch addresses a current limitation of the DS_HLL_SKETCH function
by extending it to optionally take a second argument called precision.

   DS_HLL_SKETCH(expression [, precision])

The precision value must be between 4 and 21, specified as an integer
literal. The default is 12.

Here are test results of a typical workload in tpch25.lineitem (#1):
+====================================================================+
|   Metric    | Count Distinct | DS_HLL-12  | DS_HLL-16  | DS_HLL-21 |
+--------------------------------------------------------------------+
|  Memory(MB) |     725.43     |   124.87   |    123.19  |    121.85 |
| Duration(s) |      5.64      |   1.03     |    1.13    |     1.64  |
|  ErrorRate  |       0%       |   1.26%    |    0.22%   |     0.05% |
+====================================================================+

Testing:
1. Ran unit tests against table lineitem in TPC-H in both serial and
   parallel plan settings;
2. Ran "core" tests.

Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
---
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/datasketches-common.h
M fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
M fe/src/main/java/org/apache/impala/catalog/BuiltinsDb.java
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
6 files changed, 155 insertions(+), 3 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/44/17744/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Alexander Saydakov <al...@apache.org>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision

Posted by "Alexander Saydakov (Code Review)" <ge...@cloudera.org>.
Alexander Saydakov has posted comments on this change. ( http://gerrit.cloudera.org:8080/17744 )

Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc@1826
PS1, Line 1826: datasketches
why max here, not the specified precision?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Alexander Saydakov <al...@apache.org>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 24 Sep 2021 21:38:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17744 )

Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9499/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Alexander Saydakov <al...@apache.org>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 26 Sep 2021 11:03:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17744 )

Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc@1722
PS1, Line 1722: nctionCon
> precision is not the best name. I would suggest following the datasketches 
Done


http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc@1826
PS1, Line 1826: _cast<datask
> why max here, not the specified precision?
The accuracy of the sketch returned at the end of the unioning process is a function of the smallest of lg_max_k and the lg_config_k values that the union operator has seen.
See: https://github.com/apache/datasketches-cpp/blob/master/hll/include/hll.hpp#L404-L407

lg_max_k is set to the maximum value so that unioning sketches produced by ds_hll_sketch at a high precision does not reduce their accuracy. If necessary, a precision parameter can be added to ds_hll_union in a new JIRA.
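
The rule quoted above can be illustrated with a toy model that tracks only the lg_k values and no sketch data. It is a sketch of the stated rule, not the datasketches-cpp implementation. The function name is hypothetical, and the value 21 is assumed to be the maximum because it is the upper bound given in the commit message.

```python
from typing import Iterable

# Assumption: 21 is the largest supported lg_k, per the commit message.
MAX_LG_K = 21


def union_result_lg_k(lg_max_k: int, input_lg_config_ks: Iterable[int]) -> int:
    """Toy model: the result is limited by the smallest lg_k the union has seen.

    That includes the union's own lg_max_k and the lg_config_k of every input.
    """
    return min([lg_max_k, *input_lg_config_ks])


if __name__ == "__main__":
    # Union configured at the maximum: inputs built at 16 keep their precision.
    print(union_result_lg_k(MAX_LG_K, [16, 16]))
    # Union configured at the default of 12: the same inputs are limited to 12.
    print(union_result_lg_k(12, [16, 16]))
    # Mixed inputs: the least precise input bounds the result.
    print(union_result_lg_k(MAX_LG_K, [16, 12]))
```

Under this model, configuring the union with the maximum lg_max_k means the union itself never becomes the limiting factor. Only the least precise input sketch does.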




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Alexander Saydakov <al...@apache.org>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 26 Sep 2021 10:42:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision

Posted by "Alexander Saydakov (Code Review)" <ge...@cloudera.org>.
Alexander Saydakov has posted comments on this change. ( http://gerrit.cloudera.org:8080/17744 )

Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17744/1/be/src/exprs/aggregate-functions-ir.cc@1722
PS1, Line 1722: precision
precision is not the best name. I would suggest following the datasketches library and calling it lg_k




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Alexander Saydakov <al...@apache.org>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 24 Sep 2021 22:24:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17744 )

Change subject: IMPALA-10835: Extend the DS_HLL_SKETCH function to accept a precision
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9218/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I91a360bb046d4abb101641772b6159308bf6c014
Gerrit-Change-Number: 17744
Gerrit-PatchSet: 1
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 01 Aug 2021 01:42:24 +0000
Gerrit-HasComments: No