You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Fucun Chu (Code Review)" <ge...@cloudera.org> on 2020/11/12 06:25:15 UTC

[Impala-ASF-CR] IMPALA-10134: Implement ds hll union f() function.

Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16711


Change subject: IMPALA-10134: Implement ds_hll_union_f() function.
......................................................................

IMPALA-10134: Implement ds_hll_union_f() function.

This function receives two string that is a serialized Apache DataSketches
HLL sketch. Union two sketches and returns the resulting sketch of union.

Example:
select ds_hll_estimate(ds_hll_union_f(i_i, h_i))
from hll_sketches_impala_hive2;
+-------------------------------------------+
| ds_hll_estimate(ds_hll_union_f(i_i, h_i)) |
+-------------------------------------------+
| 7                                         |
+-------------------------------------------+

Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
4 files changed, 105 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/16711/3
-- 
To view, visit http://gerrit.cloudera.org:8080/16711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Gerrit-Change-Number: 16711
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10134: Implement ds hll union f() function.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16711 )

Change subject: IMPALA-10134: Implement ds_hll_union_f() function.
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7636/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Gerrit-Change-Number: 16711
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Nov 2020 06:47:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10134: Implement ds hll union f() function.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16711 )

Change subject: IMPALA-10134: Implement ds_hll_union_f() function.
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6713/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/16711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Gerrit-Change-Number: 16711
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Nov 2020 09:05:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10134: Implement ds hll union f() function.

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/16711 )

Change subject: IMPALA-10134: Implement ds_hll_union_f() function.
......................................................................

IMPALA-10134: Implement ds_hll_union_f() function.

This function receives two strings that are serialized Apache DataSketches
HLL sketches. Union two sketches and returns the resulting sketch of union.

Example:
select ds_hll_estimate(ds_hll_union_f(i_i, h_i))
from hll_sketches_impala_hive2;
+-------------------------------------------+
| ds_hll_estimate(ds_hll_union_f(i_i, h_i)) |
+-------------------------------------------+
| 7                                         |
+-------------------------------------------+

Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
4 files changed, 116 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/16711/5
-- 
To view, visit http://gerrit.cloudera.org:8080/16711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Gerrit-Change-Number: 16711
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10134: Implement ds hll union f() function.

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16711 )

Change subject: IMPALA-10134: Implement ds_hll_union_f() function.
......................................................................


Patch Set 3:

(9 comments)

Great work! Thanks for taking care of this!

http://gerrit.cloudera.org:8080/#/c/16711/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16711/3//COMMIT_MSG@9
PS3, Line 9: string that is a
nit: strings that are


http://gerrit.cloudera.org:8080/#/c/16711/3//COMMIT_MSG@10
PS3, Line 10: sketch
nit: sketches


http://gerrit.cloudera.org:8080/#/c/16711/3/be/src/exprs/datasketches-functions-ir.cc
File be/src/exprs/datasketches-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/16711/3/be/src/exprs/datasketches-functions-ir.cc@88
PS3, Line 88: if (sketch.get_estimate() == 0.0) {
            :     return StringVal::null();
            :   }
I checked the behaviour of hive: If you give 2 empty sketches (sketches of an empty data set) to ds_hll_union_f() then it returns an empty sketch instead of null. I think we should follow Hive's approach.
Additionally giving 2 nulls as parameter also results an empty sketch.
(By empty sketch I mean a sketch that weren't updated by any inputs.)


http://gerrit.cloudera.org:8080/#/c/16711/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
File testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test:

http://gerrit.cloudera.org:8080/#/c/16711/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@346
PS3, Line 346: Unions the sketches from sketch_store and checks if
This comment is misleading. This test checks that unioning a valid sketch with a null value result the valid sketch being returned.


http://gerrit.cloudera.org:8080/#/c/16711/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@361
PS3, Line 361: # Check that ds_hll_union_f() returns null for an empty sketch.
Hive returns an empty sketch instead of null.


http://gerrit.cloudera.org:8080/#/c/16711/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@369
PS3, Line 369: # Checks that ds_hll_union_f() produces NULL for NULL inputs.
Hive returns an empty sketch instead of null.


http://gerrit.cloudera.org:8080/#/c/16711/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@378
PS3, Line 378: select ds_hll_union_f(date_string_col, date_string_col) from functional_parquet.alltypestiny
I'd split this test into 2:
First where the first param is an invalid sketch.
Second where the second param is an invalid sketch.


http://gerrit.cloudera.org:8080/#/c/16711/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@383
PS3, Line 383: ---- QUERY
Please add a short comment for the purpose of this particular test case.


http://gerrit.cloudera.org:8080/#/c/16711/3/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test@407
PS3, Line 407: 5,7,6,5,6,8,6,6,NULL
Please check the comment in L204. There I descibe that unioning string, char or varchar columns is not working well when one sketch is from Impala and the other is from Hive. Please add a similar comment including (the Jira ID) here.



-- 
To view, visit http://gerrit.cloudera.org:8080/16711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Gerrit-Change-Number: 16711
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 16 Nov 2020 15:37:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10134: Implement ds hll union f() function.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16711 )

Change subject: IMPALA-10134: Implement ds_hll_union_f() function.
......................................................................


Patch Set 6: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/16711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Gerrit-Change-Number: 16711
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Nov 2020 14:33:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10134: Implement ds hll union f() function.

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16711 )

Change subject: IMPALA-10134: Implement ds_hll_union_f() function.
......................................................................


Patch Set 5: Code-Review+2

Great work!


-- 
To view, visit http://gerrit.cloudera.org:8080/16711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Gerrit-Change-Number: 16711
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Nov 2020 09:05:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10134: Implement ds hll union f() function.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16711 )

Change subject: IMPALA-10134: Implement ds_hll_union_f() function.
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7662/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Gerrit-Change-Number: 16711
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Nov 2020 14:21:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10134: Implement ds hll union f() function.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16711 )

Change subject: IMPALA-10134: Implement ds_hll_union_f() function.
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/7670/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/16711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Gerrit-Change-Number: 16711
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Nov 2020 01:34:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10134: Implement ds hll union f() function.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16711 )

Change subject: IMPALA-10134: Implement ds_hll_union_f() function.
......................................................................

IMPALA-10134: Implement ds_hll_union_f() function.

This function receives two strings that are serialized Apache DataSketches
HLL sketches. Union two sketches and returns the resulting sketch of union.

Example:
select ds_hll_estimate(ds_hll_union_f(i_i, h_i))
from hll_sketches_impala_hive2;
+-------------------------------------------+
| ds_hll_estimate(ds_hll_union_f(i_i, h_i)) |
+-------------------------------------------+
| 7                                         |
+-------------------------------------------+

Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Reviewed-on: http://gerrit.cloudera.org:8080/16711
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
4 files changed, 116 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/16711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Gerrit-Change-Number: 16711
Gerrit-PatchSet: 7
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10134: Implement ds hll union f() function.

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16711 )

Change subject: IMPALA-10134: Implement ds_hll_union_f() function.
......................................................................


Patch Set 6: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Gerrit-Change-Number: 16711
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 25 Nov 2020 09:05:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10134: Implement ds hll union f() function.

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/16711 )

Change subject: IMPALA-10134: Implement ds_hll_union_f() function.
......................................................................

IMPALA-10134: Implement ds_hll_union_f() function.

This function receives two strings that are serialized Apache DataSketches
HLL sketches. Union two sketches and returns the resulting sketch of union.

Example:
select ds_hll_estimate(ds_hll_union_f(i_i, h_i))
from hll_sketches_impala_hive2;
+-------------------------------------------+
| ds_hll_estimate(ds_hll_union_f(i_i, h_i)) |
+-------------------------------------------+
| 7                                         |
+-------------------------------------------+

Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test
4 files changed, 116 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/11/16711/4
-- 
To view, visit http://gerrit.cloudera.org:8080/16711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic06e959ed956af5cedbfc7d4d063141d5babb2a8
Gerrit-Change-Number: 16711
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>