You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Fucun Chu (Code Review)" <ge...@cloudera.org> on 2021/03/05 14:33:59 UTC

[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17153


Change subject: IMPALA-10558: Implement ds_theta_exclude() function
......................................................................

IMPALA-10558: Implement ds_theta_exclude() function

This function receives two strings that are serialized Apache
DataSketches Theta sketches. Computes the a-not-b set operation given
two sketches of same or different column.

Example:
select ds_theta_estimate(ds_theta_exclude(sketch1, sketch2))
from sketch_tbl;
+-------------------------------------------------------+
| ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) |
+-------------------------------------------------------+
| 5                                                     |
+-------------------------------------------------------+

Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
---
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
4 files changed, 125 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17153/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>

[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
......................................................................


Patch Set 5: Code-Review+2

Thanks for the changes!


-- 
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Mar 2021 15:50:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
......................................................................

IMPALA-10558: Implement ds_theta_exclude() function

This function receives two strings that are serialized Apache
DataSketches Theta sketches. Computes the a-not-b set operation given
two sketches of same or different column.

Example:
select ds_theta_estimate(ds_theta_exclude(sketch1, sketch2))
from sketch_tbl;
+-------------------------------------------------------+
| ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) |
+-------------------------------------------------------+
| 5                                                     |
+-------------------------------------------------------+

Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Reviewed-on: http://gerrit.cloudera.org:8080/17153
Reviewed-by: Gabor Kaszab <ga...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
5 files changed, 169 insertions(+), 0 deletions(-)

Approvals:
  Gabor Kaszab: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 6
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
......................................................................


Patch Set 5: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Mar 2021 22:14:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
......................................................................

IMPALA-10558: Implement ds_theta_exclude() function

This function receives two strings that are serialized Apache
DataSketches Theta sketches. Computes the a-not-b set operation given
two sketches of same or different column.

Example:
select ds_theta_estimate(ds_theta_exclude(sketch1, sketch2))
from sketch_tbl;
+-------------------------------------------------------+
| ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) |
+-------------------------------------------------------+
| 5                                                     |
+-------------------------------------------------------+

Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
---
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
5 files changed, 167 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17153/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
......................................................................

IMPALA-10558: Implement ds_theta_exclude() function

This function receives two strings that are serialized Apache
DataSketches Theta sketches. Computes the a-not-b set operation given
two sketches of same or different column.

Example:
select ds_theta_estimate(ds_theta_exclude(sketch1, sketch2))
from sketch_tbl;
+-------------------------------------------------------+
| ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) |
+-------------------------------------------------------+
| 5                                                     |
+-------------------------------------------------------+

Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
---
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
5 files changed, 166 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17153/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17153/4/be/src/exprs/datasketches-common.cc
File be/src/exprs/datasketches-common.cc:

http://gerrit.cloudera.org:8080/#/c/17153/4/be/src/exprs/datasketches-common.cc@54
PS4, Line 54: bool DeserializeDsSketch(
Could you please comment that this is a specialization of the template DeserializeDsSketch() for theta sketches?


http://gerrit.cloudera.org:8080/#/c/17153/4/be/src/exprs/datasketches-functions-ir.cc
File be/src/exprs/datasketches-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17153/4/be/src/exprs/datasketches-functions-ir.cc@148
PS4, Line 148: // A is not null
nit: I don't think this comment and the one below adds much. The comment above explains well the functionality here.


http://gerrit.cloudera.org:8080/#/c/17153/3/be/src/exprs/datasketches-functions.h
File be/src/exprs/datasketches-functions.h:

http://gerrit.cloudera.org:8080/#/c/17153/3/be/src/exprs/datasketches-functions.h@74
PS3, Line 74: sketch. If it is not
nit: "...sketches. If they are not..."


http://gerrit.cloudera.org:8080/#/c/17153/4/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
File testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test:

http://gerrit.cloudera.org:8080/#/c/17153/4/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@398
PS4, Line 398: check that it is equal to the
             : # estimate of a.
Does this mean that with this test A and B has no common items so the result of A-not-B equals to the cardinality of A?

If yes, do you think it would make sense to add a test where there are some common items between the two input sets but not all the items are common?



-- 
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Mar 2021 08:12:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8361/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 14 Mar 2021 14:50:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
......................................................................


Patch Set 2:

(8 comments)

Thanks for these changes! I had some comments mostly nits and around test coverage.

http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions-ir.cc
File be/src/exprs/datasketches-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions-ir.cc@128
PS2, Line 128:   // a_not_b
nit: this comment is not needed as doesn't give extra info


http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions-ir.cc@131
PS2, Line 131: datasketches::theta_sketch::unique_ptr first_sketch_ptr;
             :   if (!first_serialized_sketch.is_null && first_serialized_sketch.len > 0) {
             :     try {
             :       first_sketch_ptr = datasketches::theta_sketch::deserialize(
             :           (void*)first_serialized_sketch.ptr, first_serialized_sketch.len);
             :     } catch (const std::exception&) {
             :       LogSketchDeserializationError(ctx);
             :       return StringVal::null();
             :     }
             :   }
This part seems pretty identical to the section L141-150. Can you move it to a function to avoid code repetition?


http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions-ir.cc@155
PS2, Line 155: first_sketch_ptr.operator bool()
I'm not sure I understand the condition in this format :) Could you please explain what goes on here?


http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions.h
File be/src/exprs/datasketches-functions.h:

http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions.h@73
PS2, Line 73: 'serialized_sketch'
Could you mention both sketch params?


http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
File testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test:

http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@330
PS2, Line 330: for A is an empty sketch.
When A is empty and B is null.


http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@331
PS2, Line 331: select ds_theta_estimate(ds_theta_exclude(ds_theta_sketch(f2), null))
Could you please add another test where A is null and B is empty? (the opposite of this one)


http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@332
PS2, Line 332: from functional_parquet.emptytable;
Another test would be where A and B are both empty.


http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@379
PS2, Line 379: ====
I miss a test where the result of an a-not-b is a non-empty sketch (where the estimate is greater than zero).



-- 
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 12 Mar 2021 12:12:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions-ir.cc
File be/src/exprs/datasketches-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions-ir.cc@128
PS2, Line 128:   datasketches::theta_a_not_b a_not_b;
> nit: this comment is not needed as doesn't give extra info
Done


http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions-ir.cc@131
PS2, Line 131: if (!first_serialized_sketch.is_null && first_serialized_sketch.len > 0) {
             :     if (!DeserializeDsSketch(first_serialized_sketch, &first_sketch_ptr)) {
             :       LogSketchDeserializationError(ctx);
             :       return StringVal::null();
             :     }
             :   }
             :   datasketches::theta_sketch::unique_ptr second_sketch_ptr;
             :   if (!second_serialized_sketch.is_null && second_serialized_sketch.len > 0) {
             :     if (!DeserializeDsSketch(second_serialized_sketch, &second_sketch_ptr)) {
             :    
> This part seems pretty identical to the section L141-150. Can you move it t
Done


http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions-ir.cc@155
PS2, Line 155: d::stringstream serialized_input
> I'm not sure I understand the condition in this format :) Could you please 
function ref: https://en.cppreference.com/w/cpp/memory/unique_ptr/operator_bool, usage has been modified with reference to the example.


http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions.h
File be/src/exprs/datasketches-functions.h:

http://gerrit.cloudera.org:8080/#/c/17153/2/be/src/exprs/datasketches-functions.h@73
PS2, Line 73: 'first_serialized_s
> Could you mention both sketch params?
Done


http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
File testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test:

http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@330
PS2, Line 330: When A is empty and B is 
> When A is empty and B is null.
Done


http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@331
PS2, Line 331: select ds_theta_estimate(ds_theta_exclude(ds_theta_sketch(f2), null))
> Could you please add another test where A is null and B is empty? (the oppo
Done


http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@332
PS2, Line 332: from functional_parquet.emptytable;
> Another test would be where A and B are both empty.
Done


http://gerrit.cloudera.org:8080/#/c/17153/2/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@379
PS2, Line 379: i.ti i_ti, i.i i_i, i.bi i_bi, i.f i_f, i.d i_d, i.s i_s, i.c i_c, i.v i_v,i.nc i_nc,
> I miss a test where the result of an a-not-b is a non-empty sketch (where t
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 14 Mar 2021 14:30:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8383/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Mar 2021 15:35:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
......................................................................

IMPALA-10558: Implement ds_theta_exclude() function

This function receives two strings that are serialized Apache
DataSketches Theta sketches. Computes the a-not-b set operation given
two sketches of same or different column.

Example:
select ds_theta_estimate(ds_theta_exclude(sketch1, sketch2))
from sketch_tbl;
+-------------------------------------------------------+
| ds_theta_estimate(ds_theta_exclude(sketch1, sketch2)) |
+-------------------------------------------------------+
| 5                                                     |
+-------------------------------------------------------+

Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
---
M be/src/exprs/datasketches-common.cc
M be/src/exprs/datasketches-functions-ir.cc
M be/src/exprs/datasketches-functions.h
M common/function-registry/impala_functions.py
M testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
5 files changed, 169 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17153/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8368/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 Mar 2021 08:04:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
......................................................................


Patch Set 5:

(4 comments)

Thanks for the review! Addressed the comments.

http://gerrit.cloudera.org:8080/#/c/17153/4/be/src/exprs/datasketches-common.cc
File be/src/exprs/datasketches-common.cc:

http://gerrit.cloudera.org:8080/#/c/17153/4/be/src/exprs/datasketches-common.cc@54
PS4, Line 54: bool DeserializeDsSketch(
> Could you please comment that this is a specialization of the template Dese
Done


http://gerrit.cloudera.org:8080/#/c/17153/4/be/src/exprs/datasketches-functions-ir.cc
File be/src/exprs/datasketches-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17153/4/be/src/exprs/datasketches-functions-ir.cc@148
PS4, Line 148: 
> nit: I don't think this comment and the one below adds much. The comment ab
Done


http://gerrit.cloudera.org:8080/#/c/17153/3/be/src/exprs/datasketches-functions.h
File be/src/exprs/datasketches-functions.h:

http://gerrit.cloudera.org:8080/#/c/17153/3/be/src/exprs/datasketches-functions.h@74
PS3, Line 74: sketches. If they ar
> nit: "...sketches. If they are not..."
Done


http://gerrit.cloudera.org:8080/#/c/17153/4/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test
File testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test:

http://gerrit.cloudera.org:8080/#/c/17153/4/testdata/workloads/functional-query/queries/QueryTest/datasketches-theta.test@398
PS4, Line 398: ch.
             : create table ske
> Does this mean that with this test A and B has no common items so the resul
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Mar 2021 14:20:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8301/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Mar 2021 14:55:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10558: Implement ds theta exclude() function

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17153 )

Change subject: IMPALA-10558: Implement ds_theta_exclude() function
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6974/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17153
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I05119fd8c652c07ff248a99e44b0da3541e46ca3
Gerrit-Change-Number: 17153
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Mar 2021 15:50:51 +0000
Gerrit-HasComments: No