Posted to reviews@impala.apache.org by "Fucun Chu (Code Review)" <ge...@cloudera.org> on 2021/01/17 12:50:05 UTC

[Impala-ASF-CR] IMPALA-10440: Import Theta functionality from DataSketches

Fucun Chu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16959 )


Change subject: IMPALA-10440: Import Theta functionality from DataSketches
......................................................................

IMPALA-10440: Import Theta functionality from DataSketches

This patch imports the functionality needed for Theta approximate
algorithm from Apache DataSketches.

First, I updated our existing snapshot of DataSketches to the
following commit: b2f749ed5ce6ba650f4259602b133c310c3a5ee4
"Merge pull request #182 from chufucun/include_type"
This affects files originated from theta/ directories of the
DataSketches repo.

Then I copied all the files needed for Theta into our snapshot
directory.

Browse the source files here:
https://github.com/apache/datasketches-cpp

Change-Id: I8485d6829f50b130c84ec8bef0a4b5895255ba6c
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/datasketches-test.cc
M be/src/thirdparty/datasketches/HllUnion-internal.hpp
M be/src/thirdparty/datasketches/README.md
M be/src/thirdparty/datasketches/hll.hpp
M be/src/thirdparty/datasketches/kll_quantile_calculator_impl.hpp
M be/src/thirdparty/datasketches/kll_sketch.hpp
M be/src/thirdparty/datasketches/kll_sketch_impl.hpp
A be/src/thirdparty/datasketches/theta_a_not_b.hpp
A be/src/thirdparty/datasketches/theta_a_not_b_impl.hpp
A be/src/thirdparty/datasketches/theta_intersection.hpp
A be/src/thirdparty/datasketches/theta_intersection_impl.hpp
A be/src/thirdparty/datasketches/theta_sketch.hpp
A be/src/thirdparty/datasketches/theta_sketch_impl.hpp
A be/src/thirdparty/datasketches/theta_union.hpp
A be/src/thirdparty/datasketches/theta_union_impl.hpp
16 files changed, 2,154 insertions(+), 60 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/16959/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16959
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8485d6829f50b130c84ec8bef0a4b5895255ba6c
Gerrit-Change-Number: 16959
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10440: Import Theta functionality from DataSketches

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16959 )

Change subject: IMPALA-10440: Import Theta functionality from DataSketches
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8035/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8485d6829f50b130c84ec8bef0a4b5895255ba6c
Gerrit-Change-Number: 16959
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Jan 2021 06:25:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10440: Import Theta functionality from DataSketches

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16959 )

Change subject: IMPALA-10440: Import Theta functionality from DataSketches
......................................................................

IMPALA-10440: Import Theta functionality from DataSketches

This patch imports the functionality needed for Theta approximate
algorithm from Apache DataSketches.

First, I updated our existing snapshot of DataSketches to the
following commit: b2f749ed5ce6ba650f4259602b133c310c3a5ee4
"Merge pull request #182 from chufucun/include_type"
This affects files originated from hll/, kll/ and theta/ directories
of the DataSketches repo.

Then I copied all the files needed for Theta into our snapshot
directory.

Browse the source files here:
https://github.com/apache/datasketches-cpp/tree/b2f749ed5ce6ba650f4259602b133c310c3a5ee4

Change-Id: I8485d6829f50b130c84ec8bef0a4b5895255ba6c
Reviewed-on: http://gerrit.cloudera.org:8080/16959
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/datasketches-test.cc
M be/src/thirdparty/datasketches/HllUnion-internal.hpp
M be/src/thirdparty/datasketches/README.md
M be/src/thirdparty/datasketches/hll.hpp
M be/src/thirdparty/datasketches/kll_quantile_calculator_impl.hpp
M be/src/thirdparty/datasketches/kll_sketch.hpp
M be/src/thirdparty/datasketches/kll_sketch_impl.hpp
A be/src/thirdparty/datasketches/theta_a_not_b.hpp
A be/src/thirdparty/datasketches/theta_a_not_b_impl.hpp
A be/src/thirdparty/datasketches/theta_intersection.hpp
A be/src/thirdparty/datasketches/theta_intersection_impl.hpp
A be/src/thirdparty/datasketches/theta_sketch.hpp
A be/src/thirdparty/datasketches/theta_sketch_impl.hpp
A be/src/thirdparty/datasketches/theta_union.hpp
A be/src/thirdparty/datasketches/theta_union_impl.hpp
16 files changed, 2,153 insertions(+), 60 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8485d6829f50b130c84ec8bef0a4b5895255ba6c
Gerrit-Change-Number: 16959
Gerrit-PatchSet: 5
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10440: Import Theta functionality from DataSketches

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16959 )

Change subject: IMPALA-10440: Import Theta functionality from DataSketches
......................................................................


Patch Set 3: Code-Review+2

Thanks for the changes!


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8485d6829f50b130c84ec8bef0a4b5895255ba6c
Gerrit-Change-Number: 16959
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Jan 2021 09:15:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10440: Import Theta functionality from DataSketches

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16959 )

Change subject: IMPALA-10440: Import Theta functionality from DataSketches
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16959/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16959/2//COMMIT_MSG@15
PS2, Line 15: This affects files originated from theta/ directories
Files were also changed in the hll/ and kll/ dirs.


http://gerrit.cloudera.org:8080/#/c/16959/2//COMMIT_MSG@22
PS2, Line 22: https://github.com/apache/datasketches-cpp
Could you include a link that points to the particular git commit of the repo that we use here?


http://gerrit.cloudera.org:8080/#/c/16959/2/be/src/exprs/datasketches-test.cc
File be/src/exprs/datasketches-test.cc:

http://gerrit.cloudera.org:8080/#/c/16959/2/be/src/exprs/datasketches-test.cc@176
PS2, Line 176: 
nit: drop the extra empty line


http://gerrit.cloudera.org:8080/#/c/16959/2/be/src/thirdparty/datasketches/README.md
File be/src/thirdparty/datasketches/README.md:

http://gerrit.cloudera.org:8080/#/c/16959/2/be/src/thirdparty/datasketches/README.md@1
PS2, Line 1: HLL CPC Theta
nit: please use a comma between these names. Additionally, could you please arrange them in alphabetical order?



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8485d6829f50b130c84ec8bef0a4b5895255ba6c
Gerrit-Change-Number: 16959
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 Jan 2021 12:56:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10440: Import Theta functionality from DataSketches

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16959 )

Change subject: IMPALA-10440: Import Theta functionality from DataSketches
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6857/ DRY_RUN=false


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8485d6829f50b130c84ec8bef0a4b5895255ba6c
Gerrit-Change-Number: 16959
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Jan 2021 09:16:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10440: Import Theta functionality from DataSketches

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16959 )

Change subject: IMPALA-10440: Import Theta functionality from DataSketches
......................................................................


Patch Set 4: Code-Review+2


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8485d6829f50b130c84ec8bef0a4b5895255ba6c
Gerrit-Change-Number: 16959
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Jan 2021 09:16:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10440: Import Theta functionality from DataSketches

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16959 )

Change subject: IMPALA-10440: Import Theta functionality from DataSketches
......................................................................

IMPALA-10440: Import Theta functionality from DataSketches

This patch imports the functionality needed for Theta approximate
algorithm from Apache DataSketches.

First, I updated our existing snapshot of DataSketches to the
following commit: b2f749ed5ce6ba650f4259602b133c310c3a5ee4
"Merge pull request #182 from chufucun/include_type"
This affects files originated from hll/, kll/ and theta/ directories
of the DataSketches repo.

Then I copied all the files needed for Theta into our snapshot
directory.

Browse the source files here:
https://github.com/apache/datasketches-cpp/tree/b2f749ed5ce6ba650f4259602b133c310c3a5ee4

Change-Id: I8485d6829f50b130c84ec8bef0a4b5895255ba6c
---
M be/src/exprs/CMakeLists.txt
M be/src/exprs/datasketches-test.cc
M be/src/thirdparty/datasketches/HllUnion-internal.hpp
M be/src/thirdparty/datasketches/README.md
M be/src/thirdparty/datasketches/hll.hpp
M be/src/thirdparty/datasketches/kll_quantile_calculator_impl.hpp
M be/src/thirdparty/datasketches/kll_sketch.hpp
M be/src/thirdparty/datasketches/kll_sketch_impl.hpp
A be/src/thirdparty/datasketches/theta_a_not_b.hpp
A be/src/thirdparty/datasketches/theta_a_not_b_impl.hpp
A be/src/thirdparty/datasketches/theta_intersection.hpp
A be/src/thirdparty/datasketches/theta_intersection_impl.hpp
A be/src/thirdparty/datasketches/theta_sketch.hpp
A be/src/thirdparty/datasketches/theta_sketch_impl.hpp
A be/src/thirdparty/datasketches/theta_union.hpp
A be/src/thirdparty/datasketches/theta_union_impl.hpp
16 files changed, 2,153 insertions(+), 60 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/16959/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8485d6829f50b130c84ec8bef0a4b5895255ba6c
Gerrit-Change-Number: 16959
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10440: Import Theta functionality from DataSketches

Posted by "Fucun Chu (Code Review)" <ge...@cloudera.org>.
Fucun Chu has posted comments on this change. ( http://gerrit.cloudera.org:8080/16959 )

Change subject: IMPALA-10440: Import Theta functionality from DataSketches
......................................................................


Patch Set 3:

(4 comments)

Thanks for the review! Addressed the comments.

http://gerrit.cloudera.org:8080/#/c/16959/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16959/2//COMMIT_MSG@15
PS2, Line 15: This affects files originated from hll/, kll/ and the
> Files were also changed in the hll/ and kll/ dirs.
Done


http://gerrit.cloudera.org:8080/#/c/16959/2//COMMIT_MSG@22
PS2, Line 22: https://github.com/apache/datasketches-cpp/tree/b2f749ed5ce6ba650f4259602b133c310c3a5ee4
> Could you include a link that points to the particular git commit of the rep
Done


http://gerrit.cloudera.org:8080/#/c/16959/2/be/src/exprs/datasketches-test.cc
File be/src/exprs/datasketches-test.cc:

http://gerrit.cloudera.org:8080/#/c/16959/2/be/src/exprs/datasketches-test.cc@176
PS2, Line 176:     for (int key = 0; key < 100000; key++) sketch1.update(key);
> nit: drop the extra empty line
Done


http://gerrit.cloudera.org:8080/#/c/16959/2/be/src/thirdparty/datasketches/README.md
File be/src/thirdparty/datasketches/README.md:

http://gerrit.cloudera.org:8080/#/c/16959/2/be/src/thirdparty/datasketches/README.md@1
PS2, Line 1: CPC, HLL, KLL
> nit: please use a comma between these names. Additionally, could you please
Done



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8485d6829f50b130c84ec8bef0a4b5895255ba6c
Gerrit-Change-Number: 16959
Gerrit-PatchSet: 3
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Jan 2021 06:05:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10440: Import Theta functionality from DataSketches

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16959 )

Change subject: IMPALA-10440: Import Theta functionality from DataSketches
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8014/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8485d6829f50b130c84ec8bef0a4b5895255ba6c
Gerrit-Change-Number: 16959
Gerrit-PatchSet: 2
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Sun, 17 Jan 2021 13:11:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10440: Import Theta functionality from DataSketches

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16959 )

Change subject: IMPALA-10440: Import Theta functionality from DataSketches
......................................................................


Patch Set 4: Verified+1


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8485d6829f50b130c84ec8bef0a4b5895255ba6c
Gerrit-Change-Number: 16959
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 Jan 2021 14:48:09 +0000
Gerrit-HasComments: No