Posted to reviews@impala.apache.org by "Gabor Kaszab (Code Review)" <ge...@cloudera.org> on 2020/12/02 08:07:17 UTC

[Impala-ASF-CR] IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions

Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/16656 )

Change subject: IMPALA-10282: Implement ds_cpc_sketch() and ds_cpc_estimate() functions
......................................................................


Patch Set 4: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16656/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16656/4//COMMIT_MSG@28
PS4, Line 28:  Ran manual tests on tpch_parquet.lineitem to compare performance
            :    with ndv(). Depending on data characteristics ndv() appears 2x-3x
            :    faster. CPC gives closer estimate than current ndv(). CPC is more
            :    accurate than HLL in some cases
Have you compared CPC and HLL in terms of runtime performance? It would be nice to see whether either of them is faster. I see the link to the comparison above; I just wanted to see some numbers from running these algorithms through Impala.

Additionally, could you share more details on how CPC and HLL compare in terms of accuracy? You mention that CPC is more accurate in some cases. Could you describe which cases these are and what the difference between the two algorithms is?
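For the accuracy numbers, one simple way to report them is the relative error of each estimate against the exact distinct count of the same column. Below is a minimal Python sketch, assuming the exact count comes from COUNT(DISTINCT ...) on that column; the function name and the labels are hypothetical and only illustrate the calculation.

```python
def relative_errors(exact, estimates):
    """Return |estimate - exact| / exact for each named estimator.

    `exact` is the true distinct count (assumed to come from
    COUNT(DISTINCT col)); `estimates` maps a hypothetical label such as
    "ndv", "hll" or "cpc" to the value that estimator returned for the
    same column.
    """
    if exact <= 0:
        raise ValueError("exact distinct count must be positive")
    # Relative error makes results comparable across columns with
    # very different cardinalities.
    return {name: abs(est - exact) / exact for name, est in estimates.items()}
```

Reporting this per column (low, medium and high cardinality) would make it easier to see in which cases CPC beats HLL.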



-- 
To view, visit http://gerrit.cloudera.org:8080/16656
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I731e66fbadc74bc339c973f4d9337db9b7dd715a
Gerrit-Change-Number: 16656
Gerrit-PatchSet: 4
Gerrit-Owner: Fucun Chu <ch...@hotmail.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 02 Dec 2020 08:07:17 +0000
Gerrit-HasComments: Yes