Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/07/11 03:00:00 UTC

[jira] [Commented] (IMPALA-10439) Implement count(distinct) function (DataSketches/Theta)

    [ https://issues.apache.org/jira/browse/IMPALA-10439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17378567#comment-17378567 ] 

Quanlong Huang commented on IMPALA-10439:
-----------------------------------------

[~chufucun] Can we resolve this? Or are there any tasks remaining?

> Implement count(distinct) function (DataSketches/Theta)
> -------------------------------------------------------
>
>                 Key: IMPALA-10439
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10439
>             Project: IMPALA
>          Issue Type: Epic
>          Components: Backend
>            Reporter: Fucun Chu
>            Priority: Major
>
> Implement the count(distinct) function from the DataSketches library for Theta in C++.
> Theta sketch provides approximate distinct counting with set operations (union, intersection and set difference).
> This can be used for retention analysis, e.g. "How many unique users signed up in week 1 and purchased something in week 2?"
> General info about the sketch:
> https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html
> C++ implementation to wrap:
> https://github.com/apache/datasketches-cpp/tree/master/theta
> Using thetaSketch in Druid:
> https://druid.apache.org/docs/latest/development/extensions-core/datasketches-theta.html
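
For readers unfamiliar with the idea, the retention question in the description can be sketched with a toy theta sketch. This is a minimal illustration under stated assumptions, not the datasketches-cpp API and not Impala's implementation: the names ToyThetaSketch, intersect and estimate_retained are hypothetical, the hash is a splitmix64-style mixer chosen only to keep the example dependency-free, and the sketch keeps at most k hash values below a threshold theta.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <limits>
#include <set>

// Toy theta sketch (hypothetical, for illustration only).
// It retains hash values strictly below a threshold `theta`. Once more than
// `k` hashes are retained, the largest one is dropped and becomes the new theta.
struct ToyThetaSketch {
    std::size_t k;                                                  // nominal number of retained hashes
    std::uint64_t theta = std::numeric_limits<std::uint64_t>::max(); // max value means "exact mode"
    std::set<std::uint64_t> hashes;                                 // retained hashes, all < theta

    explicit ToyThetaSketch(std::size_t nominal) : k(nominal) { assert(nominal > 0); }

    // splitmix64-style mixer; an assumption of this sketch, any good 64-bit hash would do.
    static std::uint64_t hash64(std::uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    void update(std::uint64_t id) {
        const std::uint64_t h = hash64(id);
        if (h >= theta) return;          // outside the sampled range, ignore
        hashes.insert(h);                // duplicates are absorbed by the set
        if (hashes.size() > k) {
            auto last = std::prev(hashes.end());
            theta = *last;               // the dropped maximum becomes the new threshold
            hashes.erase(last);
        }
    }

    // Estimated distinct count: retained entries divided by the sampled fraction theta / 2^64.
    double estimate() const {
        const double fraction = static_cast<double>(theta) / 18446744073709551616.0;
        return static_cast<double>(hashes.size()) / fraction;
    }
};

// Set intersection: use the smaller theta and keep hashes present in both sketches.
ToyThetaSketch intersect(const ToyThetaSketch& a, const ToyThetaSketch& b) {
    ToyThetaSketch out(std::min(a.k, b.k));
    out.theta = std::min(a.theta, b.theta);
    for (std::uint64_t h : a.hashes) {
        if (h < out.theta && b.hashes.count(h) > 0) out.hashes.insert(h);
    }
    return out;
}

// "How many unique users signed up in week 1 and purchased something in week 2?"
double estimate_retained(const ToyThetaSketch& signed_up_week1,
                         const ToyThetaSketch& purchased_week2) {
    return intersect(signed_up_week1, purchased_week2).estimate();
}
```

To use it, feed the user ids of each week into its own sketch with update() and pass both sketches to estimate_retained(). While a sketch holds fewer than k distinct ids it is still exact; beyond that the result is an estimate whose error shrinks as k grows. The real library adds union and set difference, serialization and tighter error bounds, none of which are modelled here.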


