Posted to issues-all@impala.apache.org by "Fucun Chu (Jira)" <ji...@apache.org> on 2021/01/16 03:45:00 UTC
[jira] [Created] (IMPALA-10439) datasketches-theta
Fucun Chu created IMPALA-10439:
----------------------------------
Summary: datasketches-theta
Key: IMPALA-10439
URL: https://issues.apache.org/jira/browse/IMPALA-10439
Project: IMPALA
Issue Type: Epic
Components: Backend
Reporter: Fucun Chu
Implement an approximate count(distinct) function in C++ based on the Theta sketch from the DataSketches library.
The Theta sketch provides approximate distinct counting with set operations (union, intersection, and set difference).
This can be used for retention analysis, e.g.: "How many unique users signed up in week 1 and purchased something in week 2?"
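To make the retention example concrete, here is a minimal, self-contained sketch of the idea in C++. It is a toy illustration only, not the datasketches-cpp API and not the proposed Impala implementation: the names ToyThetaSketch, SetOp and combine are hypothetical, users are assumed to be 64-bit integer IDs, and the union result is not trimmed back to k entries as a real implementation would do.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <limits>
#include <set>

// Hypothetical toy type for illustration; NOT the datasketches-cpp API.
struct ToyThetaSketch {
    std::size_t k;  // nominal number of retained hashes
    // Sampling threshold: only hashes strictly below theta are retained.
    std::uint64_t theta = std::numeric_limits<std::uint64_t>::max();
    std::set<std::uint64_t> hashes;

    explicit ToyThetaSketch(std::size_t nominal_k) : k(nominal_k) {
        assert(k > 0);
    }

    // splitmix64 finalizer, used here as a simple bijective 64-bit mixer.
    static std::uint64_t mix(std::uint64_t x) {
        x += 0x9e3779b97f4a7c15ULL;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
        x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
        return x ^ (x >> 31);
    }

    void update(std::uint64_t id) {
        const std::uint64_t h = mix(id);
        if (h >= theta) return;  // outside the sampled range
        hashes.insert(h);
        if (hashes.size() > k) {
            // Lower theta to the largest retained hash and drop that hash,
            // leaving exactly k hashes below the new threshold.
            auto largest = std::prev(hashes.end());
            theta = *largest;
            hashes.erase(largest);
        }
    }

    // Estimated distinct count: retained hashes divided by the sampled
    // fraction of the 64-bit hash space (theta / 2^64).
    double estimate() const {
        const double fraction =
            static_cast<double>(theta) / 18446744073709551616.0;
        return static_cast<double>(hashes.size()) / fraction;
    }
};

enum class SetOp { Union, Intersection, AnotB };

// Combine two sketches under the smaller of the two thresholds.
ToyThetaSketch combine(const ToyThetaSketch& a, const ToyThetaSketch& b,
                       SetOp op) {
    ToyThetaSketch out(std::max(a.k, b.k));
    out.theta = std::min(a.theta, b.theta);
    for (std::uint64_t h : a.hashes) {
        if (h >= out.theta) continue;
        const bool in_b = b.hashes.count(h) > 0;
        if (op == SetOp::Union ||
            (op == SetOp::Intersection && in_b) ||
            (op == SetOp::AnotB && !in_b)) {
            out.hashes.insert(h);
        }
    }
    if (op == SetOp::Union) {
        for (std::uint64_t h : b.hashes) {
            if (h < out.theta) out.hashes.insert(h);
        }
    }
    return out;
}
```

With one sketch fed the week 1 signup IDs and another fed the week 2 purchaser IDs, combine(week1, week2, SetOp::Intersection).estimate() approximates the number of users who did both. While each sketch holds fewer than k distinct hashes, theta stays at its initial value and the counts are exact; beyond that the result is an estimate whose error shrinks as k grows.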
General info about the sketch:
https://datasketches.apache.org/docs/Theta/ThetaSketchFramework.html
C++ implementation to wrap:
https://github.com/apache/datasketches-cpp/tree/master/theta
Using thetaSketch in Druid:
https://druid.apache.org/docs/latest/development/extensions-core/datasketches-theta.html