Posted to issues@kylin.apache.org by "Zhong Yanghong (JIRA)" <ji...@apache.org> on 2018/08/10 07:53:00 UTC
[jira] [Updated] (KYLIN-3487) Create a new measure for precise count distinct
[ https://issues.apache.org/jira/browse/KYLIN-3487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zhong Yanghong updated KYLIN-3487:
----------------------------------
Summary: Create a new measure for precise count distinct (was: Create a new measure for count distinct)
> Create a new measure for precise count distinct
> -----------------------------------------------
>
> Key: KYLIN-3487
> URL: https://issues.apache.org/jira/browse/KYLIN-3487
> Project: Kylin
> Issue Type: Improvement
> Reporter: Zhong Yanghong
> Assignee: Zhong Yanghong
> Priority: Major
>
> At eBay, there are around 20M sessions each day, and there is a requirement to calculate the count distinct of sessions.
> For deep dives, users want the session cardinality over a year, or even several years. For one year alone, the total cardinality is around 20M * 360 = 7.2B > 2B. This exceeds the upper limit of a bitmap, and will not be good for
> A session never crosses days, so when calculating the count distinct of sessions it is meaningless to merge the related counters, bitmap or HLL, across days.
> Therefore, we may need a new measure containing a map that uses the date as the key and a bitmap or HLL as the value. To calculate the count distinct, we only need to get the state of each key-value entry and then sum the per-entry counts. We do not need to merge bitmaps or HLLs across different key-value entries.
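
The proposal above could be sketched roughly as follows. This is only an illustration, not Kylin's measure API: the class name PerDayDistinctCounter and its methods are hypothetical, and a HashSet<Long> stands in for the bitmap or HLL stored per date. The point it shows is that states are merged only within the same date key, and the final answer is a sum of per-day cardinalities held in a long, so it is not bounded by the 2B limit of a single bitmap.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of a "map of date -> distinct-count state" measure.
// A HashSet<Long> is an assumed stand-in for a bitmap or HLL counter.
class PerDayDistinctCounter {
    // Key: date string (e.g. "2018-08-09"); value: per-day distinct state.
    private final Map<String, Set<Long>> statesByDay = new TreeMap<>();

    // Record one session id under the day it belongs to.
    void add(String day, long sessionId) {
        statesByDay.computeIfAbsent(day, d -> new HashSet<>()).add(sessionId);
    }

    // Merge another counter into this one. States are unioned only when
    // they share the same date key; different dates are never merged.
    void merge(PerDayDistinctCounter other) {
        for (Map.Entry<String, Set<Long>> e : other.statesByDay.entrySet()) {
            statesByDay.computeIfAbsent(e.getKey(), d -> new HashSet<>())
                       .addAll(e.getValue());
        }
    }

    // Count distinct = sum of the per-day cardinalities. This is valid
    // only under the assumption that a session never crosses days.
    long countDistinct() {
        long total = 0L;
        for (Set<Long> state : statesByDay.values()) {
            total += state.size();
        }
        return total;
    }

    public static void main(String[] args) {
        PerDayDistinctCounter a = new PerDayDistinctCounter();
        a.add("2018-08-09", 1L);
        a.add("2018-08-09", 2L);

        PerDayDistinctCounter b = new PerDayDistinctCounter();
        b.add("2018-08-09", 2L); // same day, same session: deduplicated
        b.add("2018-08-10", 7L); // different day: kept as a separate entry

        a.merge(b);
        System.out.println(a.countDistinct());
    }
}
```

With a real bitmap or HLL as the map value, each entry would only need to hold one day's sessions (around 20M in the scenario described), while the summed result can grow past the range of a single 32-bit bitmap.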
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)