Posted to issues@kylin.apache.org by "Zhong Yanghong (JIRA)" <ji...@apache.org> on 2018/08/10 07:53:00 UTC

[jira] [Updated] (KYLIN-3487) Create a new measure for precise count distinct

     [ https://issues.apache.org/jira/browse/KYLIN-3487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhong Yanghong updated KYLIN-3487:
----------------------------------
    Summary: Create a new measure for precise count distinct  (was: Create a new measure for count distinct)

> Create a new measure for precise count distinct
> -----------------------------------------------
>
>                 Key: KYLIN-3487
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3487
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>            Priority: Major
>
> At eBay, there are around 20M sessions each day, and there is a requirement to calculate the count distinct of sessions.
> For deep dives, users want to get the session cardinality over a year, or even several years. For just one year, the total cardinality will be around 20M*360 = 7.2B > 2B. It will exceed the upper limit of bitmap, and will not be good for 
> When calculating the count distinct of sessions, if a session never crosses days, it is meaningless to merge the related counter, whether bitmap or hll, across days.
> Therefore, we may need a new measure containing a map, using the date info as the key and a bitmap or hll as the value. When calculating count distinct, it only needs to get the state for each key-value entry and then sum the states. We don't need to merge bitmap or hll across different key-value entries.
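
The proposal above can be illustrated with a minimal sketch. This is not Kylin code: the class name PerDayDistinctCounter and its methods are hypothetical, a Python set stands in for the bitmap or hll state, and the 2^31 - 1 limit assumes a bitmap indexed by signed 32-bit integers. States are merged only when they share the same date key, and the final count is the sum of the per-key cardinalities.

```python
from collections import defaultdict

# Assumption: the bitmap is limited to signed 32-bit integer values.
ASSUMED_BITMAP_LIMIT = 2**31 - 1

# Rough yearly session cardinality from the issue description.
yearly_sessions = 20_000_000 * 360


class PerDayDistinctCounter:
    """Hypothetical measure: date key -> distinct-count state for that day."""

    def __init__(self):
        # A set stands in for a bitmap or hll state in this sketch.
        self.states = defaultdict(set)

    def add(self, day, session_id):
        self.states[day].add(session_id)

    def merge(self, other):
        # States are merged only when they share the same date key;
        # nothing is ever merged across different days.
        for day, ids in other.states.items():
            self.states[day] |= ids
        return self

    def count(self):
        # Sum the per-day cardinalities instead of building one big state.
        return sum(len(ids) for ids in self.states.values())


# Two partial results, e.g. from two segments covering overlapping days.
a = PerDayDistinctCounter()
a.add("2018-08-09", "s1")
a.add("2018-08-09", "s2")
b = PerDayDistinctCounter()
b.add("2018-08-09", "s2")  # same day, same session: counted once
b.add("2018-08-10", "s3")
total = a.merge(b).count()
```

Summing per-day counts is only correct under the stated assumption that a session never crosses days. Each per-day state stays near 20M entries, well under the assumed limit, while the summed total can grow past it, so an implementation in a language with fixed-width integers would need a 64-bit total.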



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)