Posted to issues@kylin.apache.org by "Yerui Sun (JIRA)" <ji...@apache.org> on 2016/05/20 09:19:12 UTC

[jira] [Commented] (KYLIN-1379) More stable and functional precise count distinct implements after KYLIN-1186

    [ https://issues.apache.org/jira/browse/KYLIN-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15293057#comment-15293057 ] 

Yerui Sun commented on KYLIN-1379:
----------------------------------

I've worked on this issue for several weeks, and it's time to release. 
The new bitmap implementation fully supports all data types, based on the cube-level append-able dictionary introduced by KYLIN-1705.
In our scenario, the bitmap precision is difficult to decide. That's why we didn't introduce the precision concept in the new version and instead support bitmaps of any size. This may cause the fixed-size byte buffer to overflow, which we have resolved in KYLIN-1718.
We also found that queries are much slower when the count distinct result is over 10M, and that compression in the endpoint is expensive. KYLIN-1719 improves this by disabling compression.
All of the above changes have been pushed to one branch called KYLIN-1379-1705-1718-1719. Thanks [~liyang.gmt8@gmail.com] for your help and review; any comments are welcome.
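
To illustrate the general idea of a dictionary-backed bitmap, here is a minimal sketch. It is not the code on the branch: the class names AppendableDictionary and BitmapCounter are hypothetical, java.util.BitSet stands in for whatever bitmap structure Kylin actually uses, and values are simplified to strings. The assumption is that a shared, append-only dictionary gives every value a stable integer id, so bitmaps built separately can be merged with a bitwise OR.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; all names here are hypothetical, not Kylin's API.
public class PreciseDistinctSketch {

    /** Append-only dictionary: each previously unseen value gets the next integer id. */
    static final class AppendableDictionary {
        private final Map<String, Integer> ids = new HashMap<>();

        int idOf(String value) {
            Integer id = ids.get(value);
            if (id == null) {
                id = ids.size();      // ids are assigned 0, 1, 2, ... and never change
                ids.put(value, id);
            }
            return id;
        }
    }

    /** Precise distinct counter over dictionary ids, with no fixed precision. */
    static final class BitmapCounter {
        private final BitSet bits = new BitSet();   // grows as larger ids are set

        void add(int id) {
            bits.set(id);
        }

        void merge(BitmapCounter other) {
            bits.or(other.bits);                    // union of the two id sets
        }

        int count() {
            return bits.cardinality();
        }
    }

    public static void main(String[] args) {
        AppendableDictionary dict = new AppendableDictionary();
        BitmapCounter first = new BitmapCounter();
        BitmapCounter second = new BitmapCounter();

        for (String v : new String[] {"alice", "bob", "alice"}) {
            first.add(dict.idOf(v));
        }
        for (String v : new String[] {"bob", "carol"}) {
            second.add(dict.idOf(v));
        }

        first.merge(second);
        System.out.println(first.count());          // prints 3
    }
}
```

Because both counters draw ids from the same dictionary, "bob" maps to the same bit in each, and the merged count stays exact.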

> More stable and functional precise count distinct implements after KYLIN-1186
> -----------------------------------------------------------------------------
>
>                 Key: KYLIN-1379
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1379
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v1.5.0, v1.3.0
>            Reporter: Yerui Sun
>            Assignee: Yerui Sun
>
> After KYLIN-1186, we've gained the ability to count distinct Int type columns precisely.
> However, the implementation from KYLIN-1186 is not stable, especially in the 2.x-staging branch.
> The reason is that the measure's maxlength is used to allocate memory in the 2.x version, and BitmapMeasure is hardcoded to 8MB in KYLIN-1186, causing OOM during cube building.
> To resolve this problem, we have introduced precision on the bitmap measure, such as bitmap(100), bitmap(10000), bitmap(1000000), meaning the measure accepts a cardinality of at most 100/10000/1M. This solution should be fine in practice: if the count exceeds 1000000, the hyperloglog measure, which produces an approximate result, should be acceptable.
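
The precision idea in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions: BoundedBitmapSketch is a hypothetical name, java.util.BitSet stands in for the real bitmap, values are assumed to be non-negative ints, and throwing on overflow is an assumed policy (the description does not say what happens when the limit is reached).

```java
import java.util.BitSet;

// Illustrative sketch only; the class name and overflow policy are assumptions.
public class BoundedBitmapSketch {
    private final int maxCardinality;           // e.g. 100 for bitmap(100)
    private final BitSet bits = new BitSet();

    BoundedBitmapSketch(int maxCardinality) {
        this.maxCardinality = maxCardinality;
    }

    /** Records a non-negative value, rejecting it if it would exceed the declared precision. */
    void add(int value) {
        if (!bits.get(value) && bits.cardinality() >= maxCardinality) {
            throw new IllegalStateException(
                "cardinality would exceed bitmap(" + maxCardinality + ")");
        }
        bits.set(value);
    }

    int count() {
        return bits.cardinality();
    }

    public static void main(String[] args) {
        BoundedBitmapSketch measure = new BoundedBitmapSketch(2);
        measure.add(7);
        measure.add(7);                         // duplicate, does not grow the set
        measure.add(9);
        System.out.println(measure.count());    // prints 2
    }
}
```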


