Posted to issues@kylin.apache.org by "XiaoXiang Yu (JIRA)" <ji...@apache.org> on 2019/03/25 03:27:00 UTC

[jira] [Commented] (KYLIN-3905) Enable shrunken dictionary default

    [ https://issues.apache.org/jira/browse/KYLIN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800378#comment-16800378 ] 

XiaoXiang Yu commented on KYLIN-3905:
-------------------------------------

h2. Comparison and Summary
{quote} * CDH cluster with 56 vcores and 110 GB of memory
 * Fact table with 153326740 rows
 * Cube built with three bitmap count-distinct measures; one column's cardinality is 55200325{quote}
h4. Without ShrunkenDict (Could not complete)
 * The build base cuboid step could not complete

h4. With ShrunkenDict (Completed)
 * A newly added step builds a ShrunkenDict for each map task

[!https://user-images.githubusercontent.com/14030549/54500164-cd2efd00-4954-11e9-85a1-8ae5e67063c7.png|width=355!|https://user-images.githubusercontent.com/14030549/54500164-cd2efd00-4954-11e9-85a1-8ae5e67063c7.png]
 * MapReduce Job Stats

[!https://user-images.githubusercontent.com/14030549/54500186-12532f00-4955-11e9-9d61-202f92ca54e5.png|width=1151!|https://user-images.githubusercontent.com/14030549/54500186-12532f00-4955-11e9-9d61-202f92ca54e5.png]
 * ShrunkenDict in HDFS
[!https://user-images.githubusercontent.com/14030549/54341171-286ea000-4674-11e9-8d99-560e94d37cc4.png!|https://user-images.githubusercontent.com/14030549/54341171-286ea000-4674-11e9-8d99-560e94d37cc4.png]

[!https://user-images.githubusercontent.com/14030549/54341626-3a9d0e00-4675-11e9-962c-c6a805f3208f.png!|https://user-images.githubusercontent.com/14030549/54341626-3a9d0e00-4675-11e9-962c-c6a805f3208f.png]

> Enable shrunken dictionary default
> ----------------------------------
>
>                 Key: KYLIN-3905
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3905
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: XiaoXiang Yu
>            Assignee: XiaoXiang Yu
>            Priority: Minor
>
> When using a bitmap measure on a high-cardinality column (which requires a global dictionary), the build base cuboid step needs frequent cache swaps, so it cannot finish within a reasonable period.
> When the shrunken dictionary is enabled, a new step is added that builds a separate dictionary for each `InputSplit`. The Mapper of the **BuildBaseCuboid** step then only has to fetch the smaller dictionary for its own split instead of the larger global dictionary. This reduces cache swapping and lets the **BuildBaseCuboid** step run as quickly as possible.
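
The idea in the description can be illustrated with a minimal, self-contained sketch. It is not Kylin's implementation: the class `ShrunkenDictSketch`, the method `shrink`, and the use of a plain `Map` as a dictionary are all hypothetical names and simplifications chosen for illustration. The sketch assumes the per-split dictionary keeps the IDs assigned by the global dictionary, so that bitmaps produced by different mappers stay comparable.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not Kylin's actual classes or API.
public class ShrunkenDictSketch {

    /**
     * Builds a per-split dictionary holding only the values that occur in one
     * split, reusing the IDs already assigned by the global dictionary.
     */
    static Map<String, Integer> shrink(Map<String, Integer> globalDict,
                                       List<String> splitValues) {
        Map<String, Integer> shrunken = new HashMap<>();
        for (String value : splitValues) {
            Integer id = globalDict.get(value);
            if (id == null) {
                throw new IllegalArgumentException("Value not in global dictionary: " + value);
            }
            shrunken.put(value, id); // duplicates in the split collapse to one entry
        }
        return shrunken;
    }

    public static void main(String[] args) {
        // Stand-in for a large global dictionary (assumed data).
        Map<String, Integer> global = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            global.put("user-" + i, i);
        }

        // Stand-in for the column values seen by one mapper's InputSplit.
        List<String> split = Arrays.asList("user-7", "user-42", "user-7");

        Map<String, Integer> shrunken = shrink(global, split);
        System.out.println(shrunken.size() + " of " + global.size() + " entries needed");
    }
}
```

In this toy case the mapper needs 2 of the 1000 global entries, which is why a per-split dictionary can fit in memory where the global one forces cache swapping.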



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)