Posted to issues@kylin.apache.org by "XiaoXiang Yu (JIRA)" <ji...@apache.org> on 2019/03/25 03:27:00 UTC
[jira] [Commented] (KYLIN-3905) Enable shrunken dictionary default
[ https://issues.apache.org/jira/browse/KYLIN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800378#comment-16800378 ]
XiaoXiang Yu commented on KYLIN-3905:
-------------------------------------
h2. Comparison and Summary
{quote} * CDH cluster with 56 vcores and 110 GB of memory
* Fact table with 153326740 rows
* Cube built with three bitmap count-distinct measures; one column has a cardinality of 55200325{quote}
h4. Without ShrunkenDict (Did Not Complete)
* The build base cuboid step could not complete
h4. With ShrunkenDict (Completed)
* A newly added step builds a ShrunkenDict for each map task
[!https://user-images.githubusercontent.com/14030549/54500164-cd2efd00-4954-11e9-85a1-8ae5e67063c7.png|width=355!|https://user-images.githubusercontent.com/14030549/54500164-cd2efd00-4954-11e9-85a1-8ae5e67063c7.png]
* MapReduce Job Stats
[!https://user-images.githubusercontent.com/14030549/54500186-12532f00-4955-11e9-9d61-202f92ca54e5.png|width=1151!|https://user-images.githubusercontent.com/14030549/54500186-12532f00-4955-11e9-9d61-202f92ca54e5.png]
* ShrunkenDict in HDFS
[!https://user-images.githubusercontent.com/14030549/54341171-286ea000-4674-11e9-8d99-560e94d37cc4.png!|https://user-images.githubusercontent.com/14030549/54341171-286ea000-4674-11e9-8d99-560e94d37cc4.png]
[!https://user-images.githubusercontent.com/14030549/54341626-3a9d0e00-4675-11e9-962c-c6a805f3208f.png!|https://user-images.githubusercontent.com/14030549/54341626-3a9d0e00-4675-11e9-962c-c6a805f3208f.png]
> Enable shrunken dictionary default
> ----------------------------------
>
> Key: KYLIN-3905
> URL: https://issues.apache.org/jira/browse/KYLIN-3905
> Project: Kylin
> Issue Type: Improvement
> Reporter: XiaoXiang Yu
> Assignee: XiaoXiang Yu
> Priority: Minor
>
> When a bitmap measure is used on a high-cardinality column (which requires a global dictionary), the build base cuboid step needs frequent cache swaps, so it cannot finish within a reasonable period.
> When the shrunken dictionary is enabled, a new step is added to build a separate dictionary for each `InputSplit`. Each Mapper of the **BuildBaseCuboid** step then only has to fetch a smaller dictionary for itself instead of the larger global dictionary. This reduces cache swapping and makes the **BuildBaseCuboid** step run as quickly as possible.
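
The idea described above can be illustrated with a small sketch. This is not Kylin's implementation: the function names `build_shrunken_dict` and `encode_split`, the toy `global_dict`, and the in-memory splits are all hypothetical and only show the principle. It assumes a shrunken dictionary is the subset of the global dictionary restricted to the values that appear in one input split, with the global IDs left unchanged so that bitmaps produced by different mappers stay comparable.

```python
from typing import Dict, Iterable, List


def build_shrunken_dict(global_dict: Dict[str, int],
                        split_values: Iterable[str]) -> Dict[str, int]:
    """Keep only the global-dictionary entries for values seen in one split.

    Hypothetical helper: the IDs are copied from the global dictionary,
    so every split still encodes a given value to the same ID.
    """
    shrunken: Dict[str, int] = {}
    for value in split_values:
        if value not in shrunken:
            shrunken[value] = global_dict[value]
    return shrunken


def encode_split(shrunken: Dict[str, int],
                 split_values: Iterable[str]) -> List[int]:
    """Encode a split using only its own (small) dictionary."""
    return [shrunken[value] for value in split_values]


# Toy stand-in for a large global dictionary (assumption for illustration).
global_dict = {f"user_{i}": i for i in range(1000)}

# Two toy input splits; a real split would hold many more rows.
splits = [
    ["user_3", "user_7", "user_3"],
    ["user_500", "user_7"],
]

# One shrunken dictionary per split, built in a step before encoding.
shrunken_dicts = [build_shrunken_dict(global_dict, s) for s in splits]
encoded = [encode_split(d, s) for d, s in zip(shrunken_dicts, splits)]
```

In this sketch each split needs a dictionary with only two entries instead of the 1000-entry global one, which is the effect the description relies on to reduce cache swapping in the mapper.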