Posted to issues@kylin.apache.org by "XiaoXiang Yu (JIRA)" <ji...@apache.org> on 2019/03/25 03:29:00 UTC
[jira] [Comment Edited] (KYLIN-3905) Enable shrunken dictionary default
[ https://issues.apache.org/jira/browse/KYLIN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800378#comment-16800378 ]
XiaoXiang Yu edited comment on KYLIN-3905 at 3/25/19 3:28 AM:
--------------------------------------------------------------
h2. Comparison and Summary
{quote} * CDH cluster with 56 vcores and 110 GB of memory
* Fact table with 153326740 rows
* Build a cube with three bitmap count-distinct measures; one column's cardinality is 55200325{quote}
h4. Without ShrunkenDict (Could not complete)
* The Build Base Cuboid step could not complete
!image-2019-03-25-11-26-59-198.png!
h4. With ShrunkenDict (Completed)
* A newly added step builds a ShrunkenDict for each map task
!image-2019-03-25-11-27-26-149.png!
* MapReduce Job Stats
!image-2019-03-25-11-27-46-175.png!
[!https://user-images.githubusercontent.com/14030549/54500186-12532f00-4955-11e9-9d61-202f92ca54e5.png|width=1151!|https://user-images.githubusercontent.com/14030549/54500186-12532f00-4955-11e9-9d61-202f92ca54e5.png]
!image-2019-03-25-11-28-14-256.png!
[!https://user-images.githubusercontent.com/14030549/54341171-286ea000-4674-11e9-8d99-560e94d37cc4.png!|https://user-images.githubusercontent.com/14030549/54341171-286ea000-4674-11e9-8d99-560e94d37cc4.png]
[!https://user-images.githubusercontent.com/14030549/54341626-3a9d0e00-4675-11e9-962c-c6a805f3208f.png!|https://user-images.githubusercontent.com/14030549/54341626-3a9d0e00-4675-11e9-962c-c6a805f3208f.png]
was (Author: hit_lacus):
h2. Comparison and Summary
{quote} * CDH cluster with 56 vcores and 110 GB of memory
* Fact table with 153326740 rows
* Build a cube with three bitmap count-distinct measures; one column's cardinality is 55200325{quote}
h4. Without ShrunkenDict (Could not complete)
* The Build Base Cuboid step could not complete
h4. With ShrunkenDict (Completed)
* A newly added step builds a ShrunkenDict for each map task
[!https://user-images.githubusercontent.com/14030549/54500164-cd2efd00-4954-11e9-85a1-8ae5e67063c7.png|width=355!|https://user-images.githubusercontent.com/14030549/54500164-cd2efd00-4954-11e9-85a1-8ae5e67063c7.png]
* MapReduce Job Stats
[!https://user-images.githubusercontent.com/14030549/54500186-12532f00-4955-11e9-9d61-202f92ca54e5.png|width=1151!|https://user-images.githubusercontent.com/14030549/54500186-12532f00-4955-11e9-9d61-202f92ca54e5.png]
* ShrunkenDict in HDFS
[!https://user-images.githubusercontent.com/14030549/54341171-286ea000-4674-11e9-8d99-560e94d37cc4.png!|https://user-images.githubusercontent.com/14030549/54341171-286ea000-4674-11e9-8d99-560e94d37cc4.png]
[!https://user-images.githubusercontent.com/14030549/54341626-3a9d0e00-4675-11e9-962c-c6a805f3208f.png!|https://user-images.githubusercontent.com/14030549/54341626-3a9d0e00-4675-11e9-962c-c6a805f3208f.png]
> Enable shrunken dictionary default
> ----------------------------------
>
> Key: KYLIN-3905
> URL: https://issues.apache.org/jira/browse/KYLIN-3905
> Project: Kylin
> Issue Type: Improvement
> Reporter: XiaoXiang Yu
> Assignee: XiaoXiang Yu
> Priority: Minor
> Attachments: image-2019-03-25-11-26-59-198.png, image-2019-03-25-11-27-26-149.png, image-2019-03-25-11-27-46-175.png, image-2019-03-25-11-28-14-256.png
>
>
> When a bitmap measure is used on a high-cardinality column (which requires a global dictionary), the build base cuboid step needs frequent cache swaps, so it cannot finish within a reasonable period.
> When the shrunken dictionary is enabled, a new step is added that builds a separate dictionary for each `InputSplit`. The Mapper of the **BuildBaseCuboid** step then only has to fetch a smaller dictionary for itself instead of the larger global dictionary. This reduces cache swapping and lets the **BuildBaseCuboid** step run as quickly as possible.
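The idea in the description can be illustrated with a toy sketch. This is not Kylin's implementation: the function names (`build_global_dict`, `build_shrunken_dict`, `encode_split`) and the in-memory `dict` representation are assumptions made purely for illustration. It only shows why a per-split dictionary can be much smaller than the global one while still producing the same IDs.

```python
from typing import Dict, Iterable, List


def build_global_dict(values: Iterable[str]) -> Dict[str, int]:
    """Toy global dictionary: one stable integer ID per distinct value."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


def build_shrunken_dict(global_dict: Dict[str, int],
                        split: Iterable[str]) -> Dict[str, int]:
    """Keep only the entries that occur in this split; IDs are unchanged."""
    return {v: global_dict[v] for v in set(split)}


def encode_split(split: Iterable[str], shrunken: Dict[str, int]) -> List[int]:
    """Encode a split using only its own (smaller) dictionary."""
    return [shrunken[v] for v in split]


# Three hypothetical input splits of a high-cardinality column.
splits = [["u1", "u7", "u1"], ["u3", "u7"], ["u9"]]

global_dict = build_global_dict(v for s in splits for v in s)
shrunken_dicts = [build_shrunken_dict(global_dict, s) for s in splits]
encoded = [encode_split(s, d) for s, d in zip(splits, shrunken_dicts)]

for i, d in enumerate(shrunken_dicts):
    print(f"split {i}: {len(d)} of {len(global_dict)} dictionary entries")
```

Because each shrunken dictionary is a subset of the global one with the same IDs, the encoded values are identical to what a lookup in the global dictionary would give, but each mapper only needs to hold the entries for its own split.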
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)