Posted to issues@kylin.apache.org by "XiaoXiang Yu (JIRA)" <ji...@apache.org> on 2019/03/25 03:29:00 UTC

[jira] [Comment Edited] (KYLIN-3905) Enable shrunken dictionary default

    [ https://issues.apache.org/jira/browse/KYLIN-3905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800378#comment-16800378 ] 

XiaoXiang Yu edited comment on KYLIN-3905 at 3/25/19 3:28 AM:
--------------------------------------------------------------

h2. Comparison and Summary
{quote} * CDH cluster with 56 vcores and 110 GB of memory
 * Fact table with 153,326,740 rows
 * Cube built with three bitmap count-distinct measures; one column has a cardinality of 55,200,325{quote}
h4. Without ShrunkenDict (Did not complete)
 * The Build Base Cuboid step could not complete

!image-2019-03-25-11-26-59-198.png!
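The issue description attributes this failure to frequent cache swapping of the global dictionary. The following Java sketch is a rough, hypothetical model of that effect. It assumes the global dictionary is stored as slices that are loaded through an LRU cache with room for only some of them. The class name `SliceCacheDemo`, the slice counts, and the cache size are invented for illustration and are not Kylin's API or measured values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical model: a dictionary split into slices, accessed through an LRU cache.
public class SliceCacheDemo {

    // LRU cache of slice ids, backed by a LinkedHashMap in access order.
    static final class LruSliceCache {
        private final int capacity;
        private long loads = 0; // number of slice loads (cache misses)
        private final LinkedHashMap<Integer, Boolean> slices;

        LruSliceCache(int capacity) {
            this.capacity = capacity;
            this.slices = new LinkedHashMap<Integer, Boolean>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, Boolean> eldest) {
                    // Evict the least recently used slice once the cache is over capacity.
                    return size() > LruSliceCache.this.capacity;
                }
            };
        }

        void touch(int sliceId) {
            if (slices.get(sliceId) == null) { // miss: the slice must be (re)loaded
                loads++;
                slices.put(sliceId, Boolean.TRUE);
            }
        }

        long loads() {
            return loads;
        }
    }

    // Count slice loads for `lookups` values spread uniformly over `totalSlices` slices.
    static long countLoads(int totalSlices, int cacheCapacity, int lookups, long seed) {
        LruSliceCache cache = new LruSliceCache(cacheCapacity);
        Random random = new Random(seed);
        for (int i = 0; i < lookups; i++) {
            cache.touch(random.nextInt(totalSlices));
        }
        return cache.loads();
    }

    public static void main(String[] args) {
        // Global dictionary: 100 slices, but only 10 fit in the cache at once.
        long global = countLoads(100, 10, 100_000, 42L);
        // Per-split dictionary: the entries a mapper needs fit entirely in the cache.
        long shrunken = countLoads(5, 10, 100_000, 42L);
        System.out.println("global dict slice loads:   " + global);
        System.out.println("shrunken dict slice loads: " + shrunken);
    }
}
```

With uniformly spread lookups, roughly nine out of ten lookups against the 100-slice dictionary force a slice load, while the small dictionary loads each of its 5 slices exactly once and then stays cached.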
h4. With ShrunkenDict (Completed)
 * A newly added step builds a ShrunkenDict for each map task

!image-2019-03-25-11-27-26-149.png!
 * MapReduce Job Stats

!image-2019-03-25-11-27-46-175.png![!https://user-images.githubusercontent.com/14030549/54500186-12532f00-4955-11e9-9d61-202f92ca54e5.png|width=1151!|https://user-images.githubusercontent.com/14030549/54500186-12532f00-4955-11e9-9d61-202f92ca54e5.png]

!image-2019-03-25-11-28-14-256.png!

[!https://user-images.githubusercontent.com/14030549/54341171-286ea000-4674-11e9-8d99-560e94d37cc4.png!|https://user-images.githubusercontent.com/14030549/54341171-286ea000-4674-11e9-8d99-560e94d37cc4.png]

[!https://user-images.githubusercontent.com/14030549/54341626-3a9d0e00-4675-11e9-962c-c6a805f3208f.png!|https://user-images.githubusercontent.com/14030549/54341626-3a9d0e00-4675-11e9-962c-c6a805f3208f.png]


was (Author: hit_lacus):
h2. Comparison and Summary
{quote} * CDH cluster with 56 vcores and 110 GB of memory
 * Fact table with 153,326,740 rows
 * Cube built with three bitmap count-distinct measures; one column has a cardinality of 55,200,325{quote}
h4. Without ShrunkenDict (Did not complete)
 * The Build Base Cuboid step could not complete

h4. With ShrunkenDict (Completed)
 * A newly added step builds a ShrunkenDict for each map task

[!https://user-images.githubusercontent.com/14030549/54500164-cd2efd00-4954-11e9-85a1-8ae5e67063c7.png|width=355!|https://user-images.githubusercontent.com/14030549/54500164-cd2efd00-4954-11e9-85a1-8ae5e67063c7.png]
 * MapReduce Job Stats

[!https://user-images.githubusercontent.com/14030549/54500186-12532f00-4955-11e9-9d61-202f92ca54e5.png|width=1151!|https://user-images.githubusercontent.com/14030549/54500186-12532f00-4955-11e9-9d61-202f92ca54e5.png]
 * ShrunkenDict in HDFS
[!https://user-images.githubusercontent.com/14030549/54341171-286ea000-4674-11e9-8d99-560e94d37cc4.png!|https://user-images.githubusercontent.com/14030549/54341171-286ea000-4674-11e9-8d99-560e94d37cc4.png]

[!https://user-images.githubusercontent.com/14030549/54341626-3a9d0e00-4675-11e9-962c-c6a805f3208f.png!|https://user-images.githubusercontent.com/14030549/54341626-3a9d0e00-4675-11e9-962c-c6a805f3208f.png]

> Enable shrunken dictionary default
> ----------------------------------
>
>                 Key: KYLIN-3905
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3905
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: XiaoXiang Yu
>            Assignee: XiaoXiang Yu
>            Priority: Minor
>         Attachments: image-2019-03-25-11-26-59-198.png, image-2019-03-25-11-27-26-149.png, image-2019-03-25-11-27-46-175.png, image-2019-03-25-11-28-14-256.png
>
>
> When a bitmap measure is used on a high-cardinality column (which requires a global dictionary), the Build Base Cuboid step needs frequent cache swaps, so it cannot finish within a reasonable period.
> When the shrunken dictionary is enabled, a new step is added that builds a separate dictionary for each `InputSplit`. The mapper of the **BuildBaseCuboid** step then only has to fetch the smaller dictionary for its own split instead of the larger global dictionary. This reduces cache swapping and lets the **BuildBaseCuboid** step run as fast as possible.
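The per-split dictionary described in the issue can be sketched as a filter over the global dictionary. This is a minimal, hypothetical Java illustration, not Kylin's actual shrunken dictionary implementation. It assumes the global dictionary can be viewed as a `Map<String, Integer>` from column value to ID and that the column values of one `InputSplit` are available as a list. `ShrunkenDictSketch` and `buildShrunkenDict` are invented names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of building a per-split ("shrunken") dictionary.
public class ShrunkenDictSketch {

    // Keep only the global-dictionary entries whose values occur in this split.
    // IDs are copied unchanged, so encodings match the global dictionary.
    static Map<String, Integer> buildShrunkenDict(Map<String, Integer> globalDict, List<String> splitValues) {
        Map<String, Integer> shrunken = new HashMap<>();
        for (String value : splitValues) {
            if (value == null || shrunken.containsKey(value)) {
                continue; // skip nulls and values already added
            }
            Integer id = globalDict.get(value);
            if (id == null) {
                throw new IllegalStateException("Value missing from global dictionary: " + value);
            }
            shrunken.put(value, id);
        }
        return Collections.unmodifiableMap(shrunken);
    }

    public static void main(String[] args) {
        // Toy global dictionary: value -> global ID.
        Map<String, Integer> global = new HashMap<>();
        String[] values = {"alice", "bob", "carol", "dave", "erin"};
        for (int i = 0; i < values.length; i++) {
            global.put(values[i], i + 1);
        }
        // Values seen by one mapper's InputSplit.
        List<String> split = Arrays.asList("bob", "erin", "bob");
        Map<String, Integer> shrunken = buildShrunkenDict(global, split);
        System.out.println(shrunken.size());      // 2
        System.out.println(shrunken.get("erin")); // 5
    }
}
```

Copying the global IDs rather than renumbering is the important design choice here: bitmaps built by different mappers still refer to the same global encoding, so they can be merged later for precise count-distinct.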



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)