Posted to issues@kylin.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/01/27 14:58:00 UTC

[jira] [Commented] (KYLIN-3729) CLUSTER BY CAST(field AS STRING) will accelerate base cuboid build with UHC global dict

    [ https://issues.apache.org/jira/browse/KYLIN-3729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753410#comment-16753410 ] 

ASF subversion and git services commented on KYLIN-3729:
--------------------------------------------------------

Commit 72a9a9f7cc3679c5eaae3357d80e15c7698d1671 in kylin's branch refs/heads/2.5.x from dengfangyuan
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=72a9a9f ]

KYLIN-3729 CLUSTER BY CAST(field AS STRING) will accelerate base cuboid build with UHC global dict


> CLUSTER BY CAST(field AS STRING) will accelerate base cuboid build with UHC global dict
> ---------------------------------------------------------------------------------------
>
>                 Key: KYLIN-3729
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3729
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v2.5.2
>            Reporter: Fangyuan Deng
>            Assignee: Fangyuan Deng
>            Priority: Minor
>             Fix For: v2.6.0
>
>         Attachments: KYLIN-3729.1.patch, image-2018-12-19-12-01-20-430.png, image-2018-12-19-12-02-08-913.png
>
>
> As we know, the global dict is a sliced appendTrieTree that uses a cache loader, so when we convert values to ids using the global dict, ordered input values help.
> We can already set kylin.source.hive.flat-table-cluster-by-dict-column to the UHC column so that the source data is CLUSTER BY that column, which improves things.
> But the appendTrieTree is ordered by string, so we can CLUSTER BY CAST(uhc-column AS STRING) to get the most benefit.
> We can see that the HDFS bytes read (most of which is the global dict) are reduced to 30%.
> !image-2018-12-19-12-01-20-430.png!!image-2018-12-19-12-02-08-913.png!
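
The effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Kylin's dictionary code: the function names (build_slices, count_slice_loads), the slice size, the one-slice LRU cache, and the integer test values are all hypothetical. It models a dictionary whose slices cover ranges of string-sorted keys, and counts how many times a slice has to be loaded when the same values arrive in random order, in numeric order, and in string order.

```python
import bisect
import random
from collections import OrderedDict


def build_slices(values, slice_size):
    """Sort keys as strings and return the first key of each slice."""
    keys = sorted(str(v) for v in values)
    return [keys[i] for i in range(0, len(keys), slice_size)]


def count_slice_loads(stream, slice_starts, cache_size=1):
    """Count LRU cache misses while locating the slice for each value."""
    cache = OrderedDict()
    loads = 0
    for v in stream:
        # The slice holding v is the last one whose first key is <= str(v).
        idx = bisect.bisect_right(slice_starts, str(v)) - 1
        if idx in cache:
            cache.move_to_end(idx)
        else:
            loads += 1
            cache[idx] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used slice
    return loads


# Hypothetical UHC column: integer ids 1..100000, 1000 keys per slice.
values = list(range(1, 100001))
slice_starts = build_slices(values, slice_size=1000)

shuffled = values[:]
random.Random(0).shuffle(shuffled)

loads_random = count_slice_loads(shuffled, slice_starts)
loads_numeric = count_slice_loads(sorted(values), slice_starts)
loads_string = count_slice_loads(sorted(values, key=str), slice_starts)

print("random order :", loads_random)
print("numeric order:", loads_numeric)
print("string order :", loads_string)
```

In this model the string-ordered stream loads each of the 100 slices exactly once, the numeric order revisits some slices because "10" sorts before "9" as a string, and the random order misses on almost every lookup. The real saving in a Kylin build depends on the dictionary size, the cache settings, and how Hive distributes rows across tasks.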


