Posted to issues@kylin.apache.org by "Xiaoxiang Yu (Jira)" <ji...@apache.org> on 2020/07/31 12:23:01 UTC

[jira] [Closed] (KYLIN-4342) Build Global Dict by MR/Hive New Version

     [ https://issues.apache.org/jira/browse/KYLIN-4342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoxiang Yu closed KYLIN-4342.
-------------------------------

Resolved in release 3.1.0 (2020-07-03)

> Build Global Dict by MR/Hive New Version
> ----------------------------------------
>
>                 Key: KYLIN-4342
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4342
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: wangxiaojing
>            Assignee: wangxiaojing
>            Priority: Major
>             Fix For: v3.1.0
>
>
> At present, the implementation of the global dictionary through MR/Hive has two limitations and some distributed concurrency lock bugs:
> 1. Because Hive ORDER BY performs a global sort in the shuffle stage, memory usage and build time become uncontrollable once the data volume reaches the billion level. In our test with a cardinality of about 800 million and 15 GB of memory configured, the dictionary build step took more than 10 hours.
> 2. Multiple global dictionary columns are calculated serially.
> 3. There are some distributed concurrency lock bugs.
> We have improved the original version. The general idea of the new version is the same as the previous MR/Hive implementation (MR/Hive V1: http://kylin.apache.org/docs30/howto/howto_use_hive_mr_dict.html): complete the global dictionary encoding through Hive or MR, and then replace the original values in the flat table with the dictionary-encoded values.
> However, the new version adds two MR steps, "parallel part build" and "parallel total build", to replace the original "build dict" step, so as to solve the two limitations above. It also uses ZK to solve the distributed concurrency lock bugs.
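
The two-step idea in the description can be illustrated with a small sketch. This is not Kylin's code, and the ticket does not say what each step does internally. The sketch assumes that the "part build" step deduplicates and sorts each hash partition independently, and that the "total build" step gives every partition an ID offset so that local indexes become global IDs. All function names (partition_of, part_build, total_build, encode_column) are hypothetical.

```python
from collections import defaultdict
from zlib import crc32


def partition_of(value: str, num_parts: int) -> int:
    """Map a value to a partition with a stable hash (assumed partitioning scheme)."""
    return crc32(value.encode("utf-8")) % num_parts


def part_build(values, num_parts):
    """Assumed "parallel part build": bucket the values by hash, then
    deduplicate and sort each bucket on its own. No bucket depends on
    another, so each one could be handled by a separate task."""
    buckets = defaultdict(set)
    for v in values:
        buckets[partition_of(v, num_parts)].add(v)
    return {p: sorted(vals) for p, vals in buckets.items()}


def total_build(parts, start_id=1):
    """Assumed "parallel total build": each partition gets an offset equal to
    the number of values in the partitions before it, and a value's global ID
    is that offset plus its local index."""
    dictionary = {}
    offset = start_id
    for p in sorted(parts):
        for local_index, v in enumerate(parts[p]):
            dictionary[v] = offset + local_index
        offset += len(parts[p])
    return dictionary


def encode_column(rows, dictionary):
    """Replace the original values of a flat-table column with dictionary IDs."""
    return [dictionary[v] for v in rows]


if __name__ == "__main__":
    column = ["u3", "u1", "u2", "u1", "u3"]
    global_dict = total_build(part_build(column, num_parts=3))
    print(global_dict)
    print(encode_column(column, global_dict))
```

In this sketch only the small per-partition counts have to be combined centrally, so no single task sorts the whole column. Several dictionary columns could be processed the same way at the same time, each with its own set of partitions.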



--
This message was sent by Atlassian Jira
(v8.3.4#803005)