Posted to issues@kylin.apache.org by "kangkaisen (JIRA)" <ji...@apache.org> on 2017/09/02 11:25:00 UTC

[jira] [Updated] (KYLIN-2764) Build the dict for UHC column with MR

     [ https://issues.apache.org/jira/browse/KYLIN-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

kangkaisen updated KYLIN-2764:
------------------------------
    Attachment: job-memory-before.png
                job-memory-after.png

This commit has been running in our production environment for a long time.

The two attached pictures show that this commit remarkably reduces memory usage on the Kylin JobServer. In addition, it remarkably improves the JobServer's ability to run jobs concurrently. After applying this commit, we removed one of our three JobServers.

> Build the dict for UHC column with MR
> -------------------------------------
>
>                 Key: KYLIN-2764
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2764
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Job Engine
>    Affects Versions: v2.0.0
>            Reporter: kangkaisen
>            Assignee: kangkaisen
>         Attachments: job-memory-after.png, job-memory-before.png
>
>
> KYLIN-2217 builds the dictionaries for normal columns with MR, but the dictionaries for UHC (ultra high cardinality) columns are still built in the JobServer. As in KYLIN-2217, we could also use MR to build the dictionaries for UHC columns, which would thoroughly relieve the memory pressure on the JobServer and improve its job concurrency, as well as speed up dictionary building when there are multiple UHC columns.
> The MR input is the output of "Extract Fact Table Distinct Columns", and the MR output is the UHC column dictionaries. Because it is very hard to build a global dictionary with multiple reducers, I use one reducer to handle each UHC column and allocate enough memory to that reducer. According to my tests, 8 GB of memory is enough.
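
The one-reducer-per-UHC-column idea described above can be illustrated with a small in-memory sketch. This is not Kylin's code: the function names (partition, build_dicts), the (column, value) record shape, and the sorted-order ID assignment are all assumptions made for illustration. Kylin's real dictionary builder and its Hadoop partitioner are not reproduced here.

```python
from collections import defaultdict


def partition(column, uhc_columns):
    """Hypothetical partitioner: reducer i receives every value of UHC column i."""
    return uhc_columns.index(column)


def build_dicts(distinct_values, uhc_columns):
    """Simulate the shuffle and reduce phases for (column, value) records.

    distinct_values stands in for the output of the
    "Extract Fact Table Distinct Columns" step.
    """
    # Shuffle: group values by reducer, so each reducer owns exactly one column.
    reducers = defaultdict(set)
    for column, value in distinct_values:
        if column in uhc_columns:
            reducers[partition(column, uhc_columns)].add(value)

    # Reduce: one reducer sees all values of its column, so it can assign
    # IDs that are unique across the whole column without coordination.
    # Sorted order is an illustrative choice, not Kylin's encoding.
    return {
        uhc_columns[reducer]: {v: i for i, v in enumerate(sorted(values))}
        for reducer, values in reducers.items()
    }


records = [
    ("user_id", "u3"), ("user_id", "u1"), ("city", "x"),
    ("session_id", "s2"), ("user_id", "u1"), ("session_id", "s1"),
]
dicts = build_dicts(records, ["user_id", "session_id"])
print(dicts)
```

Because every value of a column lands on the same reducer, no ID coordination between reducers is needed. The cost is that this single reducer must hold the whole column's dictionary, which is why it needs a generous memory allocation.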



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)