Posted to issues@kylin.apache.org by "kangkaisen (JIRA)" <ji...@apache.org> on 2016/12/12 04:41:58 UTC

[jira] [Commented] (KYLIN-2269) Reduce MR memory usage for global dict

    [ https://issues.apache.org/jira/browse/KYLIN-2269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15740954#comment-15740954 ] 

kangkaisen commented on KYLIN-2269:
-----------------------------------

To resolve the issue, we could use {{CLUSTER BY}} so that the mapper input of {{Build Base Cuboid Data}} is sequential. Since the input is sequential, the mapper could load the global dict slices one at a time, so the default mapper memory size would be enough.

Of course, this method can only handle one ultra high cardinality column well, but that covers most scenarios.
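
To illustrate why sequential input helps, here is a minimal Java sketch. It is not Kylin code: {{SliceDictSketch}}, {{getId}} and {{countLoads}} are hypothetical names, the in-memory map stands in for slices stored on disk, and it assumes the dict slices are ordered by value in the same order as the mapper input. Only one slice is held at a time, and the sketch counts how often a slice has to be swapped in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these names are Kylin APIs.
class SliceDictSketch {
    // Stand-in for slices persisted on disk: first value of slice -> (value -> id).
    private final TreeMap<String, TreeMap<String, Integer>> storedSlices = new TreeMap<>();
    private String residentSliceKey;                 // first value of the slice held in memory
    private TreeMap<String, Integer> residentSlice;  // the only slice held in memory
    private int loads;                               // number of slice loads so far

    // sortedValues must be distinct and sorted in natural String order.
    SliceDictSketch(List<String> sortedValues, int sliceSize) {
        int id = 0;
        TreeMap<String, Integer> slice = null;
        for (String v : sortedValues) {
            if (slice == null || slice.size() == sliceSize) {
                slice = new TreeMap<>();
                storedSlices.put(v, slice);
            }
            slice.put(v, id++);
        }
    }

    // Returns the id of value, swapping the resident slice if needed; null if unknown.
    Integer getId(String value) {
        Map.Entry<String, TreeMap<String, Integer>> e = storedSlices.floorEntry(value);
        if (e == null) {
            return null;
        }
        if (!e.getKey().equals(residentSliceKey)) {
            residentSliceKey = e.getKey();
            residentSlice = e.getValue();  // simulates reading one slice from disk
            loads++;
        }
        return residentSlice.get(value);
    }

    int getLoads() {
        return loads;
    }

    // Builds a dict from dictValues and counts slice loads while looking up input in order.
    static int countLoads(List<String> dictValues, int sliceSize, List<String> input) {
        List<String> sorted = new ArrayList<>(dictValues);
        Collections.sort(sorted);
        SliceDictSketch dict = new SliceDictSketch(sorted, sliceSize);
        for (String v : input) {
            dict.getId(v);
        }
        return dict.getLoads();
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("a", "b", "c", "d", "e", "f");
        // Input as it might arrive after CLUSTER BY: sorted on the dict column.
        List<String> clustered = Arrays.asList("a", "a", "b", "c", "d", "e", "f");
        // Input in arbitrary order.
        List<String> interleaved = Arrays.asList("a", "e", "b", "f", "c", "d");
        System.out.println("clustered loads:   " + countLoads(values, 2, clustered));
        System.out.println("interleaved loads: " + countLoads(values, 2, interleaved));
    }
}
```

With the sorted input, each of the three slices in this toy example is loaded once. With the interleaved input the same slices are reloaded repeatedly, which is the access pattern that pushes a real mapper to keep as many slices in memory as it can.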



> Reduce MR memory usage for global dict
> --------------------------------------
>
>                 Key: KYLIN-2269
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2269
>             Project: Kylin
>          Issue Type: Improvement
>    Affects Versions: v1.6.0
>            Reporter: kangkaisen
>            Assignee: kangkaisen
>
> Currently, in {{Build Base Cuboid Data}}, if the user uses the global dict and the global dict size is significantly larger than the mapper memory size, the {{CachedTreeMap}} will load as many values as possible, and the soft reference objects will stick around for a while during GC. This makes the {{Build Base Cuboid Data}} mapper pause for a long time, or even fail to finish.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)