Posted to dev@kylin.apache.org by "Luke Han (JIRA)" <ji...@apache.org> on 2015/06/20 00:36:00 UTC

[jira] [Updated] (KYLIN-607) More efficient cube building

     [ https://issues.apache.org/jira/browse/KYLIN-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luke Han updated KYLIN-607:
---------------------------
    Sprint: Sprint 42, Sprint 43  (was: Sprint 42)

> More efficient cube building
> ----------------------------
>
>                 Key: KYLIN-607
>                 URL: https://issues.apache.org/jira/browse/KYLIN-607
>             Project: Kylin
>          Issue Type: New Feature
>          Components: Job Engine
>            Reporter: liyang
>            Assignee: liyang
>             Fix For: v0.8.1
>
>
> Right now the cube is built layer by layer along the spanning tree. The algorithm results in a total shuffle size of around [Avg Cardinality] * [Total Cube Size]. This is currently the biggest bottleneck of cube building in the eBay deployment.
> Propose a different algorithm:
> 1. Each mapper builds a cube segment independently and outputs it.
> 2. One round of shuffle merge-sorts the segments.
> 3. The reducer outputs the final merged cube.
> This could achieve 1 * [Total Cube Size] shuffling when there's a mandatory dimension and each mapper takes a different slice of that dimension. E.g. month is mandatory and each mapper is assigned a different month's data.
> This algorithm is also more friendly to streaming.
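
The three steps proposed above can be illustrated with a small in-memory sketch. This is not Kylin code: the fact table, the dimension names, and the helpers cuboids, build_segment and merge_segments are all hypothetical, and plain Python lists stand in for mapper output and the shuffle. It assumes a single SUM measure and that "month" is the mandatory dimension.

```python
from collections import defaultdict
from heapq import merge
from itertools import combinations

# Hypothetical toy fact table: (month, country, category, amount).
ROWS = [
    ("2015-01", "US", "toys", 10),
    ("2015-01", "US", "books", 5),
    ("2015-01", "DE", "toys", 7),
    ("2015-02", "US", "toys", 3),
    ("2015-02", "DE", "books", 4),
]
DIMS = ("month", "country", "category")
MANDATORY = ("month",)  # assumption: every cuboid keeps this dimension


def cuboids(dims, mandatory):
    """Yield every dimension subset that contains the mandatory dimensions."""
    optional = [d for d in dims if d not in mandatory]
    for r in range(len(optional) + 1):
        for combo in combinations(optional, r):
            yield tuple(d for d in dims if d in mandatory or d in combo)


def build_segment(rows):
    """Step 1: one 'mapper' aggregates its rows into a sorted cube segment."""
    agg = defaultdict(int)
    for row in rows:
        record = dict(zip(DIMS, row[:-1]))
        for cuboid in cuboids(DIMS, MANDATORY):
            key = (cuboid, tuple(record[d] for d in cuboid))
            agg[key] += row[-1]
    return sorted(agg.items())


def merge_segments(segments):
    """Steps 2 and 3: merge-sort the sorted segments and sum equal keys."""
    cube = []
    for key, value in merge(*segments):
        if cube and cube[-1][0] == key:
            cube[-1] = (key, cube[-1][1] + value)
        else:
            cube.append((key, value))
    return cube


# Each "mapper" is assigned a different month of data.
by_month = defaultdict(list)
for row in ROWS:
    by_month[row[0]].append(row)

segments = [build_segment(rows) for rows in by_month.values()]
cube = merge_segments(segments)

# Records that cross the shuffle versus rows in the final cube.
shuffled = sum(len(segment) for segment in segments)
print(shuffled, len(cube))
```

Because every cuboid contains the month and each mapper sees a different month, no key is produced by more than one mapper, so in this toy example the number of shuffled records equals the number of rows in the final cube. Splitting the input some other way still yields the same cube after merging, but the same key can then be emitted by several mappers, which inflates the shuffle.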



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)