Posted to dev@kylin.apache.org by "Luke Han (JIRA)" <ji...@apache.org> on 2015/06/20 00:36:00 UTC
[jira] [Updated] (KYLIN-607) More efficient cube building
[ https://issues.apache.org/jira/browse/KYLIN-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke Han updated KYLIN-607:
---------------------------
Sprint: Sprint 42, Sprint 43 (was: Sprint 42)
> More efficient cube building
> ----------------------------
>
> Key: KYLIN-607
> URL: https://issues.apache.org/jira/browse/KYLIN-607
> Project: Kylin
> Issue Type: New Feature
> Components: Job Engine
> Reporter: liyang
> Assignee: liyang
> Fix For: v0.8.1
>
>
> Right now cube building proceeds layer by layer over the spanning tree. The algorithm results in a total shuffle size of around [Avg Cardinality] * [Total Cube Size]. This is currently the biggest bottleneck of cube building in the eBay deployment.
> Propose a different algorithm:
> 1. Each mapper builds a cube segment independently and outputs it.
> 2. One round of shuffle merge-sorts the segments.
> 3. Reducer outputs the final merged cube.
> This could achieve 1 * [Total Cube Size] of shuffling when there is a mandatory dimension and each mapper takes a different slice of that dimension. E.g. month is mandatory and each mapper is assigned a different month's data.
> This algorithm is also more friendly to streaming.
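
A rough in-memory sketch of the three steps proposed above may help make the shuffle-size argument concrete. It is not Kylin code: the function names (cuboids, build_segment, merge_segments), the sample rows, the three dimensions and the single SUM measure are all assumptions made for illustration, and plain Python functions stand in for the mappers, the shuffle and the reducer.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical schema: three dimensions and one SUM measure ("sales").
DIMS = ("month", "country", "category")
MANDATORY = "month"  # assumed mandatory dimension: present in every cuboid


def cuboids(dims, mandatory):
    """Yield every dimension subset that contains the mandatory dimension."""
    optional = [d for d in dims if d != mandatory]
    for size in range(len(optional) + 1):
        for combo in combinations(optional, size):
            yield (mandatory,) + combo


def build_segment(rows):
    """Step 1 (mapper): aggregate one input split into a full cube segment."""
    segment = defaultdict(int)
    for row in rows:
        for cuboid in cuboids(DIMS, MANDATORY):
            key = (cuboid, tuple(row[d] for d in cuboid))
            segment[key] += row["sales"]
    return dict(segment)


def merge_segments(segments):
    """Steps 2 and 3 (shuffle + reducer): merge equal keys by summing.

    Also counts how many records crossed the simulated shuffle.
    """
    merged = defaultdict(int)
    shuffled = 0
    for segment in segments:
        for key, value in segment.items():
            merged[key] += value
            shuffled += 1
    return dict(merged), shuffled


rows = [
    {"month": "2015-01", "country": "US", "category": "toys", "sales": 10},
    {"month": "2015-01", "country": "US", "category": "books", "sales": 5},
    {"month": "2015-02", "country": "US", "category": "toys", "sales": 7},
    {"month": "2015-02", "country": "DE", "category": "toys", "sales": 3},
]

# Case A: each mapper receives exactly one month (split on the mandatory dimension).
by_month = defaultdict(list)
for row in rows:
    by_month[row[MANDATORY]].append(row)
cube, shuffled = merge_segments(build_segment(split) for split in by_month.values())

# Case B: months are mixed across mappers, so segments share keys.
mixed_splits = [[rows[0], rows[2]], [rows[1], rows[3]]]
mixed_cube, mixed_shuffled = merge_segments(build_segment(s) for s in mixed_splits)

print("final cube records:      ", len(cube))
print("shuffled (split by month):", shuffled)
print("shuffled (mixed months):  ", mixed_shuffled)
```

When every split holds a single month, no two segments share a key, so the number of shuffled records equals the number of records in the final cube. With months mixed across mappers the final cube is identical, but overlapping keys are shuffled more than once.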