Posted to dev@kylin.apache.org by "ZhouQianhao (JIRA)" <ji...@apache.org> on 2015/10/25 17:55:27 UTC

[jira] [Created] (KYLIN-1094) improve performance of spark cubing

ZhouQianhao created KYLIN-1094:
----------------------------------

             Summary: improve performance of spark cubing
                 Key: KYLIN-1094
                 URL: https://issues.apache.org/jira/browse/KYLIN-1094
             Project: Kylin
          Issue Type: Improvement
          Components: Spark Engine
    Affects Versions: v2.0
            Reporter: ZhouQianhao
            Assignee: ZhouQianhao


A proof-of-concept (POC) test of Spark cubing shows that, on a dataset of 150 million records, MR (MapReduce) cubing is about 100% faster than Spark cubing, i.e. the Spark build takes roughly twice as long. We believe Spark should be able to run at least as fast as MR, so optimization is needed here.
We are now asking the Spark community for help.

Cluster info:
VMs: 8 nodes, each with 128 GB memory and 64 cores
Hadoop cluster: HDP 2.2.6
Spark running mode: yarn-client
Spark version: 1.5.1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)