Posted to issues@kylin.apache.org by "vu thanh dat (JIRA)" <ji...@apache.org> on 2017/12/21 07:58:00 UTC

[jira] [Created] (KYLIN-3123) Improve Spark Cubing

vu thanh dat created KYLIN-3123:
-----------------------------------

             Summary: Improve Spark Cubing
                 Key: KYLIN-3123
                 URL: https://issues.apache.org/jira/browse/KYLIN-3123
             Project: Kylin
          Issue Type: Improvement
          Components: Spark Engine
    Affects Versions: v2.2.0
         Environment: HDP, HBase, Spark 2.6, CentOS 7
            Reporter: vu thanh dat
             Fix For: v2.2.0
         Attachments: dimension.bmp, measures.bmp, rowkeys.bmp, spark_so_slow_2.bmp

Hi all,
I'm using Spark to build a Kylin cube.
The data is about 13 million rows for one step, partitioned by date, with 10 dimensions and no measures.
I set this config:
kylin.storage.hbase.compression-codec=snappy
kylin.engine.spark.rdd-partition-cut-mb=1000
kylin.engine.spark.max-partition=5000
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.submit.deployMode=cluster
kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=100
kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=10240
kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300
kylin.engine.spark-conf.spark.shuffle.service.enabled=true
kylin.engine.spark-conf.spark.shuffle.service.port=7337
kylin.engine.spark-conf.spark.yarn.queue=default
kylin.engine.spark-conf.spark.executor.memory=4G
kylin.engine.spark-conf.spark.executor.cores=4
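
For reference, one tuning direction sometimes suggested for Kylin's Spark engine is to cap dynamic allocation at a realistic executor count for the cluster, reserve memory overhead for YARN containers, and reduce the RDD partition cut size so the cubing job gets more parallel tasks. The values below are illustrative assumptions for a cluster of this size (448 cores), not measured settings:

kylin.engine.spark.rdd-partition-cut-mb=500
kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=100
kylin.engine.spark-conf.spark.executor.memory=4G
kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=1024
kylin.engine.spark-conf.spark.executor.cores=4

With maxExecutors=10240 on a 448-core cluster, dynamic allocation can never reach its ceiling anyway, so a lower cap mainly makes the requested resources match what YARN can actually grant.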
The "Build Cube with Spark" step is very slow, about 1 hour. Can you show me how to tune the Kylin config to speed up this step? I have 30 CentOS servers with 5.87 TB of storage and 448 cores in total.
I have attached my config.
Best regards and thanks!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)