Posted to issues@kylin.apache.org by "Shaofeng SHI (JIRA)" <ji...@apache.org> on 2017/12/21 09:58:00 UTC
[jira] [Commented] (KYLIN-3123) Improve Spark Cubing
[ https://issues.apache.org/jira/browse/KYLIN-3123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16299790#comment-16299790 ]
Shaofeng SHI commented on KYLIN-3123:
-------------------------------------
There seems to be only 1 partition in the RDD, so there is no parallelism. Since your cube has no "count distinct" measure, change this config to a smaller value such as 50:
kylin.engine.spark.rdd-partition-cut-mb=50
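A rough way to see why 1000 MB can leave a single partition: assume the Spark engine divides each layer's estimated size (in MB) by kylin.engine.spark.rdd-partition-cut-mb, then clamps the result between a minimum (assumed here to be 1) and kylin.engine.spark.max-partition. The Python sketch below only illustrates that assumed arithmetic; the function name and the 600 MB layer size are hypothetical, not taken from Kylin's code or from this issue.

```python
def estimate_rdd_partitions(estimated_size_mb: float,
                            partition_cut_mb: float,
                            min_partition: int = 1,
                            max_partition: int = 5000) -> int:
    """Hypothetical helper: approximate a layer's RDD partition count.

    Assumption: partitions = size / cut (truncated), clamped to
    [min_partition, max_partition]. Not Kylin's actual implementation.
    """
    raw = int(estimated_size_mb / partition_cut_mb)  # truncates toward zero
    return min(max_partition, max(min_partition, raw))


# Hypothetical 600 MB cuboid layer (not a figure from this issue).
layer_mb = 600
print(estimate_rdd_partitions(layer_mb, 1000))  # 0.6 truncates to 0, clamped up to 1
print(estimate_rdd_partitions(layer_mb, 50))    # 12
```

Under this assumption, any layer smaller than the cut size ends up in one partition, which matches the single-partition behaviour seen here.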
Besides, I think "minExecutors" is too large; many executors might sit idle, so you can set it to a smaller value.
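To put the minExecutors remark in numbers: with dynamic allocation enabled, Spark keeps at least minExecutors executors allocated even when they have nothing to run. The Python sketch below assumes each RDD partition becomes one task using one core (Spark's default spark.task.cpus=1) and plugs in the reporter's values of 100 executors with 4 cores each; the helper name is hypothetical.

```python
def idle_capacity(num_executors: int,
                  cores_per_executor: int,
                  num_tasks: int,
                  task_cpus: int = 1) -> tuple[int, int]:
    """Hypothetical helper: idle task slots and a lower bound on idle executors.

    Assumes one task per partition and task_cpus cores per task.
    """
    total_slots = num_executors * (cores_per_executor // task_cpus)
    busy_slots = min(num_tasks, total_slots)
    idle_slots = total_slots - busy_slots
    # Each task runs on a single executor, so at most num_tasks executors are busy.
    min_idle_executors = max(0, num_executors - num_tasks)
    return idle_slots, min_idle_executors


# Reporter's settings: minExecutors=100, executor.cores=4, and a 1-partition RDD.
print(idle_capacity(100, 4, 1))  # (399, 99)
```

With one partition, at least 99 of the 100 executors (399 of 400 cores) have nothing to run in that stage, regardless of how many executors are held.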
Just give it a try.
> Improve Spark Cubing
> --------------------
>
> Key: KYLIN-3123
> URL: https://issues.apache.org/jira/browse/KYLIN-3123
> Project: Kylin
> Issue Type: Improvement
> Components: Spark Engine
> Affects Versions: v2.2.0
> Environment: HDP, HBase, Spark 2.6, CentOS 7
> Reporter: vu thanh dat
> Labels: beginner
> Fix For: v2.2.0
>
> Attachments: dimension.bmp, measures.bmp, rowkeys.bmp, spark_so_slow_2.bmp
>
>
> Hi all,
> I'm using Spark to build a Kylin cube.
> The data is about 13 million rows for one build step, partitioned by date, with 10 dimensions and no measures.
> I set the following config:
> kylin.storage.hbase.compression-codec=snappy
> kylin.engine.spark.rdd-partition-cut-mb=1000
> kylin.engine.spark.max-partition=5000
> kylin.engine.spark-conf.spark.master=yarn
> kylin.engine.spark-conf.spark.submit.deployMode=cluster
> kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
> kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=100
> kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=10240
> kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300
> kylin.engine.spark-conf.spark.shuffle.service.enabled=true
> kylin.engine.spark-conf.spark.shuffle.service.port=7337
> kylin.engine.spark-conf.spark.yarn.queue=default
> kylin.engine.spark-conf.spark.executor.memory=4G
> kylin.engine.spark-conf.spark.executor.cores=4
> The "Build Cube with Spark" step is very slow: it takes about 1 hour. Can you show me how to customize the Kylin config to speed up this step? I have 30 CentOS servers, 5.87 TB of storage and 448 cores.
> I've attached my config.
> Best regards and thanks!