Posted to commits@kylin.apache.org by sh...@apache.org on 2018/07/27 10:47:01 UTC

[kylin] branch document updated: Update configurations for spark engine

This is an automated email from the ASF dual-hosted git repository.

shaofengshi pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git


The following commit(s) were added to refs/heads/document by this push:
     new b65f6b1  Update configurations for spark engine
b65f6b1 is described below

commit b65f6b1a80667c8d18e5c2dc4323da8c3118c463
Author: shaofengshi <sh...@apache.org>
AuthorDate: Fri Jul 27 18:46:52 2018 +0800

    Update configurations for spark engine
---
 website/_docs/tutorial/cube_spark.cn.md | 17 +++++++++++++----
 website/_docs/tutorial/cube_spark.md    | 22 ++++++++++++++++------
 2 files changed, 29 insertions(+), 10 deletions(-)

diff --git a/website/_docs/tutorial/cube_spark.cn.md b/website/_docs/tutorial/cube_spark.cn.md
index d3cc58b..b1909cc 100644
--- a/website/_docs/tutorial/cube_spark.cn.md
+++ b/website/_docs/tutorial/cube_spark.cn.md
@@ -37,22 +37,31 @@ kylin.env.hadoop-conf-dir=/etc/hadoop/conf
 
 Kylin embeds a Spark binary (v2.1.2) in $KYLIN_HOME/spark; all Spark configuration properties prefixed with *"kylin.engine.spark-conf."* can be managed in $KYLIN_HOME/conf/kylin.properties. These properties are extracted and applied when a Spark job is submitted; for example, if you configure "kylin.engine.spark-conf.spark.executor.memory=4G", Kylin will use "--conf spark.executor.memory=4G" as a parameter when executing "spark-submit".
 
-Before running Spark cubing, it is recommended to review these configurations and customize them according to your cluster. Below is the default configuration, which is also the minimal configuration for a sandbox (1 executor with 1GB memory); a normal cluster usually needs more executors, each with at least 4GB memory and 2 cores:
+Before running Spark cubing, it is recommended to review these configurations and customize them according to your cluster. Below is the recommended configuration, with Spark dynamic resource allocation enabled:
 
 {% highlight Groff markup %}
 kylin.engine.spark-conf.spark.master=yarn
 kylin.engine.spark-conf.spark.submit.deployMode=cluster
+kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
+kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=1
+kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=1000
+kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300
 kylin.engine.spark-conf.spark.yarn.queue=default
+kylin.engine.spark-conf.spark.driver.memory=2G
 kylin.engine.spark-conf.spark.executor.memory=4G
 kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=1024
-kylin.engine.spark-conf.spark.executor.cores=2
-kylin.engine.spark-conf.spark.executor.instances=40
+kylin.engine.spark-conf.spark.executor.cores=1
+kylin.engine.spark-conf.spark.network.timeout=600
 kylin.engine.spark-conf.spark.shuffle.service.enabled=true
+#kylin.engine.spark-conf.spark.executor.instances=1
 kylin.engine.spark-conf.spark.eventLog.enabled=true
+kylin.engine.spark-conf.spark.hadoop.dfs.replication=2
+kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress=true
+kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec
+kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
 kylin.engine.spark-conf.spark.eventLog.dir=hdfs\:///kylin/spark-history
 kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs\:///kylin/spark-history
 
-#kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
 
 ## uncomment for HDP
 #kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
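
The paragraph above describes how properties carrying the *"kylin.engine.spark-conf."* prefix are handed to Spark. As a minimal sketch of that mapping (a hypothetical Python helper for illustration, not Kylin's actual implementation):

    # Minimal sketch, not Kylin's code: map "kylin.engine.spark-conf." properties
    # to the "--conf" arguments passed to spark-submit.
    PREFIX = "kylin.engine.spark-conf."

    def to_spark_submit_args(props):
        args = []
        for key, value in props.items():
            if key.startswith(PREFIX):
                # Strip the prefix; the remainder is a plain Spark property.
                args += ["--conf", "%s=%s" % (key[len(PREFIX):], value)]
        return args

    print(to_spark_submit_args({"kylin.engine.spark-conf.spark.executor.memory": "4G"}))
    # -> ['--conf', 'spark.executor.memory=4G']
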
diff --git a/website/_docs/tutorial/cube_spark.md b/website/_docs/tutorial/cube_spark.md
index 4770e48..84e0ae5 100644
--- a/website/_docs/tutorial/cube_spark.md
+++ b/website/_docs/tutorial/cube_spark.md
@@ -31,21 +31,31 @@ To run Spark on Yarn, need specify **HADOOP_CONF_DIR** environment variable, whi
 
 Kylin embeds a Spark binary (v2.1.0) in $KYLIN_HOME/spark; all the Spark configurations can be managed in $KYLIN_HOME/conf/kylin.properties with the prefix *"kylin.engine.spark-conf."*. These properties will be extracted and applied when a Spark job is submitted; e.g., if you configure "kylin.engine.spark-conf.spark.executor.memory=4G", Kylin will use "--conf spark.executor.memory=4G" as a parameter when executing "spark-submit".
 
-Before you run Spark cubing, we suggest taking a look at these configurations and customizing them according to your cluster. Below are the default configurations, which are also the minimal config for a sandbox (1 executor with 1GB memory); a normal cluster usually needs many more executors, each with at least 4GB memory and 2 cores:
+Before you run Spark cubing, we suggest taking a look at these configurations and customizing them according to your cluster. Below are the recommended configurations:
 
 {% highlight Groff markup %}
 kylin.engine.spark-conf.spark.master=yarn
 kylin.engine.spark-conf.spark.submit.deployMode=cluster
+kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
+kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=1
+kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=1000
+kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300
 kylin.engine.spark-conf.spark.yarn.queue=default
-kylin.engine.spark-conf.spark.executor.memory=1G
-kylin.engine.spark-conf.spark.executor.cores=2
-kylin.engine.spark-conf.spark.executor.instances=1
+kylin.engine.spark-conf.spark.driver.memory=2G
+kylin.engine.spark-conf.spark.executor.memory=4G
+kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=1024
+kylin.engine.spark-conf.spark.executor.cores=1
+kylin.engine.spark-conf.spark.network.timeout=600
+kylin.engine.spark-conf.spark.shuffle.service.enabled=true
+#kylin.engine.spark-conf.spark.executor.instances=1
 kylin.engine.spark-conf.spark.eventLog.enabled=true
+kylin.engine.spark-conf.spark.hadoop.dfs.replication=2
+kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress=true
+kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec
+kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
 kylin.engine.spark-conf.spark.eventLog.dir=hdfs\:///kylin/spark-history
 kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs\:///kylin/spark-history
 
-#kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
-
 ## uncomment for HDP
 #kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
 #kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
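
The recommended configuration above enables dynamic allocation (spark.dynamicAllocation.*) together with spark.shuffle.service.enabled=true. On YARN, that combination requires Spark's external shuffle service to run inside each NodeManager; below is a minimal yarn-site.xml sketch of the standard Spark-on-YARN setup (verify against your cluster and Spark version):

    <!-- Register Spark's external shuffle service as a YARN auxiliary service;
         required when spark.shuffle.service.enabled=true with dynamic allocation. -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
        <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>

The Spark YARN shuffle jar (shipped with the Spark distribution, under the yarn/ directory in Spark 2.x) must be on the NodeManager classpath, and NodeManagers must be restarted afterwards. The event-log settings above also assume the hdfs:///kylin/spark-history directory exists; it can be created with "hadoop fs -mkdir -p /kylin/spark-history".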