Posted to commits@kylin.apache.org by li...@apache.org on 2018/07/28 14:09:14 UTC

svn commit: r1836910 - in /kylin/site: cn/docs/tutorial/cube_spark.html docs/tutorial/cube_spark.html feed.xml

Author: lidong
Date: Sat Jul 28 14:09:14 2018
New Revision: 1836910

URL: http://svn.apache.org/viewvc?rev=1836910&view=rev
Log:
Update configurations for spark engine

Modified:
    kylin/site/cn/docs/tutorial/cube_spark.html
    kylin/site/docs/tutorial/cube_spark.html
    kylin/site/feed.xml

Modified: kylin/site/cn/docs/tutorial/cube_spark.html
URL: http://svn.apache.org/viewvc/kylin/site/cn/docs/tutorial/cube_spark.html?rev=1836910&r1=1836909&r2=1836910&view=diff
==============================================================================
--- kylin/site/cn/docs/tutorial/cube_spark.html (original)
+++ kylin/site/cn/docs/tutorial/cube_spark.html Sat Jul 28 14:09:14 2018
@@ -193,21 +193,30 @@ export KYLIN_HOME=/usr/local/apache-kyli
 
 <p>Kylin embeds a Spark binary (v2.1.2) in $KYLIN_HOME/spark, and all Spark configuration properties prefixed with <em>“kylin.engine.spark-conf.”</em> can be managed in $KYLIN_HOME/conf/kylin.properties. These properties are extracted and applied when Kylin submits a Spark job; for example, if you configure “kylin.engine.spark-conf.spark.executor.memory=4G”, Kylin passes “--conf spark.executor.memory=4G” as a parameter when it runs “spark-submit”.</p>
 
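-<p>Before running Spark cubing, review these configurations and customize them for your cluster. Below is the default configuration, which is also the minimum required for a sandbox (1 executor with 1GB memory); a typical cluster needs more executors, each with at least 4GB memory and 2 cores:</p>
+<p>Before running Spark cubing, review these configurations and customize them for your cluster. Below is the recommended configuration, with Spark dynamic resource allocation enabled:</p>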
 
 <div class="highlight"><pre><code class="language-groff" data-lang="groff">kylin.engine.spark-conf.spark.master=yarn
 kylin.engine.spark-conf.spark.submit.deployMode=cluster
+kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
+kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=1
+kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=1000
+kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300
 kylin.engine.spark-conf.spark.yarn.queue=default
+kylin.engine.spark-conf.spark.driver.memory=2G
 kylin.engine.spark-conf.spark.executor.memory=4G
 kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=1024
-kylin.engine.spark-conf.spark.executor.cores=2
-kylin.engine.spark-conf.spark.executor.instances=40
+kylin.engine.spark-conf.spark.executor.cores=1
+kylin.engine.spark-conf.spark.network.timeout=600
 kylin.engine.spark-conf.spark.shuffle.service.enabled=true
+#kylin.engine.spark-conf.spark.executor.instances=1
 kylin.engine.spark-conf.spark.eventLog.enabled=true
+kylin.engine.spark-conf.spark.hadoop.dfs.replication=2
+kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress=true
+kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec
+kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
 kylin.engine.spark-conf.spark.eventLog.dir=hdfs\:///kylin/spark-history
 kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs\:///kylin/spark-history
 
-#kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
 
 ## uncomment for HDP
 #kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current

Modified: kylin/site/docs/tutorial/cube_spark.html
URL: http://svn.apache.org/viewvc/kylin/site/docs/tutorial/cube_spark.html?rev=1836910&r1=1836909&r2=1836910&view=diff
==============================================================================
--- kylin/site/docs/tutorial/cube_spark.html (original)
+++ kylin/site/docs/tutorial/cube_spark.html Sat Jul 28 14:09:14 2018
@@ -5624,20 +5624,30 @@ export KYLIN_HOME=/usr/local/apache-kyli
 
 <p>Kylin embeds a Spark binary (v2.1.0) in $KYLIN_HOME/spark. All Spark configurations can be managed in $KYLIN_HOME/conf/kylin.properties with the prefix <em>“kylin.engine.spark-conf.”</em>. These properties are extracted and applied when Kylin submits a Spark job; e.g., if you configure “kylin.engine.spark-conf.spark.executor.memory=4G”, Kylin will use “--conf spark.executor.memory=4G” as a parameter when it executes “spark-submit”.</p>
 
-<p>Before you run Spark cubing, we suggest reviewing these configurations and customizing them for your cluster. Below are the default configurations, which are also the minimal configuration for a sandbox (1 executor with 1GB memory); a normal cluster usually needs many more executors, each with at least 4GB memory and 2 cores:</p>
+<p>Before you run Spark cubing, we suggest reviewing these configurations and customizing them for your cluster. Below are the recommended configurations:</p>
 
 <div class="highlight"><pre><code class="language-groff" data-lang="groff">kylin.engine.spark-conf.spark.master=yarn
 kylin.engine.spark-conf.spark.submit.deployMode=cluster
+kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
+kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=1
+kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=1000
+kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300
 kylin.engine.spark-conf.spark.yarn.queue=default
-kylin.engine.spark-conf.spark.executor.memory=1G
-kylin.engine.spark-conf.spark.executor.cores=2
-kylin.engine.spark-conf.spark.executor.instances=1
+kylin.engine.spark-conf.spark.driver.memory=2G
+kylin.engine.spark-conf.spark.executor.memory=4G
+kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=1024
+kylin.engine.spark-conf.spark.executor.cores=1
+kylin.engine.spark-conf.spark.network.timeout=600
+kylin.engine.spark-conf.spark.shuffle.service.enabled=true
+#kylin.engine.spark-conf.spark.executor.instances=1
 kylin.engine.spark-conf.spark.eventLog.enabled=true
+kylin.engine.spark-conf.spark.hadoop.dfs.replication=2
+kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress=true
+kylin.engine.spark-conf.spark.hadoop.mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.DefaultCodec
+kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
 kylin.engine.spark-conf.spark.eventLog.dir=hdfs\:///kylin/spark-history
 kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs\:///kylin/spark-history
 
-#kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
-
 ## uncomment for HDP
 #kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
 #kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
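
The prefix rule described in the tutorial text above (properties starting with "kylin.engine.spark-conf." become "--conf" arguments to spark-submit) can be sketched in Python. This is a minimal illustration under stated assumptions, not Kylin's actual implementation: the function name `spark_conf_args`, the way escapes are handled, and the sample property text are all hypothetical and chosen only for the example.

```python
# Hypothetical sketch: derive spark-submit "--conf" arguments from
# kylin.properties-style text. Not taken from the Kylin code base.

PREFIX = "kylin.engine.spark-conf."


def spark_conf_args(properties_text):
    """Return a flat list like ["--conf", "spark.master=yarn", ...]."""
    args = []
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines and commented-out properties.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        key = key.strip()
        # Only properties carrying the Spark prefix are forwarded.
        if not sep or not key.startswith(PREFIX):
            continue
        # Properties files escape ':' as '\:'; undo that in the value.
        value = value.strip().replace("\\:", ":")
        args += ["--conf", f"{key[len(PREFIX):]}={value}"]
    return args


# Sample input (assumed for illustration only).
sample = """\
kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.executor.memory=4G
#kylin.engine.spark-conf.spark.executor.instances=1
kylin.engine.spark-conf.spark.eventLog.dir=hdfs\\:///kylin/spark-history
kylin.metadata.url=kylin_metadata@hbase
"""

print(spark_conf_args(sample))
```

In this sketch the commented-out `spark.executor.instances` line and the unprefixed `kylin.metadata.url` line are skipped, so only the three active prefixed properties produce `--conf` pairs.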

Modified: kylin/site/feed.xml
URL: http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1836910&r1=1836909&r2=1836910&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Sat Jul 28 14:09:14 2018
@@ -19,8 +19,8 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
     <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Wed, 25 Jul 2018 06:59:25 -0700</pubDate>
-    <lastBuildDate>Wed, 25 Jul 2018 06:59:25 -0700</lastBuildDate>
+    <pubDate>Sat, 28 Jul 2018 06:59:23 -0700</pubDate>
+    <lastBuildDate>Sat, 28 Jul 2018 06:59:23 -0700</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     
       <item>