Posted to commits@kylin.apache.org by li...@apache.org on 2018/07/19 14:07:54 UTC
svn commit: r1836274 - in /kylin/site: cn/docs/tutorial/cube_spark.html feed.xml
Author: lidong
Date: Thu Jul 19 14:07:53 2018
New Revision: 1836274
URL: http://svn.apache.org/viewvc?rev=1836274&view=rev
Log:
update spark cubing cn doc
Modified:
kylin/site/cn/docs/tutorial/cube_spark.html
kylin/site/feed.xml
Modified: kylin/site/cn/docs/tutorial/cube_spark.html
URL: http://svn.apache.org/viewvc/kylin/site/cn/docs/tutorial/cube_spark.html?rev=1836274&r1=1836273&r2=1836274&view=diff
==============================================================================
--- kylin/site/cn/docs/tutorial/cube_spark.html (original)
+++ kylin/site/cn/docs/tutorial/cube_spark.html Thu Jul 19 14:07:53 2018
@@ -183,34 +183,26 @@ export KYLIN_HOME=/usr/local/apache-kyli
<h2 id="kylinenvhadoop-conf-dir">Prepare “kylin.env.hadoop-conf-dir”</h2>
-<p>To run Spark on Yarn, you need to specify the <strong>HADOOP_CONF_DIR</strong> environment variable, which points to the directory containing the Hadoop (client) configuration files. Many Hadoop distributions set this directory to “/etc/hadoop/conf”, but Kylin needs to access not only HDFS, Yarn and Hive but also HBase, so the default directory may not contain all the required files. In that case, you need to create a new directory and copy or link these client files (core-site.xml, hdfs-site.xml, yarn-site.xml, hive-site.xml and hbase-site.xml) into it. In HDP 2.4 there is a conflict between hive-tez and Spark, so when making the copy for Kylin, change the default engine from “tez” to “mr”.</p>
+<p>To run Spark on Yarn, you need to specify the <strong>HADOOP_CONF_DIR</strong> environment variable, which points to the directory containing the Hadoop (client) configuration files, usually <code class="highlighter-rouge">/etc/hadoop/conf</code>.</p>
-<div class="highlight"><pre><code class="language-groff" data-lang="groff">mkdir $KYLIN_HOME/hadoop-conf
-ln -s /etc/hadoop/conf/core-site.xml $KYLIN_HOME/hadoop-conf/core-site.xml
-ln -s /etc/hadoop/conf/hdfs-site.xml $KYLIN_HOME/hadoop-conf/hdfs-site.xml
-ln -s /etc/hadoop/conf/yarn-site.xml $KYLIN_HOME/hadoop-conf/yarn-site.xml
-ln -s /etc/hbase/2.4.0.0-169/0/hbase-site.xml $KYLIN_HOME/hadoop-conf/hbase-site.xml
-cp /etc/hive/2.4.0.0-169/0/hive-site.xml $KYLIN_HOME/hadoop-conf/hive-site.xml
-vi $KYLIN_HOME/hadoop-conf/hive-site.xml (change "hive.execution.engine" value from "tez" to "mr")</code></pre></div>
+<p>Usually Kylin detects the Hadoop configuration directory from the Java classpath at startup and uses it to launch Spark. If the directory is not detected correctly in your environment, you can specify it explicitly by setting the property “kylin.env.hadoop-conf-dir” in <code class="highlighter-rouge">kylin.properties</code>, so that Kylin knows where it is:</p>
-<p>Now, set the property “kylin.env.hadoop-conf-dir” in kylin.properties so that Kylin knows about this directory:</p>
-
-<div class="highlight"><pre><code class="language-groff" data-lang="groff">kylin.env.hadoop-conf-dir=/usr/local/apache-kylin-2.1.0-bin-hbase1x/hadoop-conf</code></pre></div>
-
-<p>If this property is not set, Kylin will use the default directory from “hive-site.xml”; however, that folder may not contain “hbase-site.xml”, which would cause HBase/ZK connection errors in Spark.</p>
+<div class="highlight"><pre><code class="language-groff" data-lang="groff">kylin.env.hadoop-conf-dir=/etc/hadoop/conf</code></pre></div>
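Kylin talks to HDFS, Yarn, Hive and HBase, so a configuration directory that lacks hive-site.xml or hbase-site.xml can lead to problems such as the HBase/ZK connection errors mentioned in this section. The following is a minimal sketch, not part of Kylin, for checking a candidate directory before pointing kylin.env.hadoop-conf-dir at it. The function name check_hadoop_conf_dir is hypothetical, and it assumes all five client files listed in this section are expected in one directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the client config files (named in this section)
# that are missing from a Hadoop configuration directory.
# Directory: first argument, else $HADOOP_CONF_DIR, else /etc/hadoop/conf.
# Returns the number of missing files (0 means all are present).
check_hadoop_conf_dir() {
  local dir="${1:-${HADOOP_CONF_DIR:-/etc/hadoop/conf}}"
  local missing=0 f
  for f in core-site.xml hdfs-site.xml yarn-site.xml hive-site.xml hbase-site.xml; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example, `check_hadoop_conf_dir /etc/hadoop/conf` prints one `missing:` line per absent file. A non-zero result suggests copying or linking the missing files into a dedicated directory and setting kylin.env.hadoop-conf-dir to that directory.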
<h2 id="spark-">Check Spark configuration</h2>
-<p>Kylin embeds a Spark binary (v2.1.0) in $KYLIN_HOME/spark. All Spark configuration properties prefixed with <em>“kylin.engine.spark-conf.”</em> can be managed in $KYLIN_HOME/conf/kylin.properties. These properties are extracted and applied when a Spark job is submitted; for example, if you configure “kylin.engine.spark-conf.spark.executor.memory=4G”, Kylin will pass “--conf spark.executor.memory=4G” as a parameter when running “spark-submit”.</p>
+<p>Kylin embeds a Spark binary (v2.1.2) in $KYLIN_HOME/spark. All Spark configuration properties prefixed with <em>“kylin.engine.spark-conf.”</em> can be managed in $KYLIN_HOME/conf/kylin.properties. These properties are extracted and applied when a Spark job is submitted; for example, if you configure “kylin.engine.spark-conf.spark.executor.memory=4G”, Kylin will pass “--conf spark.executor.memory=4G” as a parameter when running “spark-submit”.</p>
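<p>Before running Spark cubing, it is recommended to review these configurations and customize them for your cluster. Below is the default configuration, which is also the minimum required for a sandbox (1 executor with 1GB memory); a regular cluster usually needs more executors, each with at least 4GB memory and 2 cores:</p>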
<div class="highlight"><pre><code class="language-groff" data-lang="groff">kylin.engine.spark-conf.spark.master=yarn
kylin.engine.spark-conf.spark.submit.deployMode=cluster
kylin.engine.spark-conf.spark.yarn.queue=default
-kylin.engine.spark-conf.spark.executor.memory=1G
+kylin.engine.spark-conf.spark.executor.memory=4G
+kylin.engine.spark-conf.spark.yarn.executor.memoryOverhead=1024
kylin.engine.spark-conf.spark.executor.cores=2
-kylin.engine.spark-conf.spark.executor.instances=1
+kylin.engine.spark-conf.spark.executor.instances=40
+kylin.engine.spark-conf.spark.shuffle.service.enabled=true
kylin.engine.spark-conf.spark.eventLog.enabled=true
kylin.engine.spark-conf.spark.eventLog.dir=hdfs\:///kylin/spark-history
kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs\:///kylin/spark-history
@@ -222,9 +214,9 @@ kylin.engine.spark-conf.spark.history.fs
#kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
#kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current</code></pre></div>
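Because of the “kylin.engine.spark-conf.” prefix described in this section, each active line above ends up as a --conf option of spark-submit. The sketch below imitates that mapping on a kylin.properties file so you can preview the arguments. It is a hypothetical helper (kylin_spark_conf_args), not Kylin's own code, and it assumes simple single-line key=value entries. It also undoes the `\:` escaping used in the spark-history paths above, as a Java properties reader would.

```shell
#!/usr/bin/env bash
# Hypothetical helper that imitates the prefix handling described above:
# every "kylin.engine.spark-conf.<key>=<value>" line becomes "--conf <key>=<value>".
# Illustration only; this is not Kylin's own code path.
kylin_spark_conf_args() {
  local props="$1" prefix="kylin.engine.spark-conf." esc='\:'
  local line key value
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      "$prefix"*=*)                  # skips comments and non-Spark properties
        key=${line%%=*}              # text before the first '='
        key=${key#"$prefix"}         # drop the Kylin prefix
        value=${line#*=}             # text after the first '='
        value=${value//"$esc"/:}     # undo the properties-file escape "\:" -> ":"
        printf '%s\n' "--conf $key=$value"
        ;;
    esac
  done < "$props"
}
```

Commented-out lines, such as the hdp.version options, produce no output until they are uncommented.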
-<p>To run on the Hortonworks platform, “hdp.version” must be specified as a Java option for the Yarn containers, so please cancel the last three lines of kylin.properties.</p>
+<p>To run on the Hortonworks platform, “hdp.version” must be specified as a Java option for the Yarn containers, so please uncomment the last three lines of kylin.properties.</p>
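Uncommenting those lines can be scripted. A minimal sketch, assuming the three hdp.version lines appear in kylin.properties exactly as in the configuration above (a leading # followed by the full key). The function name is hypothetical. `sed -i.bak` keeps a backup copy and is accepted by both GNU and BSD sed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: uncomment the "-Dhdp.version=current" Java options
# for the driver, the Yarn AM and the executors in a kylin.properties file.
# Assumes the lines look exactly like the commented ones shown above.
uncomment_hdp_version() {
  local props="$1"
  # Matches spark.driver / spark.yarn.am / spark.executor .extraJavaOptions;
  # other comments are left untouched. A .bak copy of the file is kept.
  sed -i.bak -E \
    's/^#(kylin\.engine\.spark-conf\.spark\.[a-z.]+\.extraJavaOptions=-Dhdp\.version=current)$/\1/' \
    "$props"
}
```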
-<p>Besides, to avoid repeatedly uploading the Spark jars to Yarn, you can upload them once manually and then configure the HDFS path of the jar package; note that the HDFS path must be a fully qualified name.</p>
+<p>Besides, to avoid repeatedly uploading the Spark jars to Yarn, you can upload them once manually and then configure the HDFS path of the jar package; note that the HDFS path must be a full path.</p>
<div class="highlight"><pre><code class="language-groff" data-lang="groff">jar cv0f spark-libs.jar -C $KYLIN_HOME/spark/jars/ .
hadoop fs -mkdir -p /kylin/spark/
@@ -232,12 +224,9 @@ hadoop fs -put spark-libs.jar /kylin/spa
<p>Then, add the following configuration to kylin.properties:</p>
-<div class="highlight"><pre><code class="language-groff" data-lang="groff">kylin.engine.spark-conf.spark.yarn.archive=hdfs://sandbox.hortonworks.com:8020/kylin/spark/spark-libs.jar
-kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
-kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
-kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current</code></pre></div>
+<div class="highlight"><pre><code class="language-groff" data-lang="groff">kylin.engine.spark-conf.spark.yarn.archive=hdfs://sandbox.hortonworks.com:8020/kylin/spark/spark-libs.jar</code></pre></div>
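This section requires the spark.yarn.archive location to be a full HDFS path, as in the sandbox URI above (scheme, NameNode host and port, absolute path). A rough syntactic check is sketched below; is_qualified_hdfs_uri is a hypothetical helper that only inspects the string and does not verify that the file exists on HDFS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough check that a value looks like a fully qualified
# HDFS URI, i.e. hdfs://<host>[:<port>]/<path>. It does not contact HDFS.
is_qualified_hdfs_uri() {
  local re='^hdfs://[A-Za-z0-9._-]+(:[0-9]+)?/.+'
  [[ $1 =~ $re ]]
}

is_qualified_hdfs_uri hdfs://sandbox.hortonworks.com:8020/kylin/spark/spark-libs.jar && echo ok
```

The scheme and authority for your own cluster can be read from fs.defaultFS, for example with `hdfs getconf -confKey fs.defaultFS`. Because the port is optional in the pattern, an HA nameservice URI without host:port also passes.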
-<p>All “kylin.engine.spark-conf.*” parameters can be overridden at the Cube or Project level, which gives users great flexibility.</p>
+<p>All “kylin.engine.spark-conf.*” parameters can be overridden at the Cube or Project level, which gives users flexibility.</p>
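With Cube- and Project-level overrides, one property can have up to three values. The sketch below resolves the effective one, assuming Cube-level values win over Project-level ones, which in turn win over kylin.properties. Kylin keeps Project and Cube overrides in its metadata rather than in files, so the key=value files here and the function name effective_value are hypothetical stand-ins used only to illustrate the precedence.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the effective value of a property.
# Args: key, then key=value files ordered from highest to lowest priority,
# e.g. cube overrides, project overrides, kylin.properties (assumed order).
# Within one file the last occurrence wins, as with java.util.Properties.
effective_value() {
  local key="$1"; shift
  local f line value found
  for f in "$@"; do
    [ -f "$f" ] || continue
    found=""
    while IFS= read -r line || [ -n "$line" ]; do
      if [ "${line%%=*}" = "$key" ]; then
        value="${line#*=}"
        found=1
      fi
    done < "$f"
    if [ -n "$found" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1   # not set at any level
}
```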
<h2 id="cube">Create and modify the sample cube</h2>
@@ -254,7 +243,9 @@ $KYLIN_HOME/bin/kylin.sh start</code></p
<p><img src="/images/tutorial/2.0/Spark-Cubing-Tutorial/2_overwrite_partition.png" alt="" /></p>
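-<p>The sample cube has two memory-hungry measures: a “COUNT DISTINCT” and a “TOPN(100)”. When the source data is small, their size estimation is inaccurate: the estimated size is much larger than the real size, which causes many more RDD partitions to be split and slows down the build. 100 is a more reasonable number for it. Click “Next” and “Save” to save the cube.</p>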
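+<p>The sample cube has two memory-hungry measures: a “COUNT DISTINCT” and a “TOPN(100)”. When the source data is small, their size estimation is inaccurate: the estimated size is much larger than the real size, which causes many more RDD partitions to be split and slows down the build. 500 is a more reasonable number for it. Click “Next” and “Save” to save the cube.</p>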
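The effect of an over-estimated size can be shown with a little arithmetic. This sketch assumes the number of RDD partitions is roughly the estimated size in MB divided by a per-partition size in MB (the setting whose suggested value here is 500), clamped to a minimum and maximum. The function estimate_partitions, the bounds 1 and 5000, and the 20000 MB estimate are illustrative assumptions, not values taken from Kylin.

```shell
#!/usr/bin/env bash
# Hypothetical model: partitions ~= estimated_size_mb / cut_mb, clamped to [min, max].
# Parameter names and default bounds are illustrative assumptions.
estimate_partitions() {
  local est_mb="$1" cut_mb="$2" min_p="${3:-1}" max_p="${4:-5000}"
  local p=$(( est_mb / cut_mb ))   # integer division
  if (( p < min_p )); then p=$min_p; fi
  if (( p > max_p )); then p=$max_p; fi
  echo "$p"
}

estimate_partitions 20000 10    # prints: 2000
estimate_partitions 20000 500   # prints: 40
```

Under these assumptions, a 20000 MB over-estimate split at 10 MB per partition yields 2000 small partitions, while 500 MB per partition yields 40.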
+
+<p>For cubes without “COUNT DISTINCT” and “TOPN”, please keep the default configuration.</p>
<h2 id="spark--cube">Build Cube with Spark</h2>
@@ -294,7 +285,7 @@ $KYLIN_HOME/bin/kylin.sh start</code></p
<h2 id="section-2">Go further</h2>
-<p>If you are a Kylin administrator but new to Spark, it is suggested that you go through the <a href="https://spark.apache.org/docs/2.1.0/">Spark documentation</a>, and don't forget to update the configurations accordingly. You can let Spark's <a href="https://spark.apache.org/docs/2.1.0/job-scheduling.html#dynamic-resource-allocation">Dynamic Resource Allocation</a> take effect so that it can scale automatically for different workloads. Spark performance relies on the cluster's memory and CPU resources, and cube building in Kylin becomes a heavy task when a complex data model and a huge dataset are built in one go. If your cluster does not have enough resources, Spark executors will throw errors such as “OutOfMemory”, so please use it wisely. For cubes with UHC (ultra-high-cardinality) dimensions, many combinations (for example, a cube with more than 12 dimensions), or memory-hungry measures (Count Distinct, Top-N), it is suggested to use the MapReduce engine. If your cube model is relatively simple, with only SUM/MIN/MAX/COUNT, and the source data is small to medium, the Spark engine is a good choice. Besides, streaming cubing is not yet supported by this engine (KYLIN-2484).</p>
+<p>If you are a Kylin administrator but new to Spark, it is suggested that you go through the <a href="https://spark.apache.org/docs/2.1.2/">Spark documentation</a>, and don't forget to update the configurations accordingly. You can enable Spark's <a href="https://spark.apache.org/docs/2.1.2/job-scheduling.html#dynamic-resource-allocation">Dynamic Resource Allocation</a> so that it can scale automatically for different workloads. Spark performance relies on the cluster's memory and CPU resources, and cube building in Kylin becomes a heavy task when a complex data model and a huge dataset are built in one go. If your cluster does not have enough resources, Spark executors will throw errors such as “OutOfMemory”, so please use it wisely. For cubes with UHC (ultra-high-cardinality) dimensions, many combinations (for example, a cube with more than 12 dimensions), or memory-hungry measures (Count Distinct, Top-N), it is suggested to use the MapReduce engine. If your cube model is relatively simple, with all measures being SUM/MIN/MAX/COUNT, and the source data is small to medium, the Spark engine is a good choice.</p>
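Enabling Dynamic Resource Allocation can go through the same “kylin.engine.spark-conf.” prefix as the other Spark settings. A minimal sketch, assuming you edit kylin.properties directly: the property names are standard Spark ones, while the function name and the min/max/idle-timeout values are placeholders to size for your cluster. On Yarn, dynamic allocation relies on the external shuffle service, which must also be set up on every NodeManager. How a fixed spark.executor.instances interacts with dynamic allocation depends on the Spark version, so check the linked documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append Spark dynamic allocation settings to a
# kylin.properties file. Property names are standard Spark ones; the
# min/max/idle-timeout values are placeholders to size for your cluster.
# The external shuffle service must also be configured on each NodeManager.
enable_dynamic_allocation() {
  local props="$1"
  cat >> "$props" <<'EOF'
kylin.engine.spark-conf.spark.dynamicAllocation.enabled=true
kylin.engine.spark-conf.spark.dynamicAllocation.minExecutors=1
kylin.engine.spark-conf.spark.dynamicAllocation.maxExecutors=40
kylin.engine.spark-conf.spark.dynamicAllocation.executorIdleTimeout=300s
kylin.engine.spark-conf.spark.shuffle.service.enabled=true
EOF
}
```

For example: `enable_dynamic_allocation "$KYLIN_HOME/conf/kylin.properties"`.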
<p>If you have any questions, comments, or bug fixes, you are welcome to discuss them on dev@kylin.apache.org.</p>
Modified: kylin/site/feed.xml
URL: http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1836274&r1=1836273&r2=1836274&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Thu Jul 19 14:07:53 2018
@@ -19,8 +19,8 @@
<description>Apache Kylin Home</description>
<link>http://kylin.apache.org/</link>
<atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
- <pubDate>Thu, 19 Jul 2018 00:27:24 -0700</pubDate>
- <lastBuildDate>Thu, 19 Jul 2018 00:27:24 -0700</lastBuildDate>
+ <pubDate>Thu, 19 Jul 2018 06:59:26 -0700</pubDate>
+ <lastBuildDate>Thu, 19 Jul 2018 06:59:26 -0700</lastBuildDate>
<generator>Jekyll v2.5.3</generator>
<item>