Posted to commits@kylin.apache.org by li...@apache.org on 2017/11/27 14:01:30 UTC

svn commit: r1816452 - in /kylin/site: docs21/install/kylin_aws_emr.html feed.xml

Author: lidong
Date: Mon Nov 27 14:01:30 2017
New Revision: 1816452

URL: http://svn.apache.org/viewvc?rev=1816452&view=rev
Log:
Update kylin emr doc

Modified:
    kylin/site/docs21/install/kylin_aws_emr.html
    kylin/site/feed.xml

Modified: kylin/site/docs21/install/kylin_aws_emr.html
URL: http://svn.apache.org/viewvc/kylin/site/docs21/install/kylin_aws_emr.html?rev=1816452&r1=1816451&r2=1816452&view=diff
==============================================================================
--- kylin/site/docs21/install/kylin_aws_emr.html (original)
+++ kylin/site/docs21/install/kylin_aws_emr.html Mon Nov 27 14:01:30 2017
@@ -3251,10 +3251,10 @@ tar –zxvf apache-kylin-2.2.0-bin-hb
 </div>
 
 <ul>
-  <li>Use HDFS as “kylin.env.hdfs-working-dir”</li>
+  <li>Use HDFS as “kylin.env.hdfs-working-dir” (Recommended)</li>
 </ul>
 
-<p>EMR recommends to “use HDFS for intermediate data storage while the cluster is running and Amazon S3 only to input the initial data and output the final results”.</p>
+<p>EMR recommends that you <strong>“use HDFS for intermediate data storage while the cluster is running and Amazon S3 only to input the initial data and output the final results”</strong>. Kylin’s ‘hdfs-working-dir’ holds the intermediate data for Cube building, the cuboid files, and some metadata files (such as dictionaries and table snapshots, which are not well suited to HBase), so it is best to configure HDFS for this.</p>
 
 <p>If you use HDFS as the Kylin working directory, leave the configuration unchanged, because EMR’s default FS is HDFS:</p>
 
@@ -3262,21 +3262,24 @@ tar –zxvf apache-kylin-2.2.0-bin-hb
 </code></pre>
 </div>
 
-<p>Before you shudown/restart the cluster, you can backup the data on HDFS to S3 with <a href="https://docs.aws.amazon.com/emr/latest/ReleaseGuide/UsingEMR_s3distcp.html">S3DistCp</a>.</p>
+<p>Before you shut down or restart the cluster, you must back up the “/kylin” data on HDFS to S3 with <a href="https://docs.aws.amazon.com/emr/latest/ReleaseGuide/UsingEMR_s3distcp.html">S3DistCp</a>; otherwise you may lose data and be unable to recover the cluster later.</p>
 
 <ul>
   <li>Use S3 as “kylin.env.hdfs-working-dir”</li>
 </ul>
 
-<p>If you want to totally use S3 as storage (assume HBase is also on S3), configure the following 2 parameters:</p>
+<p>If you want to use S3 as storage (assuming HBase is also on S3), you need to configure the following parameters:</p>
 
 <div class="highlighter-rouge"><pre class="highlight"><code>kylin.env.hdfs-working-dir=s3://yourbucket/kylin
 kylin.storage.hbase.cluster-fs=s3://yourbucket
-
+kylin.source.hive.redistribute-flat-table=false
 </code></pre>
 </div>
 
-<p>The intermediate file and the HFile will all be written to S3. The build performance should be slower than HDFS. Make sure you have a good understanding about the difference between S3 and HDFS.</p>
+<p>The intermediate files and the HFiles will all be written to S3. Build performance will be slower than with HDFS. Make sure you have a good understanding of the differences between S3 and HDFS. Read the following articles from AWS:</p>
+
+<p><a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-troubleshoot-errors-io.html">Input and Output Errors</a><br />
+<a href="https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-troubleshoot-error-hive.html#emr-troubleshoot-error-hive-3">Are you having trouble loading data to or from Amazon S3 into Hive</a></p>
 
 <ul>
   <li>Hadoop configurations</li>

Modified: kylin/site/feed.xml
URL: http://svn.apache.org/viewvc/kylin/site/feed.xml?rev=1816452&r1=1816451&r2=1816452&view=diff
==============================================================================
--- kylin/site/feed.xml (original)
+++ kylin/site/feed.xml Mon Nov 27 14:01:30 2017
@@ -19,8 +19,8 @@
     <description>Apache Kylin Home</description>
     <link>http://kylin.apache.org/</link>
     <atom:link href="http://kylin.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Wed, 22 Nov 2017 19:21:06 -0800</pubDate>
-    <lastBuildDate>Wed, 22 Nov 2017 19:21:06 -0800</lastBuildDate>
+    <pubDate>Mon, 27 Nov 2017 05:59:29 -0800</pubDate>
+    <lastBuildDate>Mon, 27 Nov 2017 05:59:29 -0800</lastBuildDate>
     <generator>Jekyll v2.5.3</generator>
     
       <item>