Posted to commits@kylin.apache.org by sh...@apache.org on 2018/08/31 01:44:11 UTC

[kylin] branch document updated: Update document for EMR, jobengine HA, etc

This is an automated email from the ASF dual-hosted git repository.

shaofengshi pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git


The following commit(s) were added to refs/heads/document by this push:
     new a33d582  Update document for EMR, jobengine HA, etc
a33d582 is described below

commit a33d582655d1e06dd58c349ac6fa2ae0fb59ee2f
Author: shaofengshi <sh...@apache.org>
AuthorDate: Fri Aug 31 09:44:02 2018 +0800

    Update document for EMR, jobengine HA, etc
---
 website/_docs/howto/howto_ldap_and_sso.md      |  2 +-
 website/_docs/install/advance_settings.cn.md   |  2 +-
 website/_docs/install/advance_settings.md      |  2 +-
 website/_docs/install/kylin_aws_emr.cn.md      | 16 ++++++++++++++++
 website/_docs/install/kylin_aws_emr.md         | 16 ++++++++++++++++
 website/_docs/install/kylin_cluster.cn.md      | 14 +++++++++++++-
 website/_docs/install/kylin_cluster.md         | 13 ++++++++++++-
 website/_docs23/howto/howto_ldap_and_sso.md    |  2 +-
 website/_docs23/install/advance_settings.cn.md |  2 +-
 website/_docs23/install/advance_settings.md    |  2 +-
 10 files changed, 63 insertions(+), 8 deletions(-)

diff --git a/website/_docs/howto/howto_ldap_and_sso.md b/website/_docs/howto/howto_ldap_and_sso.md
index dbb1607..762d7d8 100644
--- a/website/_docs/howto/howto_ldap_and_sso.md
+++ b/website/_docs/howto/howto_ldap_and_sso.md
@@ -15,7 +15,7 @@ Firstly, provide LDAP URL, and username/password if the LDAP server is secured;
 
 ```
 cd $KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/lib
-java -classpath kylin-server-base-\<versioin\>.jar:spring-beans-3.2.17.RELEASE.jar:spring-core-3.2.17.RELEASE.jar:commons-codec-1.7.jar org.apache.kylin.rest.security.PasswordPlaceholderConfigurer AES <your_password>
+java -classpath kylin-server-base-\<version\>.jar:kylin-core-common-\<version\>.jar:spring-beans-4.3.10.RELEASE.jar:spring-core-4.3.10.RELEASE.jar:commons-codec-1.7.jar org.apache.kylin.rest.security.PasswordPlaceholderConfigurer AES <your_password>
 ```
 
 Config them in the conf/kylin.properties:
diff --git a/website/_docs/install/advance_settings.cn.md b/website/_docs/install/advance_settings.cn.md
index 37f6a48..6ed7c35 100644
--- a/website/_docs/install/advance_settings.cn.md
+++ b/website/_docs/install/advance_settings.cn.md
@@ -87,7 +87,7 @@ export KYLIN_JVM_SETTINGS="-Xms1024M -Xmx4096M -Xss1024K -XX:MaxPermSize=128M -v
 
 ```
 kylin.job.scheduler.default=2
-kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperDistributedJobLock
+kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperJobLock
 ```
 Remember to register the addresses of all job and query nodes in `kylin.server.cluster-servers`.
 
diff --git a/website/_docs/install/advance_settings.md b/website/_docs/install/advance_settings.md
index ce21b91..52bf044 100644
--- a/website/_docs/install/advance_settings.md
+++ b/website/_docs/install/advance_settings.md
@@ -87,7 +87,7 @@ To enable the distributed job scheduler, you need to set or update the configs i
 
 ```
 kylin.job.scheduler.default=2
-kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperDistributedJobLock
+kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperJobLock
 ```
 Please add all job servers and query servers to the `kylin.server.cluster-servers`.
 
diff --git a/website/_docs/install/kylin_aws_emr.cn.md b/website/_docs/install/kylin_aws_emr.cn.md
index be1a8b9..99a1925 100644
--- a/website/_docs/install/kylin_aws_emr.cn.md
+++ b/website/_docs/install/kylin_aws_emr.cn.md
@@ -145,6 +145,22 @@ $KYLIN_HOME/bin/kylin.sh start
 
 Build the same Cube, and run queries once the Cube is ready. You can browse S3 to check whether the data has been safely persisted.
 
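+### Spark Configuration
+
+EMR's Spark version is very likely to differ from the version Kylin was compiled against, so you usually cannot use the Spark packaged with EMR directly for Kylin's jobs. Before starting Kylin, set the "SPARK_HOME" environment variable to point to Kylin's Spark subdirectory (KYLIN_HOME/spark). In addition, to access files on S3 or EMRFS from Spark, you need to copy EMR's extension jars from EMR's directories into Kylin's Spark.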
+
+```
+export SPARK_HOME=$KYLIN_HOME/spark
+
+cp /usr/lib/hadoop-lzo/lib/*.jar $KYLIN_HOME/spark/jars/
+cp /usr/share/aws/emr/emrfs/lib/emrfs-hadoop-assembly-*.jar $KYLIN_HOME/spark/jars/
+cp /usr/lib/hadoop/hadoop-common*-amzn-*.jar $KYLIN_HOME/spark/jars/
+
+$KYLIN_HOME/bin/kylin.sh start
+```
+
+You can also refer to EMR Spark's spark-defaults when setting Kylin's Spark configuration, so that it fits the cluster's resources better.
+
 ### Shut down the EMR cluster
 
 Before shutting down the EMR cluster, we recommend backing up the Kylin metadata and uploading it to S3.
diff --git a/website/_docs/install/kylin_aws_emr.md b/website/_docs/install/kylin_aws_emr.md
index dc8003c..fef3384 100644
--- a/website/_docs/install/kylin_aws_emr.md
+++ b/website/_docs/install/kylin_aws_emr.md
@@ -145,6 +145,22 @@ Don't forget to enable the 7070 port access in the security group for EMR master
 
 Build the sample Cube, and then run queries when the Cube is ready. You can browse S3 to see whether the data is safely persisted.
 
+### Spark Configuration
+
+EMR's Spark version may be incompatible with Kylin, so you usually cannot use EMR's Spark directly. Set the "SPARK_HOME" environment variable to Kylin's Spark folder (KYLIN_HOME/spark) before starting Kylin. To access files on S3 or EMRFS from Spark, you also need to copy EMR's implementation jars into Kylin's Spark.
+
+```
+export SPARK_HOME=$KYLIN_HOME/spark
+
+cp /usr/lib/hadoop-lzo/lib/*.jar $KYLIN_HOME/spark/jars/
+cp /usr/share/aws/emr/emrfs/lib/emrfs-hadoop-assembly-*.jar $KYLIN_HOME/spark/jars/
+cp /usr/lib/hadoop/hadoop-common*-amzn-*.jar $KYLIN_HOME/spark/jars/
+
+$KYLIN_HOME/bin/kylin.sh start
+```
+
+You can also copy EMR's spark-defaults configuration to Kylin's Spark for better utilization of the cluster resources.
+
 ### Shut down EMR Cluster
 
 Before you shut down EMR cluster, we suggest you take a backup for Kylin metadata and upload it to S3.
diff --git a/website/_docs/install/kylin_cluster.cn.md b/website/_docs/install/kylin_cluster.cn.md
index 39e3ffa..a612cd1 100644
--- a/website/_docs/install/kylin_cluster.cn.md
+++ b/website/_docs/install/kylin_cluster.cn.md
@@ -34,13 +34,25 @@ kylin.rest.servers=host1:7070,host2:7070
 ```
 
  *  `kylin.server.mode`
-	Make sure that only one instance has `kylin.server.mode` set to "all" or "job"; the others should be "query"
+
+
+By default, only one instance has `kylin.server.mode` set to "all" or "job"; the others are set to "query".
 
 ```
 kylin.server.mode=all
 ```
 
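+In other words, by default only one node schedules the execution of build jobs. If you need multiple nodes to run build jobs at the same time, for high availability and high concurrency, please refer to "Enable multiple job engines" on the [Advanced settings](advance_settings.html) page.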
+
 ### Install a load balancer
 
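 To ensure high availability of the Kylin servers, you need to install a load balancer in front of them and let it route incoming requests to the cluster. Clients communicate with the load balancer instead of with a specific Kylin instance. Setting up a load balancer is out of scope here; you may choose an implementation such as Nginx, F5, or a cloud LB service.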
 	
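+### Dual-cluster configuration with read/write separation
+
+Kylin can connect to two clusters to gain better stability and performance:
+
+ * A Hadoop cluster for Cube building; this can be a large cluster shared with other applications;
+ * An HBase cluster for SQL queries; usually this cluster is dedicated to Kylin and does not need as many nodes as the Hadoop cluster. The HBase configuration can be tuned further for the read-only nature of Kylin Cubes.
+
+This deployment strategy has been adopted and verified by many large enterprises. It is the best deployment option for production environments that we know of so far. For how to configure this architecture, please refer to [Deploy Apache Kylin with Standalone HBase Cluster](/blog/2016/06/10/standalone-hbase-cluster/)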
\ No newline at end of file
diff --git a/website/_docs/install/kylin_cluster.md b/website/_docs/install/kylin_cluster.md
index f2dcfbf..6a3545e 100644
--- a/website/_docs/install/kylin_cluster.md
+++ b/website/_docs/install/kylin_cluster.md
@@ -16,7 +16,9 @@ Each of the Kylin instance has a "kylin.server.mode" entry in `conf/kylin.proper
  *  **query** : run query engine only; Kylin query engine accepts and answers your SQL queries;
  *  **all** : run both job engine and query engines in this instance. 
 
- Notice that only one instance can run the job engine ("all" or "job" mode), the others must be "query" mode. 
+ By default, only one instance can run the job engine ("all" or "job" mode); the others should be in "query" mode. 
+
+ If you want to run multiple job engines for high availability or to handle heavy concurrent job loads, please check "Enable multiple job engines" on the [Advanced settings](advance_settings.html) page.
 
 A typical scenario is depicted in the following chart:
 
@@ -44,3 +46,12 @@ kylin.server.mode=all
 
 To enable Kylin service high availability, you need setup a load balancer in front of these servers, letting it routes the incoming requests to the cluster. Client side communicates with the load balancer, instead of with a specific Kylin instance. The setup of load balancer is out of the scope; you may select an implementation like Nginx, F5 or cloud LB service. 
 	
+
+### Configure Read/Write separated deployment
+
+Kylin can work with two clusters to gain better stability and performance:
+
+ * A Hadoop cluster for Cube building; this can be a large, shared cluster.
+ * An HBase cluster for SQL queries; usually this is a dedicated cluster with fewer nodes. The HBase configuration can be tuned for better read performance, because Cubes are immutable once built.
+
+This deployment has been adopted and verified by many large companies. It is the best production deployment that we know of. For details on how to set it up, please refer to [Deploy Apache Kylin with Standalone HBase Cluster](/blog/2016/06/10/standalone-hbase-cluster/)
\ No newline at end of file
diff --git a/website/_docs23/howto/howto_ldap_and_sso.md b/website/_docs23/howto/howto_ldap_and_sso.md
index b1c8bb7..bae5be0 100644
--- a/website/_docs23/howto/howto_ldap_and_sso.md
+++ b/website/_docs23/howto/howto_ldap_and_sso.md
@@ -15,7 +15,7 @@ Firstly, provide LDAP URL, and username/password if the LDAP server is secured;
 
 ```
 cd $KYLIN_HOME/tomcat/webapps/kylin/WEB-INF/lib
-java -classpath kylin-server-base-\<versioin\>.jar:spring-beans-3.2.17.RELEASE.jar:spring-core-3.2.17.RELEASE.jar:commons-codec-1.7.jar org.apache.kylin.rest.security.PasswordPlaceholderConfigurer AES <your_password>
+java -classpath kylin-server-base-\<version\>.jar:kylin-core-common-\<version\>.jar:spring-beans-4.3.10.RELEASE.jar:spring-core-4.3.10.RELEASE.jar:commons-codec-1.7.jar org.apache.kylin.rest.security.PasswordPlaceholderConfigurer AES <your_password>
 ```
 
 Config them in the conf/kylin.properties:
diff --git a/website/_docs23/install/advance_settings.cn.md b/website/_docs23/install/advance_settings.cn.md
index b1a792d..4b218d0 100644
--- a/website/_docs23/install/advance_settings.cn.md
+++ b/website/_docs23/install/advance_settings.cn.md
@@ -87,7 +87,7 @@ export KYLIN_JVM_SETTINGS="-Xms1024M -Xmx4096M -Xss1024K -XX:MaxPermSize=128M -v
 
 ```
 kylin.job.scheduler.default=2
-kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperDistributedJobLock
+kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperJobLock
 ```
 Remember to register the addresses of all job and query nodes in `kylin.server.cluster-servers`.
 
diff --git a/website/_docs23/install/advance_settings.md b/website/_docs23/install/advance_settings.md
index 04b9dfa..607f396 100644
--- a/website/_docs23/install/advance_settings.md
+++ b/website/_docs23/install/advance_settings.md
@@ -87,7 +87,7 @@ To enable the distributed job scheduler, you need to set or update the configs i
 
 ```
 kylin.job.scheduler.default=2
-kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperDistributedJobLock
+kylin.job.lock=org.apache.kylin.storage.hbase.util.ZookeeperJobLock
 ```
 Please add all job servers and query servers to the `kylin.server.cluster-servers`.