Posted to commits@kylin.apache.org by xx...@apache.org on 2021/07/04 12:17:34 UTC

[kylin] 02/03: Update overview of kylin4

This is an automated email from the ASF dual-hosted git repository.

xxyu pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git

commit c40cc301f8dd59cde13d9b64ec9624a9485a5cb9
Author: yaqian.zhang <59...@qq.com>
AuthorDate: Fri Jul 2 18:39:04 2021 +0800

    Update overview of kylin4
---
 website/_docs/install/index.cn.md                  |   4 +-
 website/_docs40/gettingstarted/quickstart.cn.md    |   2 +-
 website/_docs40/gettingstarted/quickstart.md       |   2 +
 website/_docs40/index.cn.md                        | 163 ++++++++++++++------
 website/_docs40/index.md                           | 168 +++++++++++++++------
 website/_docs40/install/kylin_cluster.cn.md        |   4 +-
 website/_docs40/install/kylin_cluster.md           |   3 +
 .../tutorial/4.0/overview/build_duration_ssb.png   | Bin 0 -> 115357 bytes
 .../tutorial/4.0/overview/query_response_ssb.png   | Bin 0 -> 105312 bytes
 .../tutorial/4.0/overview/query_response_tpch.png  | Bin 0 -> 114768 bytes
 .../tutorial/4.0/overview/result_size_ssb.png      | Bin 0 -> 111943 bytes
 11 files changed, 252 insertions(+), 94 deletions(-)

diff --git a/website/_docs/install/index.cn.md b/website/_docs/install/index.cn.md
index 743d7cc..2843535 100644
--- a/website/_docs/install/index.cn.md
+++ b/website/_docs/install/index.cn.md
@@ -42,7 +42,7 @@ Kylin 可以在 Hadoop 集群的任意节点上启动。方便起见,您可以
 cd /usr/local/
 wget http://mirror.bit.edu.cn/apache/kylin/apache-kylin-2.5.0/apache-kylin-2.5.0-bin-hbase1x.tar.gz
 ```
-
+ 
 2. 解压 tar 包,配置环境变量 `$KYLIN_HOME` 指向 Kylin 文件夹。
 
 ```shell
@@ -51,7 +51,7 @@ cd apache-kylin-2.5.0-bin-hbase1x
 export KYLIN_HOME=`pwd`
 ```
 
-从 v2.6.1 开始, Kylin 不再包含 Spark 二进制包; 您需要另外下载 Spark,然后设置 `SPARK_HOME` 系统变量到 Spark 安装目录: 
+3. 从 v2.6.1 开始, Kylin 不再包含 Spark 二进制包; 您需要另外下载 Spark,然后设置 `SPARK_HOME` 系统变量到 Spark 安装目录: 
 
 ```shell
 export SPARK_HOME=/path/to/spark
diff --git a/website/_docs40/gettingstarted/quickstart.cn.md b/website/_docs40/gettingstarted/quickstart.cn.md
index e840919..78e76e0 100644
--- a/website/_docs40/gettingstarted/quickstart.cn.md
+++ b/website/_docs40/gettingstarted/quickstart.cn.md
@@ -67,7 +67,7 @@ $KYLIN_HOME/bin/download-spark.sh
 
 脚本会将解压好的spark放在$KYLIN_HOME目录下,如果系统中没有设置SPARK_HOME,启动kylin时会自动找到$KYLIN_HOME目录下的spark。
 
-### ste4、配置 Mysql 元数据
+### step4、配置 Mysql 元数据
 
 Kylin 4.0 使用 Mysql 作为元数据存储,需要在kylin.properties做如下配置:
 
diff --git a/website/_docs40/gettingstarted/quickstart.md b/website/_docs40/gettingstarted/quickstart.md
index 66d0558..6ca9c75 100644
--- a/website/_docs40/gettingstarted/quickstart.md
+++ b/website/_docs40/gettingstarted/quickstart.md
@@ -43,6 +43,8 @@ When your environment meets the above prerequisites, you can install and start u
 
 #### Step1. Download the Kylin Archive
 Download a kylin4.0 binary package from [Apache Kylin Download Site](https://kylin.apache.org/download/). 
+
+```
 cd /usr/local/
 wget http://apache.website-solution.net/kylin/apache-kylin-4.0.0/apache-kylin-4.0.0-bin.tar.gz
 ```
diff --git a/website/_docs40/index.cn.md b/website/_docs40/index.cn.md
index 495853c..5bd4b4c 100644
--- a/website/_docs40/index.cn.md
+++ b/website/_docs40/index.cn.md
@@ -1,6 +1,6 @@
 ---
 layout: docs40-cn
-title: 概述
+title: Apache Kylin4 概述
 categories: docs
 permalink: /cn/docs40/index.html
 ---
@@ -16,48 +16,127 @@ Apache Kylin™是一个开源的、分布式的分析型数据仓库,提供 H
 * [v2.4 document](/cn/docs24/)
 * [归档](/archive/)
 
-安装
-------------  
-1. [安装指南](install/index.html)
-2. [Kylin 配置](install/configuration.html)
-3. [集群模式部署](install/kylin_cluster.html)
-4. [高级配置](install/advance_settings.html)
-5. [用 Docker 运行 Kylin](install/kylin_docker.html)
+Apache Kylin4.0 是 Apache Kylin3.x 之后一次重大的版本更新,它采用了全新的 Spark 构建引擎和 Parquet 作为存储,同时使用 Spark 作为查询引擎。
+
+Apache Kylin4.0 的第一个版本 kylin4.0.0-alpha 于 2020 年 7 月份发布,此后相继发布 kylin4.0.0-beta 以及正式版本。
+
+为了方便用户对 Kylin4.x 有更全面更深层的了解,本篇文档会着重从 Kylin4.x 与之前版本有何异同的角度对 Kylin4.x 做全面概述。文章分为以下几个部分:
+
+- 为什么选择 Parquet 替换 HBase
+- 预计算结果在 Kylin4.0 中如何存储
+- Kylin 4.0 的构建引擎
+- Kylin 4.0 的查询引擎
+- Kylin4.0 与 Kylin3.1 功能对比
+- Kylin 4.0 性能表现
+- Kylin 4.0 查询和构建调优
+- Kylin 4.0 用户案例
+
+## 为什么选择 Parquet 替换 HBase
+在 3.x 以及之前的版本中,kylin 一直使用 HBase 作为存储引擎来保存 cube 构建后产生的预计算结果。HBase 作为 HDFS 之上面向列族的数据库,查询表现已经算是比较优秀,但是它仍然存在以下几个缺点:
+1. HBase 不是真正的列式存储;
+2. HBase 没有二级索引,Rowkey 是它唯一的索引;
+3. HBase 没有对存储的数据进行编码,kylin 必须自己进行对数据编码的过程;
+4. HBase 不适合云上部署和自动伸缩;
+5. HBase 不同版本之间的 API 版本不同,存在兼容性问题(比如,0.98,1.0,1.1,2.0);
+6. HBase 存在不同的供应商版本,他们之间有兼容性问题。
+
+针对以上问题,社区提出了对使用 Apache Parquet + Spark 来代替 HBase 的提议,理由如下:
+1. Parquet 是一种开源并且已经成熟稳定的列式存储格式;
+2. Parquet 对云更加友好,可以兼容各种文件系统,包括 HDFS、S3、Azure Blob store、Ali OSS 等;
+3. Parquet 可以很好地与 Hadoop、Hive、Spark、Impala 等集成;
+4. Parquet 支持自定义索引。
+
+## 预计算结果在 Kylin4.0 中如何存储
+在 Kylin4.x 中,预计算结果以 Parquet 格式存储在文件系统中,文件存储结构对于 I/O 优化很重要,提前对存储目录结构进行设计,就能够在查询时通过目录或者文件名过滤数据文件,避免不必要的扫描。
+Kylin4 对 cube 进行构建得到的预计算结果的 Parquet 文件在文件系统中存储的目录结构如下:
+- cube_name
+  - SegmentA
+    - Cuboid-111
+      - part-0000-XXX.snappy.parquet
+      - part-0001-XXX.snappy.parquet
+      - ...
+    - Cuboid-222
+      - part-0000-XXX.snappy.parquet
+      - part-0001-XXX.snappy.parquet
+      - ...
+  - SegmentB
+    - Cuboid-111
+      - part-0000-XXX.snappy.parquet
+      - part-0001-XXX.snappy.parquet
+      - ...
+    - Cuboid-222
+      - part-0000-XXX.snappy.parquet
+      - part-0001-XXX.snappy.parquet
+      - ...
+
+可以看出,与 HBase 相比,采用 Parquet 存储可以很方便地增删 cuboid 而不影响其他数据。利用这种特点,Kylin4 中实现了支持用户手动增删 cuboid 的功能,请参考:[How to update cuboid list for a cube](https://cwiki.apache.org/confluence/display/KYLIN/How+to+update+cuboid+list+for+a+cube)
+
+## Kylin 4.0 的构建引擎
+在 Kylin4 中,Spark Engine 是唯一的构建引擎,与之前版本中的构建引擎相比,存在如下特点:
+
+1、Kylin4 的构建简化了很多步骤。比如在 Cube Build Job 中, kylin4 只需要资源探测和 cubing 两个步骤,就可以完成构建;
+2、由于 Parquet 会对存储的数据进行编码,所以在 kylin4 中不再需要维度字典和对维度列编码的过程;
+3、Kylin4 对全局字典做了全新的实现,更多细节请参考:[Kylin4 全局字典](https://cwiki.apache.org/confluence/display/KYLIN/Global+Dictionary+on+Spark+CN) ;
+4、Kylin4 会根据集群资源、构建任务情况等对 Spark 进行自动调参;
+5、Kylin4 提高了构建速度。
+
+用户可以通过 `kylin.build.spark-conf` 开头的配置项手动修改构建相关的 Spark 配置,经过用户手动修改的 Spark 配置项不会再参与自动调参。
+
+## Kylin 4.0 的查询引擎
+Kylin4 的查询引擎 `Sparder(SparderContext)` 是由 spark application 后端实现的新型分布式查询引擎,相比于原来的查询引擎,Sparder 的优势体现在以下几点:
+- 分布式的查询引擎,有效避免单点故障;
+- 与构建所使用的计算引擎统一为 Spark;
+- 对于复杂查询的性能有很大提高;
+- 可以从 Spark 的新功能及其生态中获益。
+
+在 Kylin4 中,Sparder 是作为一个 long-running 的 spark application 存在的。 Sparder 会根据 `kylin.query.spark-conf` 开头的配置项中配置的 Spark 参数来获取 Yarn 资源,如果配置的资源参数过大,可能会影响构建任务甚至无法成功启动 Sparder,如果 Sparder 没有成功启动,则所有查询任务都会失败,用户可以在 kylin WebUI 的 System 页面中检查 Sparder 状态。
+
+默认情况下,用于查询的 spark 参数会设置的比较小,在生产环境中,大家可以适当把这些参数调大一些,以提升查询性能。
+`kylin.query.auto-sparder-context` 参数用于控制是否在启动 kylin 的同时启动 Sparder,默认值为 false,即默认情况下会在执行第一条 SQL 的时候才启动 Sparder,由于这个原因,执行第一条 SQL 的时候的会花费较长时间。
+如果你不希望第一条 SQL 的查询速度低于预期,可以设置 `kylin.query.auto-sparder-context` 为 `true`,此时 Sparder 会随 Kylin 一起启动。
+
+## Kylin 4.0 与 Kylin 3.1 功能对比
+
+| Feature                | Kylin 3.1.0                                  | Kylin 4.0                                      |
+| ---------------------  | :------------------------------------------- | :----------------------------------------------|
+| Storage                | HBase                                        | Parquet                                        |
+| BuildEngine            | MapReduce/Spark/Flink                        | New Spark Engine                               |
+| Metastore              | HBase(Default)/Mysql                         | Mysql(Default)                                 |
+| DataSource             | Kafka/Hive/JDBC                              | Hive/CSV                                       |
+| Global Dictionary      | Two implementations                          | New implementation                             |
+| Cube Optimization Tool | Cube Planner                                 | Cube Planner phase1 and Optimize cube manually |
+| Self-monitoring        | System cube and Dashboard                    | System cube and Dashboard                      |
+| PushDown Engine        | Hive/JDBC                                    | Spark SQL                                      |
+| Hadoop platform        | HDP2/HDP3/CDH5/CDH6/EMR5                     | HDP2/CDH5/CDH6/EMR5/EMR6/HDI                   |
+| Deployment mode        | Single node/Cluster/Read and write separation| Single node/Cluster/Read and write separation  |
+
+## Kylin 4.0 性能表现
+为了测试 Kylin4.0 的性能,我们分别在 SSB 数据集和 TPC-H 数据集上做了 benchmark,与 Kylin3.1.0 进行对比。测试环境为 4 个节点的 CDH 集群,所使用的 yarn 队列分配了 400G 内存和 128 cpu cores。
+性能测试对比结果如下:
+- Comparison of build duration and result size(SSB)
+![](/images/tutorial/4.0/overview/build_duration_ssb.png)  
+![](/images/tutorial/4.0/overview/result_size_ssb.png)
+
+测试结果可以体现以下两点:
+- kylin4 的构建速度与 kylin3.1.0 的 Spark Engine 相比有明显提升;
+- Kylin4 构建后得到的预计算结果 Parquet 文件大小与 HBase 相比有明显减小;
+
+- Comparison of query response(SSB and TPC-H)
+![](/images/tutorial/4.0/overview/query_response_ssb.png)
+![](/images/tutorial/4.0/overview/query_response_tpch.png)
+
+从查询结果对比中可以看出,对于***简单查询***,kylin3 与 Kylin4 不相上下,kylin4 略有不足;而对于***复杂查询***,kylin4 则体现出了明显的优势,查询速度比 kylin3 快很多。
+并且,Kylin4 中的***简单查询***的性能还存在很大的优化空间。在有赞使用 Kylin4 的实践中,对于***简单查询***的性能可以优化到 1 秒以内。
+
+## Kylin 4.0 查询和构建调优
+对于 Kylin4 的调优,请参考:[How to improve cube building and query performance](/docs40/howto/howto_optimize_build_and_query.html)
+
+## Kylin 4.0 用户案例
+[Why did Youzan choose Kylin4](/blog/2021/06/17/Why-did-Youzan-choose-Kylin4)
+
+参考链接:
+[Kylin Improvement Proposal 1: Parquet Storage](https://cwiki.apache.org/confluence/display/KYLIN/KIP-1%3A+Parquet+storage)
 
-教程
-------------  
-1. [样例 Cube 快速入门](tutorial/kylin_sample.html)
-2. [Web 界面](tutorial/web.html)
-3. [Cube 创建](tutorial/create_cube.html)
-4. [Cube 构建和 Job 监控](tutorial/cube_build_job.html)
-5. [SQL 快速参考](tutorial/sql_reference.html)
-6. [优化 Cube 构建](tutorial/cube_build_performance.html)
-7. [查询下压](tutorial/query_pushdown.html)
-8. [建立 System Cube](tutorial/setup_systemcube.html)
-9. [使用 Cube Planner](tutorial/use_cube_planner.html)
-10. [使用 Dashboard](tutorial/use_dashboard.html)
-11. [优化构建和查询性能](howto/howto_optimize_build_and_query.html)
-
-
-工具集成
-------------  
-1. [ODBC 驱动](tutorial/odbc.html)
-2. [JDBC 驱动](howto/howto_jdbc.html)
-3. [RESTful API 列表](howto/howto_use_restapi.html)
-4. [用 API 构建 Cube](howto/howto_build_cube_with_restapi.html)
-5. [MS Excel 及 PowerBI 教程](tutorial/powerbi.html)
-6. [Tableau 8](tutorial/tableau.html)
-7. [Tableau 9](tutorial/tableau_91.html)
-9. [Qlik Sense 集成](tutorial/Qlik.html)
-10. [Apache Superset](tutorial/superset.html)
-11. [Redash](/blog/2018/05/08/redash-kylin-plugin-strikingly/)
-12. [Davinci](/cn_blog/2019/11/29/Davinci-Kylin-Insight/)
-
-
-帮助
-------------  
-1. [备份元数据](howto/howto_backup_metadata.html)
-2. [清理存储](howto/howto_cleanup_storage.html)
 
 
 
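Note on the storage layout in the hunk above: the cube/segment/cuboid directory tree means the data files for a query can be chosen from their paths alone. The Python sketch below illustrates only that idea and is not Kylin's implementation. The `/kylin` root, the helper names `cuboid_dir` and `prune_files`, and the file list are assumptions made up for this example.

```python
from pathlib import PurePosixPath


def cuboid_dir(root, cube, segment, cuboid_id):
    """Build the directory that holds one cuboid's Parquet files."""
    return PurePosixPath(root) / cube / segment / f"Cuboid-{cuboid_id}"


def prune_files(all_files, root, cube, segments, cuboid_id):
    """Keep only the files that sit under the wanted segments and cuboid."""
    wanted = {cuboid_dir(root, cube, seg, cuboid_id) for seg in segments}
    return [f for f in all_files if PurePosixPath(f).parent in wanted]


# Hypothetical file listing that mirrors the layout in the documentation.
files = [
    "/kylin/cube_name/SegmentA/Cuboid-111/part-0000-XXX.snappy.parquet",
    "/kylin/cube_name/SegmentA/Cuboid-222/part-0000-XXX.snappy.parquet",
    "/kylin/cube_name/SegmentB/Cuboid-111/part-0000-XXX.snappy.parquet",
    "/kylin/cube_name/SegmentB/Cuboid-222/part-0000-XXX.snappy.parquet",
]

# A query that only touches SegmentA and is answered by cuboid 222.
selected = prune_files(files, "/kylin", "cube_name", ["SegmentA"], 222)
print(selected)
```

Only one of the four files survives the path filter, which is the point the documentation makes about avoiding unnecessary scans.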
diff --git a/website/_docs40/index.md b/website/_docs40/index.md
index e41b5bf..ccaf410 100644
--- a/website/_docs40/index.md
+++ b/website/_docs40/index.md
@@ -1,6 +1,6 @@
 ---
 layout: docs40
-title: Overview
+title: Overview of Apache Kylin4.x
 categories: docs
 permalink: /docs40/index.html
 ---
@@ -16,53 +16,127 @@ This is the document for Apache Kylin4.0. Document of other versions:
 * [v2.4 document](/docs24)
 * [Archived](/archive/)
 
-Installation & Setup
-------------  
-1. [Installation Guide](install/index.html)
-2. [Configurations](install/configuration.html)
-3. [Deploy in cluster mode](install/kylin_cluster.html)
-4. [Advanced settings](install/advance_settings.html)
-5. [Run Kylin with Docker](install/kylin_docker.html)
+Apache Kylin 4.0 is a major release following Apache Kylin 3.x. It uses a new Spark build engine, stores data in Parquet, and uses Spark as its query engine.
 
-Tutorial
-------------  
-1. [Quick Start with Sample Cube](tutorial/kylin_sample.html)
-2. [Web Interface](tutorial/web.html)
-3. [Cube Wizard](tutorial/create_cube.html)
-4. [Cube Build and Job Monitoring](tutorial/cube_build_job.html)
-5. [SQL reference](tutorial/sql_reference.html)
-6. [Cube Build Tuning](tutorial/cube_build_performance.html)
-7. [Enable Query Pushdown](tutorial/query_pushdown.html)
-8. [Setup System Cube](tutorial/setup_systemcube.html)
-9. [Optimize with Cube Planner](tutorial/use_cube_planner.html)
-10. [Use System Dashboard](tutorial/use_dashboard.html)
-11. [Optimize build and query](howto/howto_optimize_build_and_query.html)
-12. [Config Spark Pool](howto/howto_config_spark_pool.html)
-
-Connectivity and APIs
-------------  
-1. [ODBC driver](tutorial/odbc.html)
-2. [JDBC driver](howto/howto_jdbc.html)
-3. [RESTful API list](howto/howto_use_restapi.html)
-4. [Build cube with RESTful API](howto/howto_build_cube_with_restapi.html)
-5. [Connect from MS Excel and PowerBI](tutorial/powerbi.html)
-6. [Connect from Tableau 8](tutorial/tableau.html)
-7. [Connect from Tableau 9](tutorial/tableau_91.html)
-8. [Connect from MicroStrategy](tutorial/microstrategy.html)
-9. [Connect from SQuirreL](tutorial/squirrel.html)
-10. [Connect from Apache Flink](tutorial/flink.html)
-11. [Connect from Apache Spark](tutorial/spark.html)
-12. [Connect from Hue](tutorial/hue.html)
-13. [Connect from Qlik Sense](tutorial/Qlik.html)
-14. [Connect from Apache Superset](tutorial/superset.html)
-15. [Connect from Redash](/blog/2018/05/08/redash-kylin-plugin-strikingly/)
-
-
-Operations
-------------  
-1. [Backup/restore Kylin metadata](howto/howto_backup_metadata.html)
-2. [Cleanup storage](howto/howto_cleanup_storage.html)
-3. [Upgrade from old version](howto/howto_upgrade.html)
+Kylin 4.0.0-alpha, the first release of Apache Kylin 4.0, came out in July 2020, followed by Kylin 4.0.0-beta and the official release.
+
+To give users a more complete and deeper understanding of Kylin 4.x, this document presents an overview of Kylin 4.x that focuses on how it differs from previous versions.
+
+This document covers the following topics:
+
+- Why replace HBase with Parquet
+- How to store precomputed results in Kylin 4.0
+- Build engine of Kylin 4.0
+- Query engine of Kylin 4.0
+- Feature comparison between Kylin 4.0 and Kylin 3.1
+- Kylin 4.0 performance
+- Kylin 4.0 query and build tuning
+- Kylin 4.0 use case
+
+## Why replace HBase with Parquet
+In 3.x and earlier versions, Kylin used HBase as the storage engine for the precomputed results generated by cube builds. HBase, a column-family-oriented database on top of HDFS, delivers good query performance, but it still has the following disadvantages:
+
+1. HBase is not a true columnar store;
+2. HBase has no secondary index; the Rowkey is its only index;
+3. HBase does not encode stored data, so Kylin has to do the encoding itself;
+4. HBase is not well suited to cloud deployment and auto-scaling;
+5. The HBase API differs across versions, which causes compatibility issues (e.g., 0.98, 1.0, 1.1, 2.0);
+6. HBase has different vendor releases with compatibility issues between them (e.g., Cloudera's is not compatible with others).
+
+In view of these problems, the Kylin community proposed replacing HBase with Apache Parquet + Spark, for the following reasons:
+
+1. Parquet is an open-source, mature, and stable columnar storage format;
+2. Parquet is more cloud-friendly and works with most file systems, including HDFS, S3, Azure Blob store, Ali OSS, etc.;
+3. Parquet integrates well with Hadoop, Hive, Spark, Impala, etc.;
+4. Parquet supports custom indexes.
+
+
+## How to store precomputed results in Kylin 4.0
+In Kylin 4.x, precomputed results are stored in the file system in Parquet format. The file layout matters for I/O optimization: with a directory structure designed in advance, a query can filter data files by directory or file name and avoid unnecessary file scans.
+
+The Parquet files produced by a cube build are stored in the file system with the following directory structure:
+- cube_name
+  - SegmentA
+    - Cuboid-111
+      - part-0000-XXX.snappy.parquet
+      - part-0001-XXX.snappy.parquet
+      - ...
+    - Cuboid-222
+      - part-0000-XXX.snappy.parquet
+      - part-0001-XXX.snappy.parquet
+      - ...
+  - SegmentB
+    - Cuboid-111
+      - part-0000-XXX.snappy.parquet
+      - part-0001-XXX.snappy.parquet
+      - ...
+    - Cuboid-222
+      - part-0000-XXX.snappy.parquet
+      - part-0001-XXX.snappy.parquet
+      - ...
+
+As the layout shows, Parquet storage makes it easy to add and delete cuboids without affecting other data. Building on this, Kylin 4 lets users add and delete cuboids manually. Please refer to: [How to update cuboid list for a cube](https://cwiki.apache.org/confluence/display/KYLIN/How+to+update+cuboid+list+for+a+cube)
+
+## Build engine of Kylin 4.0
+In Kylin 4, the Spark engine is the only build engine. Compared with the build engines in previous versions, it has the following characteristics:
+
+1. Kylin 4 simplifies the build by removing many steps. For example, a cube build job needs only two steps: resource detection and cubing;
+2. Since Parquet encodes the stored data, Kylin 4 no longer needs dimension dictionaries or dimension column encoding;
+3. Kylin 4 implements a new global dictionary. For more details, please refer to [Kylin4 global dictionary](https://cwiki.apache.org/confluence/display/KYLIN/Global+Dictionary+on+Spark);
+4. Kylin 4 automatically tunes Spark parameters according to cluster resources and the build job;
+5. Kylin 4 improves build speed.
+
+Users can manually change build-related Spark settings through configuration items that begin with `kylin.build.spark-conf`. Settings changed manually are excluded from automatic tuning.
+
+## Query engine of Kylin 4.0
+
+`Sparder (SparderContext)`, the query engine of Kylin 4, is a new distributed query engine backed by a Spark application. Compared with the original query engine, Sparder has the following advantages:
+1. It is a distributed query engine, which avoids a single point of failure;
+2. Building and querying share one compute engine, Spark;
+3. Complex query performance improves substantially;
+4. It can benefit from new Spark features and the Spark ecosystem.
+
+In Kylin 4, Sparder runs as a long-running Spark application. Sparder requests YARN resources according to the Spark parameters set in configuration items that begin with `kylin.query.spark-conf`. If the configured resources are too large, build jobs may be affected and Sparder may even fail to start. If Sparder fails to start, all queries will fail. Users can check Sparder's status on the System page of the Kylin web UI.
+By default, the Spark parameters used for queries are set to small values. In a production environment, you can increase them appropriately to improve query performance.
+The `kylin.query.auto-sparder-context` parameter controls whether Sparder starts when Kylin starts. The default value is `false`, which means Sparder starts only when the first SQL query is executed, so that first query takes a long time.
+If you do not want the first SQL query to be slower than expected, set `kylin.query.auto-sparder-context` to `true` so that Sparder starts together with Kylin.
+
+## Feature comparison between Kylin 4.0 and Kylin 3.1
+
+| Feature                | Kylin 3.1.0                                  | Kylin 4.0                                      |
+| ---------------------  | :------------------------------------------- | :----------------------------------------------|
+| Storage                | HBase                                        | Parquet                                        |
+| BuildEngine            | MapReduce/Spark/Flink                        | New Spark Engine                               |
+| Metastore              | HBase(Default)/Mysql                         | Mysql(Default)                                 |
+| DataSource             | Kafka/Hive/JDBC                              | Hive/CSV                                       |
+| Global Dictionary      | Two implementations                          | New implementation                             |
+| Cube Optimization Tool | Cube Planner                                 | Cube Planner phase1 and Optimize cube manually |
+| Self-monitoring        | System cube and Dashboard                    | System cube and Dashboard                      |
+| PushDown Engine        | Hive/JDBC                                    | Spark SQL                                      |
+| Hadoop platform        | HDP2/HDP3/CDH5/CDH6/EMR5                     | HDP2/CDH5/CDH6/EMR5/EMR6/HDI                   |
+| Deployment mode        | Single node/Cluster/Read and write separation| Single node/Cluster/Read and write separation  |
+
+## Kylin 4.0 performance
+To evaluate the performance of Kylin 4.0, we ran benchmarks on the SSB and TPC-H datasets and compared the results with Kylin 3.1.0. The test environment is a 4-node CDH cluster, and the YARN queue used is allocated 400G of memory and 128 CPU cores.
+The performance test results are as follows:
+
+- Comparison of build duration and result size (SSB)
+
+![](/images/tutorial/4.0/overview/build_duration_ssb.png)  
+![](/images/tutorial/4.0/overview/result_size_ssb.png)
+
+The test results show two things:
+- Kylin 4 builds significantly faster than the Spark engine of Kylin 3.1.0;
+- The precomputed results that Kylin 4 stores as Parquet files are significantly smaller than those stored in HBase.
+
+The query comparison shows that for ***simple queries*** Kylin 3 and Kylin 4 perform about the same, with Kylin 4 slightly behind, while for ***complex queries*** Kylin 4 is clearly faster than Kylin 3.
+Moreover, there is still considerable room to optimize ***simple query*** performance in Kylin 4. In Youzan's production use of Kylin 4, ***simple queries*** were optimized to respond in less than 1 second.
 
+## Kylin 4.0 query and build tuning
+For Kylin 4 tuning, please refer to: [How to improve cube building and query performance](/docs40/howto/howto_optimize_build_and_query.html)
 
+## Kylin 4.0 use case
+[Why did Youzan choose Kylin4](/blog/2021/06/17/Why-did-Youzan-choose-Kylin4)
 
+Reference link:
+[Kylin Improvement Proposal 1: Parquet Storage](https://cwiki.apache.org/confluence/display/KYLIN/KIP-1%3A+Parquet+storage)
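Note on the configuration items mentioned in the hunk above: build-side and query-side Spark settings are distinguished by the prefixes `kylin.build.spark-conf` and `kylin.query.spark-conf`. The Python sketch below shows one way to separate the two groups when reading a `kylin.properties`-style text. It is an illustration under assumptions, not Kylin code: the helper names `parse_properties` and `spark_conf` are invented here, and the sample keys after each prefix and their values are placeholders rather than recommended settings.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def spark_conf(props, prefix):
    """Return the Spark settings found under one Kylin prefix, prefix removed."""
    dotted = prefix + "."
    return {k[len(dotted):]: v for k, v in props.items() if k.startswith(dotted)}


# Placeholder content; the values are assumptions for illustration only.
SAMPLE = """
# start Sparder together with Kylin instead of on the first query
kylin.query.auto-sparder-context=true
kylin.query.spark-conf.spark.executor.memory=4G
kylin.build.spark-conf.spark.executor.cores=2
"""

props = parse_properties(SAMPLE)
query_conf = spark_conf(props, "kylin.query.spark-conf")
build_conf = spark_conf(props, "kylin.build.spark-conf")
print(query_conf, build_conf)
```

Keeping the two groups apart makes it easier to check that the resources requested for the long-running Sparder application leave room for build jobs on the same YARN queue.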
diff --git a/website/_docs40/install/kylin_cluster.cn.md b/website/_docs40/install/kylin_cluster.cn.md
index f3dc8c2..a021c68 100644
--- a/website/_docs40/install/kylin_cluster.cn.md
+++ b/website/_docs40/install/kylin_cluster.cn.md
@@ -47,9 +47,9 @@ kylin.server.self-discovery-enabled=true
 ```
 更多关于kylin任务调度器的细节可以参考[Apache Kylin Wiki](https://cwiki.apache.org/confluence/display/KYLIN/Comparison+of+Kylin+Job+scheduler).
 
-
-
 ### 安装负载均衡器
 
 为了将查询请求发送给集群而非单个节点,您可以部署一个负载均衡器,如 [Nginx](http://nginx.org/en/), [F5](https://www.f5.com/) 或 [cloudlb](https://rubygems.org/gems/cloudlb/) 等,使得客户端和负载均衡器通信代替和特定的 Kylin 实例通信。
 
+### 读写分离部署
+Kylin4 的读写分离部署方式与 Kylin3 存在一定的差异,请参考文档:[Read Write Separation Deployment for Kylin 4](https://cwiki.apache.org/confluence/display/KYLIN/Read-Write+Separation+Deployment+for+Kylin+4.0)
diff --git a/website/_docs40/install/kylin_cluster.md b/website/_docs40/install/kylin_cluster.md
index e4f7143..a2dd647 100644
--- a/website/_docs40/install/kylin_cluster.md
+++ b/website/_docs40/install/kylin_cluster.md
@@ -51,3 +51,6 @@ For more details about the kylin job scheduler, please refer to [Apache Kylin Wi
 ### Installing a load balancer
 
 To send query requests to a cluster instead of a single node, you can deploy a load balancer such as [Nginx](http://nginx.org/en/), [F5](https://www.f5.com/) or [cloudlb](https://rubygems.org/gems/cloudlb/), etc., so that the client and load balancer communication instead communicate with a specific Kylin instance.
+
+### Read and write separation deployment
+Read and write separation deployment in Kylin 4 differs from that in Kylin 3. Please refer to: [Read Write Separation Deployment for Kylin 4](https://cwiki.apache.org/confluence/display/KYLIN/Read-Write+Separation+Deployment+for+Kylin+4.0)
\ No newline at end of file
diff --git a/website/images/tutorial/4.0/overview/build_duration_ssb.png b/website/images/tutorial/4.0/overview/build_duration_ssb.png
new file mode 100644
index 0000000..2484192
Binary files /dev/null and b/website/images/tutorial/4.0/overview/build_duration_ssb.png differ
diff --git a/website/images/tutorial/4.0/overview/query_response_ssb.png b/website/images/tutorial/4.0/overview/query_response_ssb.png
new file mode 100644
index 0000000..4bacf1a
Binary files /dev/null and b/website/images/tutorial/4.0/overview/query_response_ssb.png differ
diff --git a/website/images/tutorial/4.0/overview/query_response_tpch.png b/website/images/tutorial/4.0/overview/query_response_tpch.png
new file mode 100644
index 0000000..5b3cbc4
Binary files /dev/null and b/website/images/tutorial/4.0/overview/query_response_tpch.png differ
diff --git a/website/images/tutorial/4.0/overview/result_size_ssb.png b/website/images/tutorial/4.0/overview/result_size_ssb.png
new file mode 100644
index 0000000..8fd5e1c
Binary files /dev/null and b/website/images/tutorial/4.0/overview/result_size_ssb.png differ