Posted to commits@kylin.apache.org by sh...@apache.org on 2018/06/13 10:18:22 UTC

[kylin] 02/02: Refine document

This is an automated email from the ASF dual-hosted git repository.

shaofengshi pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git

commit 41a87af884c3ce655e19bfda6d6d2d5843fed7c4
Author: shaofengshi <sh...@apache.org>
AuthorDate: Wed Jun 13 18:17:20 2018 +0800

    Refine document
---
 website/_data/docs23.yml                           |  1 -
 .../howto/howto_build_cube_with_restapi.cn.md      |  4 +-
 website/_docs23/howto/howto_jdbc.md                |  2 +-
 website/_docs23/howto/howto_optimize_build.cn.md   |  2 +-
 website/_docs23/tutorial/Qlik.cn.md                |  2 +-
 website/_docs23/tutorial/acl.cn.md                 |  2 +-
 website/_docs23/tutorial/acl.md                    |  2 +-
 website/_docs23/tutorial/create_cube.cn.md         | 74 ++++++++++++----------
 website/_docs23/tutorial/create_cube.md            | 66 ++++++++++---------
 website/_docs23/tutorial/cube_build_job.cn.md      | 10 +--
 website/_docs23/tutorial/kylin_client_tool.cn.md   |  2 +-
 website/_docs23/tutorial/odbc.cn.md                |  2 +-
 website/_docs23/tutorial/powerbi.cn.md             |  4 +-
 website/_docs23/tutorial/query_pushdown.cn.md      |  4 +-
 website/_docs23/tutorial/tableau.cn.md             |  2 +-
 website/_docs23/tutorial/tableau_91.cn.md          |  2 +-
 website/_docs23/tutorial/web.cn.md                 |  6 +-
 website/_docs23/tutorial/web.md                    |  8 +--
 18 files changed, 103 insertions(+), 92 deletions(-)

diff --git a/website/_data/docs23.yml b/website/_data/docs23.yml
index 44efdef..e27a82a 100644
--- a/website/_data/docs23.yml
+++ b/website/_data/docs23.yml
@@ -41,7 +41,6 @@
   - tutorial/web
   - tutorial/create_cube
   - tutorial/cube_build_job
-  - tutorial/acl
   - tutorial/project_level_acl
   - tutorial/cube_spark
   - tutorial/cube_build_performance
diff --git a/website/_docs23/howto/howto_build_cube_with_restapi.cn.md b/website/_docs23/howto/howto_build_cube_with_restapi.cn.md
index c5e8fc3..555f07e 100644
--- a/website/_docs23/howto/howto_build_cube_with_restapi.cn.md
+++ b/website/_docs23/howto/howto_build_cube_with_restapi.cn.md
@@ -1,6 +1,6 @@
 ---
 layout: docs23-cn
-title:  Build cube with API
+title:  Build Cube with API
 categories: Help
 permalink: /cn/docs23/howto/howto_build_cube_with_restapi.html
 ---
@@ -50,5 +50,5 @@ Content-Type: application/json;charset=UTF-8
 *   `GET http://localhost:7070/kylin/api/jobs/{job_uuid}`
 *   The returned `job_status` indicates the current status of the job.
 
-### 5.	If the build job fails, you can resume it
+	## 5.	If the build job fails, you can resume it
 *   `PUT http://localhost:7070/kylin/api/jobs/{job_uuid}/resume`
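The document lists two calls: one that checks a job's status and one that resumes a failed job. The Python sketch below builds both requests. It rests on several assumptions. Kylin is assumed to listen on `localhost:7070` and to accept HTTP Basic authentication, with `ADMIN`/`KYLIN` used only as placeholder credentials. The `"ERROR"` status string and all helper names are also assumptions and do not come from this commit.

```python
import base64
import json
import urllib.request

# Assumption: Kylin at localhost:7070 with HTTP Basic auth; ADMIN/KYLIN are placeholders.
BASE_URL = "http://localhost:7070/kylin/api"


def _auth_header(user: str, password: str) -> dict:
    """Build a Basic auth header from user and password."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def job_status_request(job_uuid: str, user: str = "ADMIN",
                       password: str = "KYLIN") -> urllib.request.Request:
    """GET /jobs/{job_uuid}; the JSON response carries the job's `job_status`."""
    return urllib.request.Request(f"{BASE_URL}/jobs/{job_uuid}",
                                  headers=_auth_header(user, password), method="GET")


def resume_request(job_uuid: str, user: str = "ADMIN",
                   password: str = "KYLIN") -> urllib.request.Request:
    """PUT /jobs/{job_uuid}/resume; restarts a job that ended with an error."""
    return urllib.request.Request(f"{BASE_URL}/jobs/{job_uuid}/resume",
                                  headers=_auth_header(user, password), method="PUT")


def resume_if_failed(job_uuid: str) -> str:
    """Hypothetical helper: read the status and resume the job if it is "ERROR"."""
    with urllib.request.urlopen(job_status_request(job_uuid)) as resp:
        status = json.load(resp)["job_status"]
    if status == "ERROR":  # assumed status value for a failed job
        with urllib.request.urlopen(resume_request(job_uuid)):
            pass
        return "RESUMED"
    return status
```

The tests only exercise the request builders, so they run offline. `resume_if_failed` makes real HTTP calls and needs a running Kylin instance.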
diff --git a/website/_docs23/howto/howto_jdbc.md b/website/_docs23/howto/howto_jdbc.md
index 7243436..71cbf2c 100644
--- a/website/_docs23/howto/howto_jdbc.md
+++ b/website/_docs23/howto/howto_jdbc.md
@@ -1,6 +1,6 @@
 ---
 layout: docs23
-title:  Kylin JDBC Driver
+title:  JDBC Driver
 categories: howto
 permalink: /docs23/howto/howto_jdbc.html
 ---
diff --git a/website/_docs23/howto/howto_optimize_build.cn.md b/website/_docs23/howto/howto_optimize_build.cn.md
index 6103acf..2ac7cd3 100644
--- a/website/_docs23/howto/howto_optimize_build.cn.md
+++ b/website/_docs23/howto/howto_optimize_build.cn.md
@@ -1,6 +1,6 @@
 ---
 layout: docs23-cn
-title:  Optimize cube build
+title:  Optimize Cube Build
 categories: Help
 permalink: /cn/docs23/howto/howto_optimize_build.html
 ---
diff --git a/website/_docs23/tutorial/Qlik.cn.md b/website/_docs23/tutorial/Qlik.cn.md
index 796d474..c3d9450 100644
--- a/website/_docs23/tutorial/Qlik.cn.md
+++ b/website/_docs23/tutorial/Qlik.cn.md
@@ -1,6 +1,6 @@
 ---
 layout: docs23-cn
-title:  Integrate with Qlik Sense
+title:  Qlik Sense Integration
 categories: tutorial
 permalink: /cn/docs23/tutorial/Qlik.html
 since: v2.2
diff --git a/website/_docs23/tutorial/acl.cn.md b/website/_docs23/tutorial/acl.cn.md
index 2042478..3282480 100644
--- a/website/_docs23/tutorial/acl.cn.md
+++ b/website/_docs23/tutorial/acl.cn.md
@@ -1,6 +1,6 @@
 ---
 layout: docs23-cn
-title:  Kylin Cube Permission Grant Tutorial
+title:  Cube Permission Grant (v2.1)
 categories: Tutorial
 permalink: /cn/docs23/tutorial/acl.html
 version: v1.2
diff --git a/website/_docs23/tutorial/acl.md b/website/_docs23/tutorial/acl.md
index 0f9a864..b51d815 100644
--- a/website/_docs23/tutorial/acl.md
+++ b/website/_docs23/tutorial/acl.md
@@ -1,6 +1,6 @@
 ---
 layout: docs23
-title: Cube Permission (v2.1.x)
+title: Cube Permission (v2.1)
 categories: tutorial
 permalink: /docs23/tutorial/acl.html
 since: v0.7.1
diff --git a/website/_docs23/tutorial/create_cube.cn.md b/website/_docs23/tutorial/create_cube.cn.md
index a293ce3..824ac7a 100644
--- a/website/_docs23/tutorial/create_cube.cn.md
+++ b/website/_docs23/tutorial/create_cube.cn.md
@@ -1,19 +1,19 @@
----
+---
 layout: docs23-cn
-title:  Kylin Cube Creation Tutorial
+title:  Cube Creation
 categories: Tutorial
 permalink: /cn/docs23/tutorial/create_cube.html
 version: v1.2
 since: v0.7.1
 ---
-  
-  
-### I. Create a new project
+
+
+### I. Create Project
 1. Go to the `Model` page from the top menu bar, then click `Manage Projects`.
 
    ![](/images/Kylin-Cube-Creation-Tutorial/1 manage-prject.png)
 
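-2. Click the `+ Project` button to add a new project, or skip the first step and click `Add Project` (the button on the right in the first picture).
+2. Click the `+ Project` button to add a new project.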
 
    ![](/images/Kylin-Cube-Creation-Tutorial/2 %2Bproject.png)
 
@@ -25,8 +25,8 @@ since: v0.7.1
 
    ![](/images/Kylin-Cube-Creation-Tutorial/3.1 pj-created.png)
 
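-### II. Sync a table
-1. Click `Model` in the top menu bar, then click the `Data Source` tab on the left; it lists all tables loaded into Kylin. Click the `Load Table` button.
+### II. Sync Hive Table
+1. Click `Model` in the top menu bar, then click the `Data Source` tab on the left; it lists all tables loaded into Kylin. Click the `Load Table` button.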
 
    ![](/images/Kylin-Cube-Creation-Tutorial/4 %2Btable.png)
 
@@ -34,7 +34,7 @@ since: v0.7.1
 
    ![](/images/Kylin-Cube-Creation-Tutorial/5 hive-table.png)
 
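-3. [Optional] If you want to browse the hive databases to select tables, click the `Load Table From Tree` button.
+3. [Optional] If you want to browse the hive databases to select tables, click the `Load Table From Tree` button.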
 
    ![](/images/Kylin-Cube-Creation-Tutorial/5 hive-table.png)
 
@@ -46,18 +46,18 @@ since: v0.7.1
 
    ![](/images/Kylin-Cube-Creation-Tutorial/5 hive-table.png)
 
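-6. In the background, Kylin runs a MapReduce job to calculate the cardinality of the newly synced table. When the job finishes, refresh the page and click the table name; the cardinality values will be shown in the table information.
+6. In the background, Kylin runs a MapReduce job to calculate the cardinality of the newly synced table. When the job finishes, refresh the page and click the table name; the cardinality values will be shown in the table information.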
 
    ![](/images/Kylin-Cube-Creation-Tutorial/5 hive-table.png)
 
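-### III. Create a new Data Model
-Before creating a cube, you need to define a data model. The data model is defined as a star schema. One model can be used by multiple cubes.
+### III. Create Data Model
+Before creating a cube, you need to define a data model. The data model defines a star schema or a snowflake schema. One model can be used by multiple cubes.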
 
 ![](/images/Kylin-Cube-Creation-Tutorial/6 %2Bcube.png)
 
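 1. Click `Model` at the top, then click the `Models` tab. Click the `+New` button and select `New Model` in the drop-down list.
 
-2. Enter the model's name and optional description.
+2. Enter the model name and an optional description.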
 
 ![](/images/Kylin-Cube-Creation-Tutorial/7 cube-info.png)
 
@@ -65,11 +65,11 @@ since: v0.7.1
 
     ![](/images/Kylin-Cube-Creation-Tutorial/8 dim-factable.png)
 
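-4. [Optional] Click the `Add Lookup Table` button to add a lookup table. Select the table name and the join type (inner join or left join).
+4. [Optional] Click the `Add Lookup Table` button to add a lookup table. Select the table name and the join type (inner join or left join).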
 
     ![](/images/Kylin-Cube-Creation-Tutorial/8 dim-%2Bdim.png)
 
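-5. [Optional] Click the `New Join Condition` button, select the FK column of the fact table on the left, and the PK column of the lookup table on the right. Repeat this if there is more than one join column.
+5. Click the `New Join Condition` button, select the FK column of the fact table on the left, and the PK column of the lookup table on the right. Repeat this if there is more than one join column.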
 
     ![](/images/Kylin-Cube-Creation-Tutorial/8 dim-typeA.png)
 
@@ -82,7 +82,7 @@ since: v0.7.1
 
 ![](/images/Kylin-Cube-Creation-Tutorial/7 cube-info.png)
 
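-9. Click "Next" to go to the "Settings" page. If the data in the fact table grows daily, select the corresponding date column and its date format in `Partition Date Column`; otherwise leave it blank.
+9. Click "Next" to go to the "Settings" page. If the data in the fact table grows daily, select the corresponding date column and its date format in `Partition Date Column`; otherwise leave it blank.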
 
     ![](/images/Kylin-Cube-Creation-Tutorial/8 dim-factable.png)
 
@@ -90,14 +90,14 @@ since: v0.7.1
 
     ![](/images/Kylin-Cube-Creation-Tutorial/8 dim-%2Bdim.png)
 
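-11. [Optional] If some records should be excluded from the cube, such as dirty data, you can enter the condition in `Filter`.
+11. [Optional] If you want to filter the data when it is extracted from hive, you can enter the filter condition in `Filter`.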
 
     ![](/images/Kylin-Cube-Creation-Tutorial/8 dim-typeA.png)
 
 12. Click `Save` and then select `Yes` to save the data model. Once created, the data model is listed in the `Models` list on the left.
    ![](/images/Kylin-Cube-Creation-Tutorial/8 dim-edit.png)
 
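-### III. Create a new Cube
+### IV. Create Cube
 
 After the data model is created, you can start creating a cube.
 Click `Model` at the top, then click the `Models` tab. Click the `+New` button and select `New Cube` in the drop-down list.
@@ -106,13 +106,13 @@ since: v0.7.1
 
 1. Select the data model and enter the cube name; click `Next` to go to the next step.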
 
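-The cube name can contain letters, numbers and underscores (spaces are also allowed). `Notification Email List` is the list of email addresses notified when a job succeeds or fails. `Notification Events` are the job statuses that trigger a notification.
+The cube name can contain letters, numbers and underscores (spaces are not allowed). `Notification Email List` is the list of email addresses notified when a job succeeds or fails. `Notification Events` are the job statuses that trigger a notification.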
 
    ![](/images/Kylin-Cube-Creation-Tutorial/9 meas-%2Bmeas.png)
 
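 **Step 2. Dimensions**
 
-1. Click `Add Dimension`, and in the pop-up window tick the columns you need from the fact table and lookup tables. Lookup table columns have two options: "Normal" and "Derived" (default). "Normal" adds a normal independent dimension column; "Derived" adds a derived dimension. Read more in [How to optimize cubes](/docs15/howto/howto_optimize_cubes.html).
+1. Click `Add Dimension`, and in the pop-up window tick the columns you need from the fact table and lookup tables. Lookup table columns have two options: "Normal" and "Derived" (default). "Normal" adds a normal independent dimension column; "Derived" adds a derived dimension, which is not computed into the cube but is derived from the FK of the fact table. Read more in [How to optimize cubes](/docs15/howto/howto_optimize_cubes.html).
 
 2. After selecting all dimensions, click "Next".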
 
@@ -145,11 +145,11 @@ cube 名字可以使用字母,数字和下划线(空格也是允许的)。
    2) precise implementation with bitmap (for the specific limitations, see https://issues.apache.org/jira/browse/KYLIN-1186)
 
      ![](/images/Kylin-Cube-Creation-Tutorial/9 meas-distinct.png)
-    
+   
     Note: distinct count is a very heavy data type; it is slower to build and query than other measures.
-    
+   
    * TOP_N
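-   The TopN measure is pre-calculated for each dimension combination, so it performs better at query time than without pre-calculation. It needs two parameters: the first is the measure column used for the top records; the second is the literal ID, representing the record, such as seller_id;
+   The TopN measure is pre-calculated for each dimension combination, so it performs better at query time than without pre-calculation. It needs two parameters: the first is the measure column used for the top records, whose SUM Kylin computes and sorts in descending order; the second is the literal ID, representing the top records, such as seller_id;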
    
    Choose the return type carefully; it determines how many top records are inspected: top 10, top 100, top 500, top 1000, top 5000 or top 10000.
 
@@ -179,27 +179,37 @@ cube 名字可以使用字母,数字和下划线(空格也是允许的)。
 
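 **Step 5. Advanced Setting**
 
-`Aggregation Groups`: by default kylin puts all dimensions into one aggregation group; if you know your query patterns well, you can create multiple aggregation groups. For "Mandatory Dimensions", "Hierarchy Dimensions" and "Joint Dimensions", please read this blog: [New Aggregation Group](/blog/2016/02/18/new-aggregation-group/)
+`Aggregation Groups`: The dimensions of a cube can be divided into multiple aggregation groups. By default kylin puts all dimensions into one aggregation group. When there are many dimensions, the number of combinations can be huge and cause cube explosion. If you know your query patterns well, you can create multiple aggregation groups. Within each aggregation group, use "Mandatory Dimensions", "Hierarchy Dimensions" and "Joint Dimensions" to further optimize the dimension combinations.
+
+`Mandatory Dimensions`: dimensions that always appear. For example, if your queries always use "ORDER_DATE" in group by or as a filter condition, it can be declared a mandatory dimension. All cuboids without this dimension can then be skipped.
+
+`Hierarchy Dimensions`: for example, "Country" -> "Province" -> "City" is a hierarchy. Cuboids that don't comply with this hierarchy can be skipped, for example ["Province"], ["City"]. When defining a hierarchy, put the parent-level dimension to the left of the child-level dimension.
+
+`Joint Dimensions`: some dimensions usually appear together, or their cardinalities are very close (a 1:1 mapping), for example "user_id" and "email". Once several dimensions are defined as joint, all cuboids that don't comply with this relationship are skipped.
+
+For more on dimension optimization, please read this blog: [New Aggregation Group](/blog/2016/02/18/new-aggregation-group/)
+
+`Rowkeys`: the rowkeys are composed of the encoded dimension values. "Dictionary" is the default encoding method. A dictionary can only handle dimensions of low to medium cardinality (fewer than 10 million). If a dimension's cardinality is very high (e.g., more than 10 million), select "false" and then enter a suitable length for the dimension, usually the maximum length of that column; longer values are truncated. Please note, without dictionary encoding, the cube size may become very large.
 
-`Rowkeys`: the rowkeys are composed of the encoded dimension values. "Dictionary" is the default encoding method; if a dimension does not fit a dictionary (e.g., cardinality > 10 million), select "false" and then enter a suitable length for the dimension, usually the maximum value of that column; longer values will be truncated. Please note, without dictionary encoding, the cube size will become very large.
+You can drag and drop a dimension column to adjust its position in the rowkey. Columns at the front of the rowkey can greatly narrow the query range. It is usually recommended to put mandatory dimensions at the beginning, followed by the dimensions heavily involved in filters (where conditions). If multiple columns are used for filtering, put high-cardinality dimensions (such as user_id) before low-cardinality ones (such as age).
 
-You can drag and drop a dimension column to adjust its position in the rowkey; put mandatory dimensions at the beginning, followed by the dimensions heavily involved in filters (where conditions). Put high-cardinality dimensions before low-cardinality dimensions.
+`Mandatory Cuboids`: a whitelist of dimension combinations. It ensures that the cuboids you want to build are built.
 
-`Mandatory Cuboids`: ensure the cuboids you want to build can be built smoothly.
+`Cube Engine`: the engine for building the cube. There are two: MapReduce and Spark. If your cube only has simple measures (SUM, MIN, MAX), Spark is recommended. If the cube has complex measures (COUNT DISTINCT, TOP_N), MapReduce is recommended.
 
-`Cube Engine`: the engine for building the cube. There are two types: MapReduce and Spark.
+`Advanced Dictionaries`: "Global Dictionary" is the dictionary for precise COUNT DISTINCT. It converts a non-integer value into an integer so that bitmaps can be used for deduplication. If the column you want to COUNT DISTINCT is already of integer type, there is no need to define a Global Dictionary. The Global Dictionary is shared by all segments, so it supports roll-up deduplication across segments. Please note, the Global Dictionary may keep growing as data is loaded.
 
-`Advanced Dictionaries`: "Global Dictionary" is the default dictionary for precise distinct count measures, which supports rollup among all segments.
+"Segment Dictionary" is another dictionary for precise COUNT DISTINCT. Unlike the Global Dictionary, it is built from the values of a single segment, so it does not support aggregation across segments. If your cube isn't partitioned, or you can ensure that all your SQLs group by the partition_column, you should use "Segment Dictionary" instead of "Global Dictionary"; this avoids a single dictionary growing too large.
 
-"Segment Dictionary" is a special dictionary for precise distinct count measures, which is based on one segment and does not support rollup among segments. Specifically, if your cube isn't partitioned or you can ensure that all your SQLs group by the partition_column, you should use "Segment Dictionary" instead of "Global Dictionary".
+Please note: "Global Dictionary" and "Segment Dictionary" are both one-way encoding dictionaries, used only for COUNT DISTINCT (converting non-integer values into integers for bitmap calculation). They do not support decoding, so they cannot be used to encode ordinary dimensions.
 
 `Advanced Snapshot Table`: designed for global lookup tables, providing different storage types.
 
-`Advanced ColumnFamily`: If there is more than one ultra-high cardinality precise distinct count measure, you can put them into more column families.
+`Advanced ColumnFamily`: If there is more than one COUNT DISTINCT or TopN measure, you can put them into more column families to optimize I/O with HBase.
 
 **Step 6. Configuration Overwrites**
 
-Cube level properties will overwrite the configuration in kylin.properties; if you don't have anything to configure, click the `Next` button.
+Kylin allows part of the configuration in kylin.properties to be overwritten at the Cube level; you can define the overwritten properties here. If you don't have anything to configure, click the `Next` button.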
 
 ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/10 configuration.png)
 
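The Mandatory, Hierarchy and Joint Dimensions rules can be illustrated by listing every dimension combination and keeping only the ones that satisfy the rules. The sketch below is a simplified model under stated assumptions and is not Kylin's actual cuboid scheduler. It assumes a single aggregation group and hierarchies listed parent first. The dimension names and function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Sequence, Set


def is_valid_cuboid(cuboid: FrozenSet[str], mandatory: Set[str],
                    hierarchies: Sequence[Sequence[str]],
                    joints: Sequence[Set[str]]) -> bool:
    # Mandatory: every kept cuboid contains all mandatory dimensions.
    if not mandatory <= cuboid:
        return False
    # Hierarchy (parent first): present levels must form a prefix, so
    # [COUNTRY, STATE] is kept but [STATE, CITY] or [CITY] is skipped.
    for levels in hierarchies:
        present = [d in cuboid for d in levels]
        if any(present[i + 1] and not present[i] for i in range(len(levels) - 1)):
            return False
    # Joint: a joint group appears entirely or not at all.
    for group in joints:
        if 0 < len(group & cuboid) < len(group):
            return False
    return True


def valid_cuboids(dims: Sequence[str], mandatory: Set[str],
                  hierarchies: Sequence[Sequence[str]],
                  joints: Sequence[Set[str]]) -> List[FrozenSet[str]]:
    """Enumerate all 2^n dimension subsets and keep those passing the rules."""
    kept = []
    for k in range(len(dims) + 1):
        for combo in combinations(dims, k):
            cuboid = frozenset(combo)
            if is_valid_cuboid(cuboid, mandatory, hierarchies, joints):
                kept.append(cuboid)
    return kept


DIMS = ["ORDER_DATE", "COUNTRY", "STATE", "CITY", "USER_ID", "EMAIL"]

if __name__ == "__main__":
    print(len(valid_cuboids(DIMS, set(), [], [])))  # 64: no pruning
    print(len(valid_cuboids(DIMS, {"ORDER_DATE"},
                            [["COUNTRY", "STATE", "CITY"]],
                            [{"USER_ID", "EMAIL"}])))  # 8
```

Six dimensions give 2^6 = 64 combinations. Suppose ORDER_DATE is declared mandatory, COUNTRY -> STATE -> CITY is declared a hierarchy, and USER_ID/EMAIL are declared joint. The kept set then shrinks to 1 * 4 * 2 = 8: one choice for the mandatory dimension, four valid hierarchy prefixes, and the joint pair either present or absent.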
diff --git a/website/_docs23/tutorial/create_cube.md b/website/_docs23/tutorial/create_cube.md
index b5adce8..6947ff7 100644
--- a/website/_docs23/tutorial/create_cube.md
+++ b/website/_docs23/tutorial/create_cube.md
@@ -1,4 +1,4 @@
----
+---
 layout: docs23
 title:  Cube Wizard
 categories: tutorial
@@ -6,21 +6,19 @@ permalink: /docs23/tutorial/create_cube.html
 ---
 
 This tutorial will guide you through creating a cube. It requires at least one sample table in Hive. If you don't have one, you can follow this to create some data.
-  
+
 ### I. Create a Project
 1. Go to `Model` page in top menu bar, then click `Manage Projects`.
 
    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/1 manage-prject.png)
 
-2. Click the `+ Project` button to add a new project or ignore the first step and click `Add Project`(button on the right) of the first picture.
-
-   ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/2 +project.png)
+2. Click the `+ Project` button to add a new project.
 
-3. Enter a project name, e.g, "Tutorial", with a description (optional) and the config (optional), then click `submit` button to send the request.
+3. Enter a project name, e.g., "Tutorial", with a description (optional) and any Kylin configuration properties to overwrite (optional), then click the `submit` button.
 
    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/3 new-project.png)
 
-4. After success, the project will show in the table.
+4. After success, the project will be shown in the table. You can switch the current project with the drop-down at the top of the page.
 
    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/3.1 pj-created.png)
 
@@ -29,7 +27,7 @@ This tutorial will guide you to create a cube. It need you have at least 1 sampl
 
    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/4 +table.png)
 
-2. Enter the hive table names, separated with commad, and then click `Sync` to send the request.
+2. Enter the hive table names, separated by commas, and then click `Sync`.
 
    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/5 hive-table.png)
 
@@ -41,7 +39,7 @@ This tutorial will guide you to create a cube. It need you have at least 1 sampl
 
    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/5 hive-table-tree.png)
 
-5. A success message will pop up. In the left `Tables` section, the newly loaded table is added. Click the table name will expand the columns.
+5. The newly loaded table is added to the `Tables` section on the left. Click the table name to show its columns.
 
    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/5 hive-table-info.png)
 
@@ -51,7 +49,7 @@ This tutorial will guide you to create a cube. It need you have at least 1 sampl
 
 
 ### III. Create Data Model
-Before create a cube, need define a data model. The data model defines the star schema. One data model can be reused in multiple cubes.
+Before creating a cube, you need to define a data model. The data model defines a star/snowflake schema, but it doesn't define the aggregation policies. One data model can be referenced by multiple cubes.
 
 1. Click `Model` in the top bar, and then click the `Models` tab. Click the `+New` button, and select `New Model` in the drop-down list.
 
@@ -69,34 +67,34 @@ Before create a cube, need define a data model. The data model defines the star
 
     ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/6 model-lookup-table.png)
 
-5. [Optional] Click `New Join Condition` button, select the FK column of fact table in the left, and select the PK column of lookup table in the right side. Repeat this if have more than one join columns.
+5. Click the `New Join Condition` button, select the FK column of the fact table on the left, and select the PK column of the lookup table on the right. Repeat this step if there is more than one join column.
 
     ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/6 model-join-condition.png)
 
 6. Click "OK", and repeat steps 4 and 5 to add more lookup tables if any. When finished, click "Next".
 
-7. The "Dimensions" page allows to select the columns that will be used as dimension in the child cubes. Click the `Columns` cell of a table, in the drop-down list select the column to the list. 
+7. The "Dimensions" page lets you select the columns that will be used as dimensions in the cubes. Click the `Columns` cell of a table, and select the columns in the drop-down list. Usually all "Varchar", "String" and "Date" columns should be declared as dimensions. Only columns in this list can be added to a cube as dimensions, so please add all possible dimension columns here.
 
     ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/6 model-dimensions.png)
 
+8. Click "Next" to go to the "Measures" page, and select the columns that will be used in measures/metrics. Measure columns can only come from the fact table. Usually the "long", "int", "double" and "decimal" columns are declared as measures.
+8. Click "Next" go to the "Measures" page, select the columns that will be used in measure/metrics. The measure column can only from fact table. Usually the "long", "int", "double", "decimal" columns are declared as measures. 
 
     ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/6 model-measures.png)
 
 9. Click "Next" to go to the "Settings" page. If the data in the fact table grows daily, select the corresponding date column in the `Partition Date Column`, and select the date format; otherwise leave it blank.
 
-10. [Optional] Choose whether has a separate "time of the day" column, by default it is `No`. If choose `Yes`, select the corresponding time column in the `Partition Time Column`, and select the time format.
+10. [Optional] Choose whether there is a separate "time of the day" column; by default it is `No`. If you choose `Yes`, select the corresponding time column in the `Partition Time Column`, and select the time format. ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/6 model-partition-column.png)
 
-11. [Optional] If some records want to excluded from the cube, like dirty data, you can input the condition in `Filter`.
+11. [Optional] If some conditions need to be applied when extracting data from Hive, you can enter them in `Filter`.
 
-    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/6 model-partition-column.png)
+    
 
 12. Click `Save` and then select `Yes` to save the data model. Once created, the data model will be shown in the `Models` list on the left.
 
     ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/6 model-created.png)
 
 ### IV. Create Cube
-After the data model be created, you can start to create cube. 
+After the data model is created, you can start to create a cube.
 
 Click `Model` in the top bar, and then click the `Models` tab. Click the `+New` button, and select `New Cube` in the drop-down list.
 
@@ -109,14 +107,14 @@ Select the data model, enter the cube name; Click `Next` to enter the next step.
 You can use letters, numbers and '_' to name your cube (blank space in the name is not allowed). `Notification Email List` is a list of email addresses to be notified on cube job success/failure. `Notification Events` are the statuses that trigger a notification.
 
     ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/7 cube-info.png)
-    
+
 **Step 2. Dimensions**
 
-1. Click `Add Dimension`, it pops up a window: tick columns that you need from FactTable and LookupTable. There are two options for LookupTable columns: "Normal" and "Derived" (default)。"Normal" is to add a normal independent dimension column, "Derived" is to add a derived dimension column. Read more in [How to optimize cubes](/docs15/howto/howto_optimize_cubes.html).
+1. Click `Add Dimension`, it pops up a window: tick columns that you need from FactTable and LookupTable. There are two options for LookupTable columns: "Normal" and "Derived" (default). "Normal" is to add a normal independent dimension column, "Derived" is to add a derived dimension column (deriving from the FK of the fact table). Read more in [How to optimize cubes](/docs15/howto/howto_optimize_cubes.html).
 
    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/7 cube-dimension-batch.png)
 
-2. Click "Next" after select all dimensions.
+2. Click "Next" after selecting all other dimensions.
 
 **Step 3. Measures**
 
@@ -151,7 +149,7 @@ You can use letters, numbers and '_' to name your cube (blank space in name is n
   Please note: distinct count is a very heavy data type; it is slower to build and query than other measures.
 
    * TOP_N
-   Approximate TopN measure pre-calculates the top records in each dimension combination, it will provide higher performance in query time than no pre-calculation; Need specify two parameters here: the first is the column will be used as metrics for Top records (aggregated with SUM and then sorted in descending order); the second is the literal ID, represents the record like seller_id;
+   The approximate TopN measure pre-calculates the top records in each dimension combination, which gives higher query performance than no pre-calculation. Two parameters need to be specified here: the first is the column used as the metric for the top records (aggregated with SUM and then sorted in descending order); the second is the literal ID, representing the entity, like seller_id;
 
   Select the return type carefully, depending on how many top records to inspect: top 10, top 100, top 500, top 1000, top 5000 or top 10000.
 
@@ -183,27 +181,37 @@ This step is designed for incremental cube build.
 
 **Step 5. Advanced Setting**
 
-`Aggregation Groups`: by default Kylin put all dimensions into one aggregation group; you can create multiple aggregation groups by knowing well about your query patterns. For the concepts of "Mandatory Dimensions", "Hierarchy Dimensions" and "Joint Dimensions", read this blog: [New Aggregation Group](/blog/2016/02/18/new-aggregation-group/)
+`Aggregation Groups`: The dimensions can be divided into multiple groups; each group is called an "agg group". By default Kylin puts all dimensions into one aggregation group, which causes cube explosion when there are many dimensions. If you know your query patterns well, you can create multiple agg groups. In each agg group, you can use "Mandatory Dimensions", "Hierarchy Dimensions" and "Joint Dimensions" to further optimize the dimension combinations.
+
+`Mandatory Dimensions`: Dimensions that always appear. For example, if all your queries use "ORDER_DATE" in group by or as a filter condition, it can be marked as mandatory. The cuboids that don't have this dimension can then be omitted from building.
+
+`Hierarchy Dimensions`: For example, "Country" -> "State" -> "City" is a logical hierarchy; the cuboids that don't comply with this hierarchy can be omitted from building, for example ["STATE", "CITY"], ["CITY"]. When defining a hierarchy, put the parent-level dimension before the child-level dimension.
+
+`Joint Dimensions`: Some dimensions always appear together, or their cardinalities are close (near 1:1), for example "user_id" and "email". Once they are defined as joint, the cuboids that contain only some of them can be omitted.
+
+For more, please read this blog: [New Aggregation Group](/blog/2016/02/18/new-aggregation-group/)
 
 `Rowkeys`: the rowkeys are composed of the encoded dimension values. "Dictionary" is the default encoding method; if a dimension does not fit a dictionary (e.g., cardinality > 10 million), select "false" and then enter the fixed length for that dimension, usually the max length of that column; a value longer than that size will be truncated. Please note, without dictionary encoding, the cube size might be much bigger.
 
 You can drag & drop a dimension column to adjust its position in the rowkey; put the mandatory dimensions at the beginning, followed by the dimensions heavily involved in filters (where conditions). Put high cardinality dimensions ahead of low cardinality dimensions.
 
-`Mandatory Cuboids`: ensure the cuboid that you want to build can build smoothly.
+`Mandatory Cuboids`: Whitelist of the cuboids that you want to build.
+
+`Cube Engine`: The engine for building cube. There are 2 engines: MapReduce and Spark. If your cube only has simple measures (COUNT, SUM, MIN, MAX), Spark can gain better performance; If cube has complex measures (COUNT DISTINCT, TOP_N), MapReduce is more stable.
 
-`Cube Engine`: the engine for building cube. There are 2 types: MapReduce and Spark.
+`Advanced Dictionaries`: "Global Dictionary" is the default dictionary for precise count distinct measures. It ensures that a value is always encoded into the same integer, so it supports "COUNT DISTINCT" rollup among multiple segments. However, the global dictionary may grow very large over time.
 
-`Advanced Dictionaries`: "Global Dictionary" is the default dict for precise count distinct measure, which support rollup among all segments.
+"Segment Dictionary" is a special dictionary for precise count distinct measure, which is built on one segment and could not support rollup among segments. Its size can be much smaller than global dictionary. Specifically, if your cube isn't partitioned or you can ensure all your SQLs will group by your partition_column, you could use "Segment Dictionary" instead of "Global Dictionary".
 
-"Segment Dictionary" is the special dict for precise count distinct measure, which is based on one segment and could not support rollup among segments. Specifically, if your cube isn't partitioned or you can ensure all your SQLs will group by your partition_column, you could use "Segment Dictionary" instead of "Global Dictionary".
+Please note: "Global Dictionary" and "Segment Dictionary" are one-way dictionaries for COUNT DISTINCT (converting a non-integer value into an integer for the bitmap); they can't be used as the encoding for a dimension.
 
 `Advanced Snapshot Table`: designed for global lookup tables, providing different storage types.
 
-`Advanced ColumnFamily`: If there are more than one ultrahigh cardinality precise count distinct measures, you could assign these measures to more column family.
+`Advanced ColumnFamily`: If there is more than one ultra-high cardinality precise count distinct or TopN measure, you can divide these measures into more column families to optimize the I/O from HBase.
 
 **Step 6. Configuration Overwrites**
 
-Cube level properties will overwrite configuration in kylin.properties, if you don't have anything to config, click `Next` button.
+Kylin allows system configurations (conf/kylin.properties) to be overwritten at the Cube level. You can add the key/values that you want to overwrite here. If you don't have anything to configure, click the `Next` button.
 
 ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/10 configuration.png)
 
@@ -213,4 +221,4 @@ You can overview your cube and go back to previous step to modify it. Click the
 
 ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/11 overview.png)
 
-Cheers! now the cube is created, you can go ahead to build and play it.
+Cheers! Now that the cube is created, you can go ahead and build it.
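The difference between "Global Dictionary" and "Segment Dictionary" comes down to whether the integer IDs stay consistent across segments. The toy model below is not Kylin's implementation. It encodes values one way into integer IDs and stores each segment's IDs in a bitmap, represented as a Python `int`. It then merges the bitmaps of two segments. With a shared dictionary the merged count is correct. With per-segment dictionaries the IDs collide and the count comes out too low. `OneWayDictionary`, `to_bitmap`, `count_distinct` and the sample data are all hypothetical.

```python
from typing import Dict, Iterable, List


class OneWayDictionary:
    """Maps each distinct value to a stable integer ID; offers no decoding."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def encode(self, value: str) -> int:
        # A new value gets the next unused ID; a known value keeps its ID.
        if value not in self._ids:
            self._ids[value] = len(self._ids)
        return self._ids[value]


def to_bitmap(values: Iterable[str], dictionary: OneWayDictionary) -> int:
    """Use a Python int as a bitmap: bit i is set when ID i occurs."""
    bitmap = 0
    for value in values:
        bitmap |= 1 << dictionary.encode(value)
    return bitmap


def count_distinct(bitmap: int) -> int:
    return bin(bitmap).count("1")


segments: List[List[str]] = [["alice", "bob"], ["bob", "carol"]]

# Global dictionary: one dictionary shared by every segment.
global_dict = OneWayDictionary()
global_bitmaps = [to_bitmap(seg, global_dict) for seg in segments]
global_rollup = count_distinct(global_bitmaps[0] | global_bitmaps[1])

# Segment dictionary: each segment encodes with its own dictionary.
segment_bitmaps = [to_bitmap(seg, OneWayDictionary()) for seg in segments]
segment_rollup = count_distinct(segment_bitmaps[0] | segment_bitmaps[1])
```

`global_rollup` is 3 (alice, bob, carol). `segment_rollup` is 2, because both segments assigned IDs 0 and 1 to their own first two values. Each segment's own count is still correct, so a segment dictionary is only safe when queries never roll up across segments.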
diff --git a/website/_docs23/tutorial/cube_build_job.cn.md b/website/_docs23/tutorial/cube_build_job.cn.md
index d94edc1..3040115 100644
--- a/website/_docs23/tutorial/cube_build_job.cn.md
+++ b/website/_docs23/tutorial/cube_build_job.cn.md
@@ -1,15 +1,9 @@
----
-
+---
 layout: docs23-cn
-
-title:  Kylin Cube Build and Job Monitoring Tutorial
-
+title: Cube Build and Job Monitoring
 categories: Tutorial
-
 permalink: /cn/docs23/tutorial/cube_build_job.html
-
 version: v1.2
-
 since: v0.7.1
 
 ---
diff --git a/website/_docs23/tutorial/kylin_client_tool.cn.md b/website/_docs23/tutorial/kylin_client_tool.cn.md
index d240017..df2754a 100644
--- a/website/_docs23/tutorial/kylin_client_tool.cn.md
+++ b/website/_docs23/tutorial/kylin_client_tool.cn.md
@@ -1,6 +1,6 @@
 ---
 layout: docs23-cn
-title:  Kylin Python Client Library
+title:  Python Client Library
 categories: Tutorial
 permalink: /cn/docs23/tutorial/kylin_client_tool.html
 ---
diff --git a/website/_docs23/tutorial/odbc.cn.md b/website/_docs23/tutorial/odbc.cn.md
index ddcd8c7..6515bfc 100644
--- a/website/_docs23/tutorial/odbc.cn.md
+++ b/website/_docs23/tutorial/odbc.cn.md
@@ -1,6 +1,6 @@
 ---
 layout: docs23-cn
-title:  Kylin ODBC Driver Tutorial
+title:  ODBC Driver
 categories: Tutorial
 permalink: /cn/docs23/tutorial/odbc.html
 version: v1.2
diff --git a/website/_docs23/tutorial/powerbi.cn.md b/website/_docs23/tutorial/powerbi.cn.md
index c4d6b8a..2dd3223 100644
--- a/website/_docs23/tutorial/powerbi.cn.md
+++ b/website/_docs23/tutorial/powerbi.cn.md
@@ -1,6 +1,6 @@
 ---
 layout: docs23-cn
-title:  Microsoft Excel and Power BI Tutorial
+title:  MS Excel and Power BI Tutorial
 categories: tutorial
 permalink: /cn/docs23/tutorial/powerbi.html
 version: v1.2
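@@ -28,7 +28,7 @@ Microsoft Power BI is a professional business intelligence analysis tool from Microsoft, giving
 
 > To simplify entering the connection string, creating a DSN for Apache Kylin is recommended; the connection string can then be reduced to DSN=[YOUR_DSN_NAME]. For how to create a DSN, please refer to: [https://support.microsoft.com/en-us/kb/305599](https://support.microsoft.com/en-us/kb/305599).
 
- 
+
 3. If you choose not to enter a SQL statement, Power Query will list all database tables, and you can load the data of whole tables as needed. However, Apache Kylin does not yet support querying raw data, so loading some tables may be restricted.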
 ![](/images/tutorial/odbc/ms_tool/Picture3.png)
 
diff --git a/website/_docs23/tutorial/query_pushdown.cn.md b/website/_docs23/tutorial/query_pushdown.cn.md
index ca3e6cb..0e5ebc1 100644
--- a/website/_docs23/tutorial/query_pushdown.cn.md
+++ b/website/_docs23/tutorial/query_pushdown.cn.md
@@ -6,7 +6,7 @@ permalink: /cn/docs23/tutorial/query_pushdown.html
 since: v2.1
 ---
 
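-### Kylin supports query pushdown
+### Kylin Supports Query Pushdown
 
 For SQL queries that no cube can answer, Kylin supports pushing them down via JDBC to a backup query engine, such as Hive, SparkSQL or Impala, to get the results. The following steps use Hive as an example; since Kylin already uses Hive as its data source, Hive is also easier to use and configure as the Query Pushdown engine.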
 
@@ -20,7 +20,7 @@ since: v2.1
     - *kylin.query.pushdown.jdbc.url*: the Hive JDBC URL.
 
     - *kylin.query.pushdown.jdbc.driver*: the class name of the Hive JDBC driver
-        
+      
     - *kylin.query.pushdown.jdbc.username*: the user name for the database accessed through Hive JDBC
 
     - *kylin.query.pushdown.jdbc.password*: the password for the database accessed through Hive JDBC
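The four properties listed above become plain `key=value` lines in `kylin.properties`. The sketch below renders them. The host name, port, database and user name are hypothetical placeholders. `org.apache.hive.jdbc.HiveDriver` is the standard HiveServer2 JDBC driver class. The sketch covers only the properties shown in this hunk; any other settings the full guide requires are not included. Treating the URL and driver as required is an assumption made for the check.

```python
from typing import Dict, List

# Hypothetical values: replace host, port, database and credentials with your own.
PUSHDOWN_SETTINGS: Dict[str, str] = {
    "kylin.query.pushdown.jdbc.url": "jdbc:hive2://hive-server.example.com:10000/default",
    "kylin.query.pushdown.jdbc.driver": "org.apache.hive.jdbc.HiveDriver",
    "kylin.query.pushdown.jdbc.username": "hive",
    "kylin.query.pushdown.jdbc.password": "",
}

# Assumption: URL and driver must be non-empty for pushdown to work.
REQUIRED_KEYS = ["kylin.query.pushdown.jdbc.url", "kylin.query.pushdown.jdbc.driver"]


def missing_keys(settings: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not settings.get(key)]


def render_properties(settings: Dict[str, str]) -> str:
    """Render settings as key=value lines for appending to kylin.properties."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


if __name__ == "__main__":
    problems = missing_keys(PUSHDOWN_SETTINGS)
    if problems:
        raise SystemExit(f"missing pushdown settings: {problems}")
    print(render_properties(PUSHDOWN_SETTINGS), end="")
```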
diff --git a/website/_docs23/tutorial/tableau.cn.md b/website/_docs23/tutorial/tableau.cn.md
index 57868d1..8e33ac4 100644
--- a/website/_docs23/tutorial/tableau.cn.md
+++ b/website/_docs23/tutorial/tableau.cn.md
@@ -1,6 +1,6 @@
 ---
 layout: docs23-cn
-title:  Tableau Tutorial
+title:  Tableau
 categories: Tutorial
 permalink: /cn/docs23/tutorial/tableau.html
 version: v1.2
diff --git a/website/_docs23/tutorial/tableau_91.cn.md b/website/_docs23/tutorial/tableau_91.cn.md
index 9693b7a..9d20866 100644
--- a/website/_docs23/tutorial/tableau_91.cn.md
+++ b/website/_docs23/tutorial/tableau_91.cn.md
@@ -1,6 +1,6 @@
 ---
 layout: docs23-cn
-title:  Tableau 9 Tutorial
+title:  Tableau 9 
 categories: tutorial
 permalink: /cn/docs23/tutorial/tableau_91.html
 version: v1.2
diff --git a/website/_docs23/tutorial/web.cn.md b/website/_docs23/tutorial/web.cn.md
index 6e34423..ad311d3 100644
--- a/website/_docs23/tutorial/web.cn.md
+++ b/website/_docs23/tutorial/web.cn.md
@@ -1,6 +1,6 @@
----
+---
 layout: docs23-cn
-title:  Kylin Web Tutorial
+title:  Web Interface
 categories: Tutorial
 permalink: /cn/docs23/tutorial/web.html
 version: v1.2
@@ -64,7 +64,7 @@ Kylin's web interface provides users with a simple query tool to run SQL in order to
 
    ![]( /images/Kylin-Web-Tutorial/10 query-result.png)
 
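-* Saved Query (only available after LDAP security is enabled):
+* Saved Query:
 
    Saved queries are associated with your user account, so you can retrieve them from different browsers and even different machines.
    Click "Save" in the result area; a dialog pops up for entering a name and description to save the current query: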
diff --git a/website/_docs23/tutorial/web.md b/website/_docs23/tutorial/web.md
index d0d15e9..8257099 100644
--- a/website/_docs23/tutorial/web.md
+++ b/website/_docs23/tutorial/web.md
@@ -1,4 +1,4 @@
----
+---
 layout: docs23
 title:  Web Interface
 categories: tutorial
@@ -17,12 +17,12 @@ Login with password:KYLIN
 ![](/images/tutorial/1.5/Kylin-Web-Tutorial/1 login.png)
 
 ## 2. Sync Hive Table into Kylin
-Although Kylin will using SQL as query interface and leverage Hive metadata, kylin will not enable user to query all hive tables since it's a pre-build OLAP (MOLAP) system so far. To enable Table in Kylin, it will be easy to using "Sync" function to sync up tables from Hive.
+Although Kylin uses SQL as its query interface and leverages Hive metadata, Kylin does not let users query all hive tables, since it is so far a pre-built OLAP (MOLAP) system. To enable a table in Kylin, use the "Sync" function to sync the hive table metadata to Kylin.
 
 ![](/images/tutorial/1.5/Kylin-Web-Tutorial/2 tables.png)
 
 ## 3. Kylin OLAP Cube
-Kylin's OLAP Cubes are pre-calculation datasets from star schema tables, Here's the web interface for user to explore, manage all cubes. Go to `Model` menu, it will list all cubes available in system.
+Kylin's OLAP Cubes are pre-calculated datasets built from star/snowflake schema tables. Here's the web interface for users to explore and manage all cubes. Go to the `Model` menu; it lists all cubes available in the system.
 
 ![](/images/tutorial/1.5/Kylin-Web-Tutorial/3 cubes.png)
 
@@ -61,7 +61,7 @@ Go to "Insight" menu.
 
    ![](/images/tutorial/1.5/Kylin-Web-Tutorial/10 query-result.png)
 
-* Saved Query (only work after enable LDAP security):
+* Saved Query:
 
    Saved queries are associated with your user account, so you can retrieve them from different browsers and even different machines.
    Click "Save" in the Result area; a dialog pops up for entering a name and description to save the current query.
