Posted to commits@kylin.apache.org by xx...@apache.org on 2020/11/10 08:05:56 UTC

[kylin] 01/02: update system cube

This is an automated email from the ASF dual-hosted git repository.

xxyu pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git

commit bda1466486b3a1bd082326fd0e0710bf8dcd5940
Author: xuekaiqi <ka...@qq.com>
AuthorDate: Tue Nov 10 10:26:19 2020 +0800

    update system cube
---
 website/_docs/tutorial/setup_systemcube.cn.md | 152 +++++++++++++++-------
 website/_docs/tutorial/setup_systemcube.md    | 177 +++++++++++++++++---------
 2 files changed, 220 insertions(+), 109 deletions(-)

diff --git a/website/_docs/tutorial/setup_systemcube.cn.md b/website/_docs/tutorial/setup_systemcube.cn.md
index 0224cec..d05e942 100644
--- a/website/_docs/tutorial/setup_systemcube.cn.md
+++ b/website/_docs/tutorial/setup_systemcube.cn.md
@@ -7,16 +7,24 @@ permalink: /cn/docs/tutorial/setup_systemcube.html
 
 > 自 Apache Kylin v2.3.0 起有效
 
-## 什么是系统 Cube
+本节主要内容:
+
+- [什么是系统 Cube](#什么是系统 Cube)
+- [如何建立系统 Cube](#如何建立系统 Cube)
+- [自动创建系统 Cube](#自动创建系统 Cube)
+- [系统 Cube 的细节](#系统 Cube 的细节)
+
+## <span id="什么是系统 Cube">什么是系统 Cube</span>
 
 为了更好的支持自我监控,在系统 project 下创建一组系统 Cubes,叫做 "KYLIN_SYSTEM"。现在,这里有五个 Cubes。三个用于查询指标,"METRICS_QUERY","METRICS_QUERY_CUBE","METRICS_QUERY_RPC"。另外两个是 job 指标,"METRICS_JOB","METRICS_JOB_EXCEPTION"。
 
-## 如何建立系统 Cube
+## <span id="如何建立系统 Cube">如何建立系统 Cube</span>
 
-### 准备
-在 KYLIN_HOME 目录下创建一个配置文件 SCSinkTools.json。
+本节我们介绍手动启用系统 Cube 的方法。如果您希望通过 shell 脚本自动创建系统 Cube,请参考[自动创建系统 Cube](#自动创建系统 Cube)。
 
-例如:
+### 1. 准备
+
+在 KYLIN_HOME 目录下创建一个配置文件 SCSinkTools.json。例如:
 
 ```
 [
@@ -31,8 +39,8 @@ permalink: /cn/docs/tutorial/setup_systemcube.html
 ]
 ```
 
-### 1. 生成 Metadata
-在 KYLIN_HOME 文件夹下运行一下命令生成相关的 metadata:
+### 2. 生成 Metadata
+在 KYLIN_HOME 文件夹下运行以下命令生成相关的 metadata:
 
 ```
 ./bin/kylin.sh org.apache.kylin.tool.metrics.systemcube.SCCreator \
@@ -44,37 +52,32 @@ permalink: /cn/docs/tutorial/setup_systemcube.html
 
 ![metadata](/images/SystemCube/metadata.png)
 
-### 2. 建立数据源
-运行下列命令生成 hive 源表:
+### 3. 建立数据源
+运行下列命令生成 Hive 源表:
 
 ```
 hive -f <output_forder>/create_hive_tables_for_system_cubes.sql
 ```
 
-通过这个命令,相关的 hive 表将会被创建。
+通过这个命令,相关的 hive 表将会被创建。每一个系统 Cube 中的事实表对应了一张 Hive 源表,Hive 源表中记录了查询或任务相关的数据,这些数据将为系统 Cube 服务。
 
 ![hive_table](/images/SystemCube/hive_table.png)
 
-### 3. 为 System Cubes 上传 Metadata 
+### 4. 为系统 Cubes 上传 Metadata 
 然后我们需要通过下列命令上传 metadata 到 hbase:
 
 ```
 ./bin/metastore.sh restore <output_forder>
 ```
 
-### 4. 重载 Metadata
-最终,我们需要在 Kylin web UI 重载 metadata。
-
+### 5. 重载 Metadata
+最终,我们需要在 Kylin web UI 重载 metadata。然后,一组系统 Cubes 将会被创建在系统 project 下,称为 "KYLIN_SYSTEM"。
 
-然后,一组系统 Cubes 将会被创建在系统 project 下,称为 "KYLIN_SYSTEM"。
 
+### 6. 构建系统 Cube
+当系统 Cube 被创建,我们需要定期构建 Cube。方法如下:
 
-### 5. 系统 Cube build
-当系统 Cube 被创建,我们需要定期 build Cube。
-
-1. 创建一个 shell 脚本其通过调用 org.apache.kylin.tool.job.CubeBuildingCLI 来 build 系统 Cube
-  
-	例如:
+**步骤一**:创建一个 shell 脚本,通过调用 org.apache.kylin.tool.job.CubeBuildingCLI 来构建系统 Cube。例如:
 
 {% highlight Groff markup %}
 #!/bin/bash
@@ -96,9 +99,7 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
 
 {% endhighlight %}
 
-2. 然后定期运行这个 shell 脚本
-
-	例如,像接下来这样添加一个 cron job:
+**步骤二**:定期运行这个 shell 脚本。例如,像接下来这样添加一个 cron job:
 
 {% highlight Groff markup %}
 0 */2 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_QA 3600000 1200000
@@ -113,20 +114,23 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
 
 {% endhighlight %}
 
-## 自动创建系统cube
+## <span id="自动创建系统 Cube">自动创建系统 Cube</span>
 
-从kylin 2.6.0开始提供system-cube.sh脚本,用户可以通过执行此脚本来自动创建系统cube。
+从 Kylin 2.6.0 开始提供 system-cube.sh 脚本,用户可以通过执行此脚本来自动创建系统 Cube。
 
-- 创建系统cube:`sh system-cube.sh setup`
+- 创建系统 Cube:`sh bin/system-cube.sh setup`
 
-- 构建系统cube:`sh bin/system-cube.sh build`
+- 构建系统 Cube:`sh bin/system-cube.sh build`
 
-- 为系统cube添加定时任务:`bin/system.sh cron`
+- 为系统 Cube 添加定时任务:`sh bin/system-cube.sh cron`
 
-## 系统 Cube 的细节
+## <span id="系统 Cube 的细节">系统 Cube 的细节</span>
+
+Hive 中有 5 张表记录了 Kylin 系统的相关指标数据,每一个系统 Cube 的事实表对应了一张 Hive 表,共有 5 个系统 Cube。
 
 ### 普通 Dimension
-对于这些 Cube,admins 能够用四个时间粒度查询。从高级别到低级别,如下:
+
+对于这些系统 Cube,admins 能够用四个时间粒度查询,这些维度在 5 个系统 Cube 中均生效。从高级别到低级别,如下:
 
 <table>
   <tr>
@@ -147,6 +151,7 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
   </tr>
 </table>
 
+
 ### METRICS_QUERY
 这个 Cube 用于在最高级别收集查询 metrics。细节如下:
 
@@ -159,12 +164,16 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
     <td>the host of server for query engine</td>
   </tr>
   <tr>
+    <td>KUSER</td>
+    <td>the user who executes the query</td>
+  </tr>
+  <tr>
     <td>PROJECT</td>
-    <td></td>
+    <td>the project where the query executes</td>
   </tr>
   <tr>
     <td>REALIZATION</td>
-    <td>in Kylin,there are two OLAP realizations: Cube,or Hybrid of Cubes</td>
+    <td>the cube which the query hits. In Kylin,there are two OLAP realizations: Cube,or Hybrid of Cubes</td>
   </tr>
   <tr>
     <td>REALIZATION_TYPE</td>
@@ -189,7 +198,7 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
     <td></td>
   </tr>
   <tr>
-    <td>MIN,MAX,SUM of QUERY_TIME_COST</td>
+    <td>MIN,MAX,SUM,PERCENTILE_APPROX of QUERY_TIME_COST</td>
     <td>the time cost for the whole query</td>
   </tr>
   <tr>
@@ -210,6 +219,7 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
   </tr>
 </table>
 
+
 ### METRICS_QUERY_RPC
 这个 Cube 用于在最低级别收集查询 metrics。对于一个查询,相关的 aggregation 和 filter 能够下推到每一个 rpc 目标服务器。Rpc 目标服务器的健壮性是更好查询性能的基础。细节如下:
 
@@ -223,11 +233,11 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
   </tr>
   <tr>
     <td>PROJECT</td>
-    <td></td>
+    <td>the project where the query executes</td>
   </tr>
   <tr>
     <td>REALIZATION</td>
-    <td></td>
+    <td>the cube which the query hits.</td>
   </tr>
   <tr>
     <td>RPC_SERVER</td>
@@ -248,7 +258,7 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
     <td></td>
   </tr>
   <tr>
-    <td>MAX,SUM of CALL_TIME</td>
+    <td>MAX,SUM,PERCENTILE_APPROX of CALL_TIME</td>
     <td>the time cost of a rpc all</td>
   </tr>
   <tr>
@@ -273,6 +283,7 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
   </tr>
 </table>
 
+
 ### METRICS_QUERY_CUBE
 这个 Cube 用于在 Cube 级别收集查询 metrics。最重要的是 cuboids 相关的,其为 Cube planner 提供服务。细节如下:
 
@@ -285,6 +296,10 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
     <td></td>
   </tr>
   <tr>
+    <td>SEGMENT_NAME</td>
+    <td></td>
+  </tr>
+  <tr>
     <td>CUBOID_SOURCE</td>
     <td>source cuboid parsed based on query and Cube design</td>
   </tr>
@@ -311,6 +326,10 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
     <td></td>
   </tr>
   <tr>
+    <td>WEIGHT_PER_HIT</td>
+    <td></td>
+  </tr>
+  <tr>
     <td>MAX,SUM of STORAGE_CALL_COUNT</td>
     <td>the number of rpc calls for a query hit on this Cube</td>
   </tr>
@@ -327,23 +346,24 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
     <td>the sum of row count skipped for the related rpc calls</td>
   </tr>
   <tr>
-    <td>MAX,SUM of STORAGE_SIZE_SCAN</td>
+    <td>MAX,SUM of STORAGE_COUNT_SCAN</td>
     <td>the sum of row count scanned for the related rpc calls</td>
   </tr>
   <tr>
-    <td>MAX,SUM of STORAGE_SIZE_RETURN</td>
+    <td>MAX,SUM of STORAGE_COUNT_RETURN</td>
     <td>the sum of row count returned for the related rpc calls</td>
   </tr>
   <tr>
-    <td>MAX,SUM of STORAGE_SIZE_AGGREGATE</td>
+    <td>MAX,SUM of STORAGE_COUNT_AGGREGATE</td>
     <td>the sum of row count aggregated for the related rpc calls</td>
   </tr>
   <tr>
-    <td>MAX,SUM of STORAGE_SIZE_AGGREGATE_FILTER</td>
+    <td>MAX,SUM of STORAGE_COUNT_AGGREGATE_FILTER</td>
     <td>the sum of row count aggregated and filtered for the related rpc calls,= STORAGE_SIZE_SCAN - STORAGE_SIZE_RETURN</td>
   </tr>
 </table>
 
+
 ### METRICS_JOB
 在 Kylin 中,主要有三种类型的 job:
 - "BUILD",为了从 **HIVE** 中 building Cube segments。
@@ -357,16 +377,24 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
     <th colspan="2">Dimension</th>
   </tr>
   <tr>
+    <td>HOST</td>
+    <td>the host of server for job engine</td>
+  </tr>
+  <tr>
+    <td>KUSER</td>
+    <td>the user who submits the job</td>
+  </tr>
+  <tr>
     <td>PROJECT</td>
-    <td></td>
+    <td>the project the job belongs to</td>
   </tr>
   <tr>
     <td>CUBE_NAME</td>
-    <td></td>
+    <td>the cube which the job builds</td>
   </tr>
   <tr>
     <td>JOB_TYPE</td>
-    <td></td>
+    <td>build, merge or optimize</td>
   </tr>
   <tr>
     <td>CUBING_TYPE</td>
@@ -383,7 +411,7 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
     <td></td>
   </tr>
   <tr>
-    <td>MIN,MAX,SUM of DURATION</td>
+    <td>MIN,MAX,SUM,PERCENTILE_APPROX of DURATION</td>
     <td>the duration from a job start to finish</td>
   </tr>
   <tr>
@@ -402,8 +430,25 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
     <td>MIN,MAX,SUM of WAIT_RESOURCE_TIME</td>
     <td>a job may includes serveral MR(map reduce) jobs. Those MR jobs may wait because of lack of Hadoop resources.</td>
   </tr>
+  <tr>
+    <td>MAX,SUM of step_duration_distinct_columns</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>MAX,SUM of step_duration_dictionary</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>MAX,SUM of step_duration_inmem_cubing</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>MAX,SUM of step_duration_hfile_convert</td>
+    <td></td>
+  </tr>
 </table>
 
+
 ### METRICS_JOB_EXCEPTION
 这个 Cube 是用来收集 job exception 指标。细节如下:
 
@@ -412,20 +457,28 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
     <th colspan="2">Dimension</th>
   </tr>
   <tr>
+    <td>HOST</td>
+    <td>the host of server for job engine</td>
+  </tr>
+  <tr>
+    <td>KUSER</td>
+    <td>the user who submits the job</td>
+  </tr>
+  <tr>
     <td>PROJECT</td>
-    <td></td>
+    <td>the project the job belongs to</td>
   </tr>
   <tr>
     <td>CUBE_NAME</td>
-    <td></td>
+    <td>the cube which the job builds</td>
   </tr>
   <tr>
     <td>JOB_TYPE</td>
-    <td></td>
+    <td>build, merge or optimize</td>
   </tr>
   <tr>
     <td>CUBING_TYPE</td>
-    <td></td>
+    <td>in Kylin,there are two cubing algorithms,Layered & Fast (InMemory)</td>
   </tr>
   <tr>
     <td>EXCEPTION</td>
@@ -433,6 +486,7 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
   </tr>
 </table>
 
+
 <table>
   <tr>
     <th>Measure</th>
diff --git a/website/_docs/tutorial/setup_systemcube.md b/website/_docs/tutorial/setup_systemcube.md
index c58574e..196aca4 100644
--- a/website/_docs/tutorial/setup_systemcube.md
+++ b/website/_docs/tutorial/setup_systemcube.md
@@ -7,13 +7,23 @@ permalink: /docs/tutorial/setup_systemcube.html
 
 > Available since Apache Kylin v2.3.0
 
-## What is System Cube
+Main content of this section:
+
+- [What is System Cube](#What is System Cube)
+- [How to Set Up System Cube](#How to Set Up System Cube)
+- [Automatically create System Cube](#Automatically create System Cube)
+- [Details of System Cube](#Details of System Cube)
+
+## <span id="What is System Cube">What is System Cube</span>
 
 For better supporting self-monitoring, a set of system Cubes are created under the system project, called "KYLIN_SYSTEM". Currently, there are five Cubes. Three are for query metrics, "METRICS_QUERY", "METRICS_QUERY_CUBE", "METRICS_QUERY_RPC". And the other two are for job metrics, "METRICS_JOB", "METRICS_JOB_EXCEPTION".
 
-## How to Set Up System Cube
+## <span id="How to Set Up System Cube">How to Set Up System Cube</span>
+
+In this section, we will introduce how to manually enable the system Cube. If you want to create the system Cube automatically through a shell script, please refer to [Automatically create System Cube](#Automatically create System Cube).
+
+### 1. Prepare
 
-### Prepare
 Create a configuration file SCSinkTools.json in KYLIN_HOME directory.
 
 For example:
@@ -31,7 +41,7 @@ For example:
 ]
 ```
 
-### 1. Generate Metadata
+### 2. Generate Metadata
 Run the following command in KYLIN_HOME folder to generate related metadata:
 
 ```
@@ -44,7 +54,7 @@ By this command, the related metadata will be generated and its location is unde
 
 ![metadata](/images/SystemCube/metadata.png)
 
-### 2. Set Up Datasource
+### 3. Set Up Datasource
 Running the following command to create source hive tables:
 
 ```
@@ -55,26 +65,26 @@ By this command, the related hive table will be created.
 
 ![hive_table](/images/SystemCube/hive_table.png)
 
-### 3. Upload Metadata for System Cubes
+### 4. Upload Metadata for System Cubes
 Then we need to upload metadata to hbase by the following command:
 
 ```
 ./bin/metastore.sh restore <output_folder>
 ```
 
-### 4. Reload Metadata
+### 5. Reload Metadata
 Finally, we need to reload metadata in Kylin web UI.
 
 
 Then, a set of system Cubes will be created under the system project, called "KYLIN_SYSTEM".
 
 
-### 5. System Cube build
+### 6. Build System Cube
 When the system Cube is created, we need to build the Cube regularly.
 
-1. Create a shell script that builds the system Cube by calling org.apache.kylin.tool.job.CubeBuildingCLI
-  
-	For example:
+**Step 1**. Create a shell script that builds the system Cube by calling `org.apache.kylin.tool.job.CubeBuildingCLI`.
+
+For example:
 
 {% highlight Groff markup %}
 #!/bin/bash
@@ -96,9 +106,7 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
 
 {% endhighlight %}
 
-2. Then run this shell script regularly
-
-	For example, add a cron job as follows:
+**Step 2**. Run this shell script regularly. For example, add a cron job as follows:
 
 {% highlight Groff markup %}
 0 */2 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_QA 3600000 1200000
@@ -113,7 +121,7 @@ sh ${KYLIN_HOME}/bin/kylin.sh org.apache.kylin.tool.job.CubeBuildingCLI --cube $
 
 {% endhighlight %}
 
-## Automatically create System Cube
+## <span id="Automatically create System Cube">Automatically create System Cube</span>
 
 Kylin provides system-cube.sh from v2.6.0, users can automatically create system cube by executing this script.
 
@@ -123,7 +131,7 @@ Kylin provides system-cube.sh from v2.6.0, users can automatically create system
 
 - Add crontab job for System Cube:`bin/system.sh cron`
 
-## Details of System Cube
+## <span id="Details of System Cube">Details of System Cube</span>
 
 ### Common Dimension
 For all of these Cube, admins can query at four time granularities. From higher level to lower, it's as follows:
@@ -159,12 +167,16 @@ This Cube is for collecting query metrics at the highest level. The details are
     <td>the host of server for query engine</td>
   </tr>
   <tr>
+    <td>KUSER</td>
+    <td>the user who executes the query</td>
+  </tr>
+  <tr>
     <td>PROJECT</td>
-    <td></td>
+    <td>the project where the query executes</td>
   </tr>
   <tr>
     <td>REALIZATION</td>
-    <td>in Kylin, there are two OLAP realizations: Cube, or Hybrid of Cubes</td>
+    <td>the cube which the query hits. In Kylin, there are two OLAP realizations: Cube, or Hybrid of Cubes</td>
   </tr>
   <tr>
     <td>REALIZATION_TYPE</td>
@@ -172,11 +184,11 @@ This Cube is for collecting query metrics at the highest level. The details are
   </tr>
   <tr>
     <td>QUERY_TYPE</td>
-    <td>users can query on different data sources, CACHE, OLAP, LOOKUP_TABLE, HIVE</td>
+    <td>users can query on different data sources,CACHE,OLAP,LOOKUP_TABLE,HIVE</td>
   </tr>
   <tr>
     <td>EXCEPTION</td>
-    <td>when doing query, exceptions may happen. It's for classifying different exception types</td>
+    <td>when doing query,exceptions may happen. It's for classifying different exception types</td>
   </tr>
 </table>
 
@@ -189,19 +201,19 @@ This Cube is for collecting query metrics at the highest level. The details are
     <td></td>
   </tr>
   <tr>
-    <td>MIN, MAX, SUM of QUERY_TIME_COST</td>
+    <td>MIN, MAX, SUM, PERCENTILE_APPROX of QUERY_TIME_COST</td>
     <td>the time cost for the whole query</td>
   </tr>
   <tr>
-    <td>MAX, SUM of CALCITE_SIZE_RETURN</td>
+    <td>MAX,SUM of CALCITE_SIZE_RETURN</td>
     <td>the row count of the result Calcite returns</td>
   </tr>
   <tr>
-    <td>MAX, SUM of STORAGE_SIZE_RETURN</td>
+    <td>MAX,SUM of STORAGE_SIZE_RETURN</td>
     <td>the row count of the input to Calcite</td>
   </tr>
   <tr>
-    <td>MAX, SUM of CALCITE_SIZE_AGGREGATE_FILTER</td>
+    <td>MAX,SUM of CALCITE_SIZE_AGGREGATE_FILTER</td>
     <td>the row count of Calcite aggregates and filters</td>
   </tr>
   <tr>
@@ -210,6 +222,7 @@ This Cube is for collecting query metrics at the highest level. The details are
   </tr>
 </table>
 
+
 ### METRICS_QUERY_RPC
 This Cube is for collecting query metrics at the lowest level. For a query, the related aggregation and filter can be pushed down to each rpc target server. The robustness of rpc target servers is the foundation for better serving queries. The details are as follows:
 
@@ -223,11 +236,11 @@ This Cube is for collecting query metrics at the lowest level. For a query, the
   </tr>
   <tr>
     <td>PROJECT</td>
-    <td></td>
+    <td>the project where the query executes</td>
   </tr>
   <tr>
     <td>REALIZATION</td>
-    <td></td>
+    <td>the cube which the query hits.</td>
   </tr>
   <tr>
     <td>RPC_SERVER</td>
@@ -235,7 +248,7 @@ This Cube is for collecting query metrics at the lowest level. For a query, the
   </tr>
   <tr>
     <td>EXCEPTION</td>
-    <td>the exception of a rpc call. If no exception, "NULL" is used</td>
+    <td>the exception of a rpc call. If no exception,"NULL" is used</td>
   </tr>
 </table>
 
@@ -248,31 +261,32 @@ This Cube is for collecting query metrics at the lowest level. For a query, the
     <td></td>
   </tr>
   <tr>
-    <td>MAX, SUM of CALL_TIME</td>
+    <td>MAX, SUM, PERCENTILE_APPROX of CALL_TIME</td>
     <td>the time cost of a rpc all</td>
   </tr>
   <tr>
-    <td>MAX, SUM of COUNT_SKIP</td>
-    <td>based on fuzzy filters or else, a few rows will be skipped. This indicates the skipped row count</td>
+    <td>MAX, SUM of COUNT_SKIP</td>
+    <td>based on fuzzy filters or else, a few rows will be skipped. This indicates the skipped row count</td>
   </tr>
   <tr>
-    <td>MAX, SUM of SIZE_SCAN</td>
+    <td>MAX,SUM of SIZE_SCAN</td>
     <td>the row count actually scanned</td>
   </tr>
   <tr>
-    <td>MAX, SUM of SIZE_RETURN</td>
+    <td>MAX,SUM of SIZE_RETURN</td>
     <td>the row count actually returned</td>
   </tr>
   <tr>
-    <td>MAX, SUM of SIZE_AGGREGATE</td>
+    <td>MAX,SUM of SIZE_AGGREGATE</td>
     <td>the row count actually aggregated</td>
   </tr>
   <tr>
-    <td>MAX, SUM of SIZE_AGGREGATE_FILTER</td>
-    <td>the row count actually aggregated and filtered, = SIZE_SCAN - SIZE_RETURN</td>
+    <td>MAX,SUM of SIZE_AGGREGATE_FILTER</td>
+    <td>the row count actually aggregated and filtered,= SIZE_SCAN - SIZE_RETURN</td>
   </tr>
 </table>
 
+
 ### METRICS_QUERY_CUBE
 This Cube is for collecting query metrics at the Cube level. The most important are cuboids related, which will serve for Cube planner. The details are as follows:
 
@@ -285,12 +299,16 @@ This Cube is for collecting query metrics at the Cube level. The most important
     <td></td>
   </tr>
   <tr>
+    <td>SEGMENT_NAME</td>
+    <td></td>
+  </tr>
+  <tr>
     <td>CUBOID_SOURCE</td>
     <td>source cuboid parsed based on query and Cube design</td>
   </tr>
   <tr>
     <td>CUBOID_TARGET</td>
-    <td>target cuboid already pre-calculated and served for source cuboid</td>
+    <td>target cuboid already precalculated and served for source cuboid</td>
   </tr>
   <tr>
     <td>IF_MATCH</td>
@@ -311,39 +329,44 @@ This Cube is for collecting query metrics at the Cube level. The most important
     <td></td>
   </tr>
   <tr>
-    <td>MAX, SUM of STORAGE_CALL_COUNT</td>
+    <td>WEIGHT_PER_HIT</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>MAX,SUM of STORAGE_CALL_COUNT</td>
     <td>the number of rpc calls for a query hit on this Cube</td>
   </tr>
   <tr>
-    <td>MAX, SUM of STORAGE_CALL_TIME_SUM</td>
+    <td>MAX,SUM of STORAGE_CALL_TIME_SUM</td>
     <td>sum of time cost for the rpc calls of a query</td>
   </tr>
   <tr>
-    <td>MAX, SUM of STORAGE_CALL_TIME_MAX</td>
+    <td>MAX,SUM of STORAGE_CALL_TIME_MAX</td>
     <td>max of time cost among the rpc calls of a query</td>
   </tr>
   <tr>
-    <td>MAX, SUM of STORAGE_COUNT_SKIP</td>
+    <td>MAX,SUM of STORAGE_COUNT_SKIP</td>
     <td>the sum of row count skipped for the related rpc calls</td>
   </tr>
   <tr>
-    <td>MAX, SUM of STORAGE_SIZE_SCAN</td>
+    <td>MAX, SUM of STORAGE_COUNT_SCAN</td>
     <td>the sum of row count scanned for the related rpc calls</td>
   </tr>
   <tr>
-    <td>MAX, SUM of STORAGE_SIZE_RETURN</td>
+    <td>MAX, SUM of STORAGE_COUNT_RETURN</td>
     <td>the sum of row count returned for the related rpc calls</td>
   </tr>
   <tr>
-    <td>MAX, SUM of STORAGE_SIZE_AGGREGATE</td>
+    <td>MAX, SUM of STORAGE_COUNT_AGGREGATE</td>
     <td>the sum of row count aggregated for the related rpc calls</td>
   </tr>
   <tr>
-    <td>MAX, SUM of STORAGE_SIZE_AGGREGATE_FILTER</td>
-    <td>the sum of row count aggregated and filtered for the related rpc calls, = STORAGE_SIZE_SCAN - STORAGE_SIZE_RETURN</td>
+    <td>MAX, SUM of STORAGE_COUNT_AGGREGATE_FILTER</td>
+    <td>the sum of row count aggregated and filtered for the related rpc calls, = STORAGE_COUNT_SCAN - STORAGE_COUNT_RETURN</td>
   </tr>
 </table>
 
+
 ### METRICS_JOB
 In Kylin, there are mainly three types of job:
 - "BUILD", for building Cube segments from **HIVE**.
@@ -357,20 +380,28 @@ This Cube is for collecting job metrics. The details are as follows:
     <th colspan="2">Dimension</th>
   </tr>
   <tr>
+    <td>HOST</td>
+    <td>the host of server for job engine</td>
+  </tr>
+  <tr>
+    <td>KUSER</td>
+    <td>the user who submits the job</td>
+  </tr>
+  <tr>
     <td>PROJECT</td>
-    <td></td>
+    <td>the project the job belongs to</td>
   </tr>
   <tr>
     <td>CUBE_NAME</td>
-    <td></td>
+    <td>the cube which the job builds</td>
   </tr>
   <tr>
     <td>JOB_TYPE</td>
-    <td></td>
+    <td>build, merge or optimize</td>
   </tr>
   <tr>
     <td>CUBING_TYPE</td>
-    <td>in kylin, there are two cubing algorithms, Layered & Fast(InMemory)</td>
+    <td>in kylin,there are two cubing algorithms,Layered & Fast(InMemory)</td>
   </tr>
 </table>
 
@@ -383,27 +414,44 @@ This Cube is for collecting job metrics. The details are as follows:
     <td></td>
   </tr>
   <tr>
-    <td>MIN, MAX, SUM of DURATION</td>
+    <td>MIN, MAX, SUM, PERCENTILE_APPROX of DURATION</td>
     <td>the duration from a job start to finish</td>
   </tr>
   <tr>
-    <td>MIN, MAX, SUM of TABLE_SIZE</td>
+    <td>MIN,MAX,SUM of TABLE_SIZE</td>
     <td>the size of data source in bytes</td>
   </tr>
   <tr>
-    <td>MIN, MAX, SUM of CUBE_SIZE</td>
+    <td>MIN,MAX,SUM of CUBE_SIZE</td>
     <td>the size of created Cube segment in bytes</td>
   </tr>
   <tr>
-    <td>MIN, MAX, SUM of PER_BYTES_TIME_COST</td>
+    <td>MIN,MAX,SUM of PER_BYTES_TIME_COST</td>
     <td>= DURATION / TABLE_SIZE</td>
   </tr>
   <tr>
-    <td>MIN, MAX, SUM of WAIT_RESOURCE_TIME</td>
-    <td>a job may includes several MR(map reduce) jobs. Those MR jobs may wait because of lack of Hadoop resources.</td>
+    <td>MIN, MAX, SUM of WAIT_RESOURCE_TIME</td>
+    <td>a job may include several MR (map reduce) jobs. Those MR jobs may wait because of lack of Hadoop resources.</td>
+  </tr>
+  <tr>
+    <td>MAX,SUM of step_duration_distinct_columns</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>MAX,SUM of step_duration_dictionary</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>MAX,SUM of step_duration_inmem_cubing</td>
+    <td></td>
+  </tr>
+  <tr>
+    <td>MAX,SUM of step_duration_hfile_convert</td>
+    <td></td>
   </tr>
 </table>
 
+
 ### METRICS_JOB_EXCEPTION
 This Cube is for collecting job exception metrics. The details are as follows:
 
@@ -412,27 +460,36 @@ This Cube is for collecting job exception metrics. The details are as follows:
     <th colspan="2">Dimension</th>
   </tr>
   <tr>
+    <td>HOST</td>
+    <td>the host of server for job engine</td>
+  </tr>
+  <tr>
+    <td>KUSER</td>
+    <td>the user who submits the job</td>
+  </tr>
+  <tr>
     <td>PROJECT</td>
-    <td></td>
+    <td>the project the job belongs to</td>
   </tr>
   <tr>
     <td>CUBE_NAME</td>
-    <td></td>
+    <td>the cube which the job builds</td>
   </tr>
   <tr>
     <td>JOB_TYPE</td>
-    <td></td>
+    <td>build, merge or optimize</td>
   </tr>
   <tr>
     <td>CUBING_TYPE</td>
-    <td></td>
+    <td>in Kylin, there are two cubing algorithms, Layered & Fast (InMemory)</td>
   </tr>
   <tr>
     <td>EXCEPTION</td>
-    <td>when running a job, exceptions may happen. It's for classifying different exception types</td>
+    <td>when running a job,exceptions may happen. It's for classifying different exception types</td>
   </tr>
 </table>
 
+
 <table>
   <tr>
     <th>Measure</th>