Posted to commits@kylin.apache.org by bi...@apache.org on 2018/01/29 04:14:50 UTC

[2/6] kylin git commit: KYLIN-3202 update doc directory for Kylin 2.3.0

http://git-wip-us.apache.org/repos/asf/kylin/blob/40a53fe3/website/_docs23/tutorial/Qlik.cn.md
----------------------------------------------------------------------
diff --git a/website/_docs23/tutorial/Qlik.cn.md b/website/_docs23/tutorial/Qlik.cn.md
new file mode 100644
index 0000000..796d474
--- /dev/null
+++ b/website/_docs23/tutorial/Qlik.cn.md
@@ -0,0 +1,153 @@
+---
+layout: docs23-cn
+title:  与Qlik Sense集成
+categories: tutorial
+permalink: /cn/docs23/tutorial/Qlik.html
+since: v2.2
+---
+
+Qlik Sense 是新一代自助式数据可视化工具。它是一款完整的商业分析软件,便于开发人员和分析人员快速构建和部署强大的分析应用。近年来,该工具成为全球增长率最快的 BI 产品。它可以与 Hadoop Database(Hive 和 Impala)集成。现在也可与 Apache Kylin 集成。本文将分步指导您完成 Apache Kylin 与 Qlik Sense 的连接。 
+
+### 安装 Kylin ODBC 驱动程序
+
+有关安装信息,参考页面 [Kylin ODBC 驱动](http://kylin.apache.org/cn/docs23/tutorial/odbc.html).
+
+### 安装 Qlik Sense
+
+有关 Qlik Sense 的安装说明,请访问 [Qlik Sense Desktop download](https://www.qlik.com/us/try-or-buy/download-qlik-sense).
+
+### 与 Qlik Sense 连接
+
+配置完本地 DSN 并成功安装 Qlik Sense 后,可执行以下步骤来用 Qlik Sense 连接 Apache Kylin:
+
+- 打开 **Qlik Sense Desktop**.
+
+
+- 输入 Qlik 用户名和密码,接着系统将弹出以下对话框。单击**创建新应用程序**.
+
+![](/images/tutorial/2.1/Qlik/welcome_to_qlik_desktop.png)
+
+- 为新建的应用程序指定名称. 
+
+![](/images/tutorial/2.1/Qlik/create_new_application.png)
+
+- 应用程序视图中有两个选项,选择下方的**脚本编辑器**。
+
+![](/images/tutorial/2.1/Qlik/script_editor.png)
+
+- 此时会显示 **数据加载编辑器**的窗口。单击页面右上方的**创建新连接**并选择**ODBC**。
+
+![Create New Data Connection](/images/tutorial/2.1/Qlik/create_data_connection.png)
+
+- 选择你创建的**DSN**,忽略账户信息,点击**创建**。
+
+![ODBC Connection](/images/tutorial/2.1/Qlik/odbc_connection.png)
+
+### 配置Direct Query连接模式
+修改默认脚本中的 "TimeFormat"、"DateFormat" 和 "TimestampFormat" 为
+
+`SET TimeFormat='h:mm:ss';`
+`SET DateFormat='YYYY-MM-DD';`
+`SET TimestampFormat='YYYY-MM-DD h:mm:ss[.fff]';`
+
+考虑到 Kylin 环境中 Cube 的数据量级通常都很大,可达到 PB 级,我们推荐用户使用 Qlik Sense 的 Direct Query 连接模式,而不要将数据导入到 Qlik Sense 中。
+
+你可以在脚本中查询语句的前面输入`Direct Query`来启用 Direct Query 连接模式。
+
+下面的截图展现了一个连接了 *Learn_kylin* 项目中的 *kylin_sales_cube* 的Direct Query的脚本。
+
+![Script](/images/tutorial/2.1/Qlik/script_run_result.png) 
+
+Qlik sense会基于你定义的这个脚本在报表中相应的生成SQL查询。
+
+我们推荐用户将Kylin Cube上定义的维度和度量相应的定义到脚本中的维度和度量中。
+
+你也可以使用Native表达式来使用Apache Kylin内置函数,例如:
+
+`NATIVE('extract(month from PART_DT)') ` 
+
+完整的脚本提供在下方以供参考。
+
+请确保将脚本中`LIB CONNECT TO 'kylin';` 部分引用的DSN进行相应的修改。 
+
+```SQL
+SET ThousandSep=',';
+SET DecimalSep='.';
+SET MoneyThousandSep=',';
+SET MoneyDecimalSep='.';
+SET MoneyFormat='$#,##0.00;-$#,##0.00';
+SET TimeFormat='h:mm:ss';
+SET DateFormat='YYYY/MM/DD';
+SET TimestampFormat='YYYY/MM/DD h:mm:ss[.fff]';
+SET FirstWeekDay=6;
+SET BrokenWeeks=1;
+SET ReferenceDay=0;
+SET FirstMonthOfYear=1;
+SET CollationLocale='en-US';
+SET CreateSearchIndexOnReload=1;
+SET MonthNames='Jan;Feb;Mar;Apr;May;Jun;Jul;Aug;Sep;Oct;Nov;Dec';
+SET LongMonthNames='January;February;March;April;May;June;July;August;September;October;November;December';
+SET DayNames='Mon;Tue;Wed;Thu;Fri;Sat;Sun';
+SET LongDayNames='Monday;Tuesday;Wednesday;Thursday;Friday;Saturday;Sunday';
+
+LIB CONNECT TO 'kylin';
+
+
+DIRECT QUERY
+DIMENSION 
+  TRANS_ID,
+  YEAR_BEG_DT,
+  MONTH_BEG_DT,
+  WEEK_BEG_DT,
+  PART_DT,
+  LSTG_FORMAT_NAME,
+  OPS_USER_ID,
+  OPS_REGION,
+  NATIVE('extract(month from PART_DT)') AS PART_MONTH,
+   NATIVE('extract(year from PART_DT)') AS PART_YEAR,
+  META_CATEG_NAME,
+  CATEG_LVL2_NAME,
+  CATEG_LVL3_NAME,
+  ACCOUNT_BUYER_LEVEL,
+  NAME
+MEASURE
+	ITEM_COUNT,
+    PRICE,
+    SELLER_ID
+FROM KYLIN_SALES 
+join KYLIN_CATEGORY_GROUPINGS  
+on( SITE_ID=LSTG_SITE_ID 
+and KYLIN_SALES.LEAF_CATEG_ID=KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID)
+join KYLIN_CAL_DT
+on (KYLIN_CAL_DT.CAL_DT=KYLIN_SALES.PART_DT)
+join KYLIN_ACCOUNT 
+on (KYLIN_ACCOUNT.ACCOUNT_ID=KYLIN_SALES.BUYER_ID)
+JOIN KYLIN_COUNTRY
+on (KYLIN_COUNTRY.COUNTRY=KYLIN_ACCOUNT.ACCOUNT_COUNTRY)
+```
+
+点击窗口右上方的**加载数据**,Qlik sense会根据脚本来生成探测查询以检查脚本的语法。
+
+![Load Data](/images/tutorial/2.1/Qlik/load_data.png)
+
+### 创建报表
+
+点击左上角的**应用程序视图**。
+
+![Open App Overview](/images/tutorial/2.1/Qlik/go_to_app_overview.png)
+
+点击**创建新工作表**。
+
+![Create new sheet](/images/tutorial/2.1/Qlik/create_new_report.png)
+
+选择一个图表类型,将维度和度量根据需要添加到图表上。
+
+![Select the required charts, dimension and measure](/images/tutorial/2.1/Qlik/add_dimension.png)
+
+图表返回了结果,说明连接Apache Kylin成功。
+
+现在你可以使用Qlik sense分析Apache Kylin中的数据了。
+
+![View data in Qlik Sense](/images/tutorial/2.1/Qlik/report.png)
+
+请注意如果你希望你的报表可以击中Cube,你在Qlik sense中定义的度量需要和Cube上定义的一致。比如,为了击中Learn_kylin项目的 *Kylin_sales_cube* 我们在本例中使用`sum(price)`。

http://git-wip-us.apache.org/repos/asf/kylin/blob/40a53fe3/website/_docs23/tutorial/Qlik.md
----------------------------------------------------------------------
diff --git a/website/_docs23/tutorial/Qlik.md b/website/_docs23/tutorial/Qlik.md
new file mode 100644
index 0000000..6419490
--- /dev/null
+++ b/website/_docs23/tutorial/Qlik.md
@@ -0,0 +1,156 @@
+---
+layout: docs23
+title: Qlik Sense
+categories: tutorial
+permalink: /docs23/tutorial/Qlik.html
+---
+
+Qlik Sense delivers intuitive platform solutions for self-service data visualization, guided analytics applications, embedded analytics, and reporting. It is a relatively new player in the Business Intelligence (BI) tools world, with high growth since 2013. It has connectors for Hadoop databases (Hive and Impala), and now it can also be integrated with Apache Kylin. This article will guide you through connecting Apache Kylin with Qlik Sense.
+
+### Install Kylin ODBC Driver
+
+For the installation information, please refer to [Kylin ODBC Driver](http://kylin.apache.org/docs23/tutorial/odbc.html).
+
+### Install Qlik Sense
+
+For the installation of Qlik Sense, please visit [Qlik Sense Desktop download](https://www.qlik.com/us/try-or-buy/download-qlik-sense).
+
+### Connection with Qlik Sense
+
+After configuring your Local DSN and installing Qlik Sense successfully, you may go through the following steps to connect Apache Kylin with Qlik Sense.
+
+- Open **Qlik Sense Desktop**.
+
+
+
+- Input your Qlik account to log in, then the following dialog will pop up. Click **Create New Application**.
+
+![Create New Application](../../images/tutorial/2.1/Qlik/welcome_to_qlik_desktop.png)
+
+- Specify a name for the new app. 
+
+
+![Specify a unique name](../../images/tutorial/2.1/Qlik/create_new_application.png)
+
+- There are two choices in the Application View. Please select the bottom **Script Editor**.
+
+
+![Select Script Editor](../../images/tutorial/2.1/Qlik/script_editor.png)
+
+- The Data Load Editor window shows. Click **Create New Connection** and choose **ODBC**.
+
+
+![Create New Data Connection](../../images/tutorial/2.1/Qlik/create_data_connection.png)
+
+- Select the **DSN** you have created, ignore the account information, and then click **Create**. 
+
+
+![ODBC Connection](../../images/tutorial/2.1/Qlik/odbc_connection.png)
+
+### Configure Direct Query mode
+Change the default script settings for "TimeFormat", "DateFormat" and "TimestampFormat" to:
+
+`SET TimeFormat='h:mm:ss';`
+`SET DateFormat='YYYY-MM-DD';`
+`SET TimestampFormat='YYYY-MM-DD h:mm:ss[.fff]';`
+
+
+Given that Cube data in a typical Apache Kylin environment can reach petabyte scale, we recommend using the Direct Query mode of Qlik Sense rather than importing data into Qlik Sense.
+
+You can enable Direct Query mode by typing `Direct Query` in front of your query script in the Script editor.
+
+Below is a screenshot of such a Direct Query script against the *kylin_sales_cube* in the *Learn_kylin* project. 
+
+![Script](../../images/tutorial/2.1/Qlik/script_run_result.png)
+
+Once you have defined such a script, Qlik Sense generates SQL based on it for your report.
+
+It is recommended that you define dimensions and measures corresponding to the dimensions and measures defined in the Kylin Cube.  
+
+You may also be able to utilize Apache Kylin built-in functions by creating a Native expression, for example: 
+
+`NATIVE('extract(month from PART_DT)') ` 
+
+The whole script is provided below for your reference. 
+
+Make sure to update the DSN referenced in the `LIB CONNECT TO 'kylin';` line to the DSN you created. 
+
+```SQL
+SET ThousandSep=',';
+SET DecimalSep='.';
+SET MoneyThousandSep=',';
+SET MoneyDecimalSep='.';
+SET MoneyFormat='$#,##0.00;-$#,##0.00';
+SET TimeFormat='h:mm:ss';
+SET DateFormat='YYYY/MM/DD';
+SET TimestampFormat='YYYY/MM/DD h:mm:ss[.fff]';
+SET FirstWeekDay=6;
+SET BrokenWeeks=1;
+SET ReferenceDay=0;
+SET FirstMonthOfYear=1;
+SET CollationLocale='en-US';
+SET CreateSearchIndexOnReload=1;
+SET MonthNames='Jan;Feb;Mar;Apr;May;Jun;Jul;Aug;Sep;Oct;Nov;Dec';
+SET LongMonthNames='January;February;March;April;May;June;July;August;September;October;November;December';
+SET DayNames='Mon;Tue;Wed;Thu;Fri;Sat;Sun';
+SET LongDayNames='Monday;Tuesday;Wednesday;Thursday;Friday;Saturday;Sunday';
+
+LIB CONNECT TO 'kylin';
+
+
+DIRECT QUERY
+DIMENSION 
+  TRANS_ID,
+  YEAR_BEG_DT,
+  MONTH_BEG_DT,
+  WEEK_BEG_DT,
+  PART_DT,
+  LSTG_FORMAT_NAME,
+  OPS_USER_ID,
+  OPS_REGION,
+  NATIVE('extract(month from PART_DT)') AS PART_MONTH,
+   NATIVE('extract(year from PART_DT)') AS PART_YEAR,
+  META_CATEG_NAME,
+  CATEG_LVL2_NAME,
+  CATEG_LVL3_NAME,
+  ACCOUNT_BUYER_LEVEL,
+  NAME
+MEASURE
+	ITEM_COUNT,
+    PRICE,
+    SELLER_ID
+FROM KYLIN_SALES 
+join KYLIN_CATEGORY_GROUPINGS  
+on( SITE_ID=LSTG_SITE_ID 
+and KYLIN_SALES.LEAF_CATEG_ID=KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID)
+join KYLIN_CAL_DT
+on (KYLIN_CAL_DT.CAL_DT=KYLIN_SALES.PART_DT)
+join KYLIN_ACCOUNT 
+on (KYLIN_ACCOUNT.ACCOUNT_ID=KYLIN_SALES.BUYER_ID)
+JOIN KYLIN_COUNTRY
+on (KYLIN_COUNTRY.COUNTRY=KYLIN_ACCOUNT.ACCOUNT_COUNTRY)
+```
+
+Click **Load Data** on the upper right of the window; Qlik Sense will send out an inspection query based on the script to test the connection.
+
+![Load Data](../../images/tutorial/2.1/Qlik/load_data.png)
+
+### Create a new report
+
+On the top left menu open **App Overview**.
+
+![Open App Overview](../../images/tutorial/2.1/Qlik/go_to_app_overview.png)
+
+ Click **Create new sheet** on this page.
+
+![Create new sheet](../../images/tutorial/2.1/Qlik/create_new_report.png)
+
+Select the chart type you need, then add dimensions and measures based on your requirements. 
+
+![Select the required charts, dimension and measure](../../images/tutorial/2.1/Qlik/add_dimension.png)
+
+You will get your worksheet, and the connection is complete. Your Apache Kylin data now shows in Qlik Sense.
+
+![View data in Qlik Sense](../../images/tutorial/2.1/Qlik/report.png)
+
+Please note that if you want the report to hit the Cube, the measures you create in Qlik Sense need to match exactly those defined in the Cube. For the case of *Kylin_sales_cube* in the Learn_kylin project, we use `sum(price)` as an example. 
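+
+As a rough illustration (a hypothetical query; the actual SQL Qlik Sense emits depends on how you design your sheet), a report that groups by *PART_DT* and uses the `sum(PRICE)` measure would generate something similar to the following, which matches the Cube's pre-aggregated SUM measure and can therefore be answered by the Cube:
+
+```SQL
+SELECT PART_DT,
+       SUM(PRICE) AS TOTAL_PRICE
+FROM KYLIN_SALES
+GROUP BY PART_DT
+```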

http://git-wip-us.apache.org/repos/asf/kylin/blob/40a53fe3/website/_docs23/tutorial/acl.cn.md
----------------------------------------------------------------------
diff --git a/website/_docs23/tutorial/acl.cn.md b/website/_docs23/tutorial/acl.cn.md
new file mode 100644
index 0000000..2042478
--- /dev/null
+++ b/website/_docs23/tutorial/acl.cn.md
@@ -0,0 +1,35 @@
+---
+layout: docs23-cn
+title:  Kylin Cube 权限授予教程
+categories: 教程
+permalink: /cn/docs23/tutorial/acl.html
+version: v1.2
+since: v0.7.1
+---
+
+> 从v2.2.0版本开始,Cube ACL功能已经移除, 请使用[Project level ACL](/docs23/tutorial/project_level_acl.html)进行权限管理。
+
+在`Cubes`页面,双击cube行查看详细信息。在这里我们关注`Access`标签。
+点击`+Grant`按钮进行授权。
+
+![]( /images/Kylin-Cube-Permission-Grant-Tutorial/14 +grant.png)
+
+一个cube有四种不同的权限。将你的鼠标移动到`?`图标查看详细信息。
+
+![]( /images/Kylin-Cube-Permission-Grant-Tutorial/15 grantInfo.png)
+
+授权对象也有两种:`User`和`Role`。`Role`是指一组拥有同样权限的用户。
+
+### 1. 授予用户权限
+* 选择`User`类型,输入你想要授权的用户的用户名并选择相应的权限。
+
+     ![]( /images/Kylin-Cube-Permission-Grant-Tutorial/16 grant-user.png)
+
+* 然后点击`Grant`按钮提交请求。在这一操作成功后,你会在表中看到一个新的表项。你可以选择不同的访问权限来修改用户权限。点击`Revoke`按钮可以删除一个拥有权限的用户。
+
+     ![]( /images/Kylin-Cube-Permission-Grant-Tutorial/16 user-update.png)
+
+### 2. 授予角色权限
+* 选择`Role`类型,通过点击下拉按钮选择你想要授权的一组用户并选择一个权限。
+
+* 然后点击`Grant`按钮提交请求。在这一操作成功后,你会在表中看到一个新的表项。你可以选择不同的访问权限来修改组权限。点击`Revoke`按钮可以删除一个拥有权限的组。

http://git-wip-us.apache.org/repos/asf/kylin/blob/40a53fe3/website/_docs23/tutorial/acl.md
----------------------------------------------------------------------
diff --git a/website/_docs23/tutorial/acl.md b/website/_docs23/tutorial/acl.md
new file mode 100644
index 0000000..0f9a864
--- /dev/null
+++ b/website/_docs23/tutorial/acl.md
@@ -0,0 +1,37 @@
+---
+layout: docs23
+title: Cube Permission (v2.1.x)
+categories: tutorial
+permalink: /docs23/tutorial/acl.html
+since: v0.7.1
+---
+
+> Note: Cube ACL has been removed since v2.2.0; please use [Project level ACL](/docs23/tutorial/project_level_acl.html) to manage ACLs.
+
+In `Cubes` page, double click the cube row to see the detail information. Here we focus on the `Access` tab.
+Click the `+Grant` button to grant permission. 
+
+![](/images/Kylin-Cube-Permission-Grant-Tutorial/14 +grant.png)
+
+There are four different kinds of permissions for a cube. Move your mouse over the `?` icon to see detail information. 
+
+![](/images/Kylin-Cube-Permission-Grant-Tutorial/15 grantInfo.png)
+
+There are also two types of grantee to which a permission can be granted: `User` and `Role`. `Role` means a group of users who have the same role.
+
+### 1. Grant User Permission
+* Select `User` type, enter the username of the user you want to grant and select the related permission. 
+
+     ![](/images/Kylin-Cube-Permission-Grant-Tutorial/16 grant-user.png)
+
+* Then click the `Grant` button to send the request. After this operation succeeds, you will see a new entry in the table. You can select a different access permission to change a user's permission. To delete a user with permission, just click the `Revoke` button.
+
+     ![](/images/Kylin-Cube-Permission-Grant-Tutorial/16 user-update.png)
+
+### 2. Grant Role Permission
+* Select the `Role` type, choose the group of users that you want to grant by clicking the drop-down button, and select a permission.
+
+* Then click the `Grant` button to send the request. After this operation succeeds, you will see a new entry in the table. You can select a different access permission to change the permission of a group. To delete a group with permission, just click the `Revoke` button.

http://git-wip-us.apache.org/repos/asf/kylin/blob/40a53fe3/website/_docs23/tutorial/create_cube.cn.md
----------------------------------------------------------------------
diff --git a/website/_docs23/tutorial/create_cube.cn.md b/website/_docs23/tutorial/create_cube.cn.md
new file mode 100644
index 0000000..a0a76e9
--- /dev/null
+++ b/website/_docs23/tutorial/create_cube.cn.md
@@ -0,0 +1,129 @@
+---
+layout: docs23-cn
+title:  Kylin Cube 创建教程
+categories: 教程
+permalink: /cn/docs23/tutorial/create_cube.html
+version: v1.2
+since: v0.7.1
+---
+  
+  
+### I. 新建一个项目
+1. 由顶部菜单栏进入`Query`页面,然后点击`Manage Projects`。
+
+   ![](/images/Kylin-Cube-Creation-Tutorial/1 manage-prject.png)
+
+2. 点击`+ Project`按钮添加一个新的项目。
+
+   ![](/images/Kylin-Cube-Creation-Tutorial/2 %2Bproject.png)
+
+3. 填写下列表单并点击`submit`按钮提交请求。
+
+   ![](/images/Kylin-Cube-Creation-Tutorial/3 new-project.png)
+
+4. 成功后,底部会显示通知。
+
+   ![](/images/Kylin-Cube-Creation-Tutorial/3.1 pj-created.png)
+
+### II. 同步一张表
+1. 在顶部菜单栏点击`Tables`,然后点击`+ Sync`按钮加载hive表元数据。
+
+   ![](/images/Kylin-Cube-Creation-Tutorial/4 %2Btable.png)
+
+2. 输入表名并点击`Sync`按钮提交请求。
+
+   ![](/images/Kylin-Cube-Creation-Tutorial/5 hive-table.png)
+
+### III. 新建一个cube
+首先,在顶部菜单栏点击`Cubes`。然后点击`+Cube`按钮进入cube designer页面。
+
+![](/images/Kylin-Cube-Creation-Tutorial/6 %2Bcube.png)
+
+**步骤1. Cube信息**
+
+填写cube基本信息。点击`Next`进入下一步。
+
+你可以使用字母、数字和“_”来为你的cube命名(注意名字中不能使用空格)。
+
+![](/images/Kylin-Cube-Creation-Tutorial/7 cube-info.png)
+
+**步骤2. 维度**
+
+1. 建立事实表。
+
+    ![](/images/Kylin-Cube-Creation-Tutorial/8 dim-factable.png)
+
+2. 点击`+Dimension`按钮添加一个新的维度。
+
+    ![](/images/Kylin-Cube-Creation-Tutorial/8 dim-%2Bdim.png)
+
+3. 可以选择不同类型的维度加入一个cube。我们在这里列出其中一部分供你参考。
+
+    * 从事实表获取维度。
+          ![](/images/Kylin-Cube-Creation-Tutorial/8 dim-typeA.png)
+
+    * 从查找表获取维度。
+        ![]( /images/Kylin-Cube-Creation-Tutorial/8 dim-typeB-1.png)
+
+        ![]( /images/Kylin-Cube-Creation-Tutorial/8 dim-typeB-2.png)
+   
+    * 从有分级结构的查找表获取维度。
+          ![](/images/Kylin-Cube-Creation-Tutorial/8 dim-typeC.png)
+
+    * 从有衍生维度(derived dimensions)的查找表获取维度。
+          ![](/images/Kylin-Cube-Creation-Tutorial/8 dim-typeD.png)
+
+4. 用户可以在保存维度后进行编辑。
+   ![](/images/Kylin-Cube-Creation-Tutorial/8 dim-edit.png)
+
+**步骤3. 度量**
+
+1. 点击`+Measure`按钮添加一个新的度量。
+   ![](/images/Kylin-Cube-Creation-Tutorial/9 meas-%2Bmeas.png)
+
+2. 根据它的表达式共有5种不同类型的度量:`SUM`、`MAX`、`MIN`、`COUNT`和`COUNT_DISTINCT`。请谨慎选择返回类型,它与`COUNT(DISTINCT)`的误差率相关。
+   * SUM
+
+     ![](/images/Kylin-Cube-Creation-Tutorial/9 meas-sum.png)
+
+   * MIN
+
+     ![](/images/Kylin-Cube-Creation-Tutorial/9 meas-min.png)
+
+   * MAX
+
+     ![](/images/Kylin-Cube-Creation-Tutorial/9 meas-max.png)
+
+   * COUNT
+
+     ![](/images/Kylin-Cube-Creation-Tutorial/9 meas-count.png)
+
+   * DISTINCT_COUNT
+
+     ![](/images/Kylin-Cube-Creation-Tutorial/9 meas-distinct.png)
+
+**步骤4. 过滤器**
+
+这一步骤是可选的。你可以使用`SQL`格式添加一些条件过滤器。
+
+![](/images/Kylin-Cube-Creation-Tutorial/10 filter.png)
+
+**步骤5. 更新设置**
+
+这一步骤是为增量构建cube而设计的。
+
+![](/images/Kylin-Cube-Creation-Tutorial/11 refresh-setting1.png)
+
+选择分区类型、分区列和开始日期。
+
+![](/images/Kylin-Cube-Creation-Tutorial/11 refresh-setting2.png)
+
+**步骤6. 高级设置**
+
+![](/images/Kylin-Cube-Creation-Tutorial/12 advanced.png)
+
+**步骤7. 概览 & 保存**
+
+你可以概览你的cube并返回之前的步骤进行修改。点击`Save`按钮完成cube创建。
+
+![](/images/Kylin-Cube-Creation-Tutorial/13 overview.png)

http://git-wip-us.apache.org/repos/asf/kylin/blob/40a53fe3/website/_docs23/tutorial/create_cube.md
----------------------------------------------------------------------
diff --git a/website/_docs23/tutorial/create_cube.md b/website/_docs23/tutorial/create_cube.md
new file mode 100644
index 0000000..31dbf99
--- /dev/null
+++ b/website/_docs23/tutorial/create_cube.md
@@ -0,0 +1,198 @@
+---
+layout: docs23
+title:  Cube Wizard
+categories: tutorial
+permalink: /docs23/tutorial/create_cube.html
+---
+
+This tutorial will guide you through creating a cube. It requires that you have at least one sample table in Hive. If you don't have one, you can follow this to create some data.
+  
+### I. Create a Project
+1. Go to `Query` page in top menu bar, then click `Manage Projects`.
+
+   ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/1 manage-prject.png)
+
+2. Click the `+ Project` button to add a new project.
+
+   ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/2 +project.png)
+
+3. Enter a project name, e.g, "Tutorial", with a description (optional), then click `submit` button to send the request.
+
+   ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/3 new-project.png)
+
+4. After success, the project will show in the table.
+
+   ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/3.1 pj-created.png)
+
+### II. Sync up Hive Table
+1. Click `Model` in top bar and then click `Data Source` tab in the left part, it lists all the tables loaded into Kylin; click `Load Hive Table` button.
+
+   ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/4 +table.png)
+
+2. Enter the hive table names, separated by commas, and then click `Sync` to send the request.
+
+   ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/5 hive-table.png)
+
+3. [Optional] If you want to browse the hive database to pick tables, click the `Load Hive Table From Tree` button.
+
+   ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/4 +table-tree.png)
+
+4. [Optional] Expand the database node, click to select the table to load, and then click `Sync`.
+
+   ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/5 hive-table-tree.png)
+
+5. A success message will pop up. In the left `Tables` section, the newly loaded table is added. Clicking the table name will expand the columns.
+
+   ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/5 hive-table-info.png)
+
+6. In the background, Kylin will run a MapReduce job to calculate the approximate cardinality for the newly synced table. After the job finishes, refresh the web page and then click the table name; the cardinality will be shown in the table info.
+
+   ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/5 hive-table-cardinality.png)
+
+
+### III. Create Data Model
+Before creating a cube, you need to define a data model. The data model defines the star schema. One data model can be reused in multiple cubes.
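+
+For instance (a hypothetical model using the bundled sample tables; your own fact and lookup tables will differ), a star-schema data model conceptually corresponds to a join like the one below, with KYLIN_SALES as the fact table and the other tables as lookups:
+
+{% highlight Groff markup %}
+SELECT KYLIN_SALES.PART_DT, KYLIN_SALES.PRICE,
+       KYLIN_CAL_DT.WEEK_BEG_DT,
+       KYLIN_CATEGORY_GROUPINGS.META_CATEG_NAME
+FROM KYLIN_SALES
+  INNER JOIN KYLIN_CAL_DT
+    ON KYLIN_CAL_DT.CAL_DT = KYLIN_SALES.PART_DT
+  INNER JOIN KYLIN_CATEGORY_GROUPINGS
+    ON KYLIN_SALES.LEAF_CATEG_ID = KYLIN_CATEGORY_GROUPINGS.LEAF_CATEG_ID
+   AND KYLIN_SALES.LSTG_SITE_ID = KYLIN_CATEGORY_GROUPINGS.SITE_ID
+{% endhighlight %}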
+
+1. Click `Model` in top bar, and then click `Models` tab. Click `+New` button, in the drop-down list select `New Model`.
+
+    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/6 +model.png)
+
+2. Enter a name for the model, with an optional description.
+
+    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/6 model-name.png)
+
+3. In the `Fact Table` box, select the fact table of this data model.
+
+    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/6 model-fact-table.png)
+
+4. [Optional] Click `Add Lookup Table` button to add a lookup table. Select the table name and join type (inner or left).
+
+    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/6 model-lookup-table.png)
+
+5. [Optional] Click the `New Join Condition` button, select the FK column of the fact table on the left, and select the PK column of the lookup table on the right side. Repeat this if there is more than one join column.
+
+    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/6 model-join-condition.png)
+
+6. Click "OK", repeat step 4 and 5 to add more lookup tables if any. After finished, click "Next".
+
+7. The "Dimensions" page allows to select the columns that will be used as dimension in the child cubes. Click the `Columns` cell of a table, in the drop-down list select the column to the list. 
+
+    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/6 model-dimensions.png)
+
+8. Click "Next" go to the "Measures" page, select the columns that will be used in measure/metrics. The measure column can only from fact table. 
+
+    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/6 model-measures.png)
+
+9. Click "Next" to the "Settings" page. If the data in fact table increases by day, select the corresponding date column in the `Partition Date Column`, and select the date format, otherwise leave it as blank.
+
+10. [Optional] Select `Cube Size`, which is an indicator on the scale of the cube, by default it is `MEDIUM`.
+
+11. [Optional] If some records want to excluded from the cube, like dirty data, you can input the condition in `Filter`.
+
+
+    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/6 model-partition-column.png)
+
+12. Click `Save` and then select `Yes` to save the data model. After created, the data model will be shown in the left `Models` list.
+
+    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/6 model-created.png)
+
+### IV. Create Cube
+After the data model is created, you can start to create the cube. 
+
+Click `Model` in top bar, and then click `Models` tab. Click `+New` button, in the drop-down list select `New Cube`.
+    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/7 new-cube.png)
+
+
+**Step 1. Cube Info**
+
+Select the data model, enter the cube name; Click `Next` to enter the next step.
+
+You can use letters, numbers and '_' to name your cube (blank spaces in the name are not allowed). `Notification List` is a list of email addresses which will be notified on cube job success/failure.
+    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/7 cube-info.png)
+    
+
+**Step 2. Dimensions**
+
+1. Click `Add Dimension`; it pops up two options: "Normal" and "Derived". "Normal" adds a normal independent dimension column, while "Derived" adds a derived dimension column. Read more in [How to optimize cubes](/docs15/howto/howto_optimize_cubes.html).
+
+2. Click "Normal" and then select a dimension column, give it a meaningful name.
+
+    ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/7 cube-dimension-normal.png)
+    
+3. [Optional] Click "Derived" and then pickup 1 more multiple columns on lookup table, give them a meaningful name.
+
+   ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/7 cube-dimension-derived.png)
+
+4. Repeat 2 and 3 to add all dimension columns; you can do this in batch for "Normal" dimensions with the `Auto Generator` button. 
+
+   ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/7 cube-dimension-batch.png)
+
+5. Click "Next" after select all dimensions.
+
+**Step 3. Measures**
+
+1. Click the `+Measure` to add a new measure.
+   ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/8 meas-+meas.png)
+
+2. There are 6 types of measures according to their expression: `SUM`, `MAX`, `MIN`, `COUNT`, `COUNT_DISTINCT` and `TOP_N`. Properly select the return type for `COUNT_DISTINCT` and `TOP_N`, as it will impact the cube size.
+   * SUM
+
+     ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/8 measure-sum.png)
+
+   * MIN
+
+     ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/8 measure-min.png)
+
+   * MAX
+
+     ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/8 measure-max.png)
+
+   * COUNT
+
+     ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/8 measure-count.png)
+
+   * DISTINCT_COUNT
+   This measure has two implementations: 
+   a) approximate implementation with HyperLogLog: select an acceptable error rate; a lower error rate will take more storage.
+   b) precise implementation with bitmap (see limitations in https://issues.apache.org/jira/browse/KYLIN-1186). 
+
+     ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/8 measure-distinct.png)
+
+   Please note: distinct count is a very heavy data type; it is slower to build and query compared to other measures.
+
+   * TOP_N
+   The approximate TopN measure pre-calculates the top records for each dimension combination; it provides much better query performance than having no pre-calculation. You need to specify two parameters here: the first is the column that will be used as the metric for the top records (aggregated with SUM and then sorted in descending order); the second is the literal ID column that represents the record, like seller_id. A sample query that this measure accelerates is sketched after this list.
+
+   Properly select the return type depending on how many top records you need to inspect: top 10, top 100 or top 1000. 
+
+     ![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/8 measure-topn.png)
+
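+As a rough illustration (a hypothetical query against the bundled sample tables; your own column names will differ), a TOP_N measure defined on SUM(PRICE) with SELLER_ID as the literal ID would accelerate a query such as:
+
+{% highlight Groff markup %}
+SELECT SELLER_ID, SUM(PRICE) AS TOTAL_PRICE
+FROM KYLIN_SALES
+GROUP BY SELLER_ID
+ORDER BY SUM(PRICE) DESC
+LIMIT 100
+{% endhighlight %}
+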
+
+**Step 4. Refresh Setting**
+
+This step is designed for incremental cube build. 
+
+`Auto Merge Time Ranges (days)`: automatically merge small segments into medium and large segments. If you don't want auto merge, remove the default two ranges. With the default thresholds of 7 and 28 days, for example, daily segments are first merged into weekly segments, which are later merged into 28-day segments.
+
+`Retention Range (days)`: only keep segments whose data falls within the given number of past days; older segments will be automatically dropped from the head. 0 means this feature is disabled.
+
+`Partition Start Date`: the start date of this cube.
+
+![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/9 refresh-setting1.png)
+
+**Step 5. Advanced Setting**
+
+`Aggregation Groups`: by default Kylin puts all dimensions into one aggregation group; you can create multiple aggregation groups if you know your query patterns well. For the concepts of "Mandatory Dimensions", "Hierarchy Dimensions" and "Joint Dimensions", read this blog: [New Aggregation Group](/blog/2016/02/18/new-aggregation-group/)
+
+`Rowkeys`: the rowkeys are composed of the dimensions' encoded values. "Dictionary" is the default encoding method; if a dimension is not a good fit for dictionary encoding (e.g., cardinality > 10 million), select "false" and then enter a fixed length for that dimension, usually the maximum length of that column; if a value is longer than that size it will be truncated. Please note that without dictionary encoding, the cube size might be much bigger.
+
+You can drag & drop a dimension column to adjust its position in rowkey; Put the mandantory dimension at the begining, then followed the dimensions that heavily involved in filters (where condition). Put high cardinality dimensions ahead of low cardinality dimensions.
+
+
+**Step 6. Overview & Save**
+
+You can overview your cube and go back to previous step to modify it. Click the `Save` button to complete the cube creation.
+
+![]( /images/tutorial/1.5/Kylin-Cube-Creation-Tutorial/10 overview.png)
+
+Cheers! Now the cube is created, and you can go ahead to build and play with it.

http://git-wip-us.apache.org/repos/asf/kylin/blob/40a53fe3/website/_docs23/tutorial/cube_build_job.cn.md
----------------------------------------------------------------------
diff --git a/website/_docs23/tutorial/cube_build_job.cn.md b/website/_docs23/tutorial/cube_build_job.cn.md
new file mode 100644
index 0000000..e00e624
--- /dev/null
+++ b/website/_docs23/tutorial/cube_build_job.cn.md
@@ -0,0 +1,66 @@
+---
+layout: docs23-cn
+title:  Kylin Cube 建立和Job监控教程
+categories: 教程
+permalink: /cn/docs23/tutorial/cube_build_job.html
+version: v1.2
+since: v0.7.1
+---
+
+### Cube建立
+首先,确认你拥有你想要建立的cube的权限。
+
+1. 在`Cubes`页面中,点击cube栏右侧的`Action`下拉按钮并选择`Build`操作。
+
+   ![]( /images/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/1 action-build.png)
+
+2. 选择后会出现一个弹出窗口。
+
+   ![]( /images/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/2 pop-up.png)
+
+3. 点击`END DATE`输入框选择增量构建这个cube的结束日期。
+
+   ![]( /images/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/3 end-date.png)
+
+4. 点击`Submit`提交请求。
+
+   ![]( /images/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/4 submit.png)
+
+   ![]( /images/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/4.1 success.png)
+
+   提交请求成功后,你将会看到`Jobs`页面新建了job。
+
+   ![]( /images/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/5 jobs-page.png)
+
+5. 如要放弃这个job,点击`Discard`按钮。
+
+   ![]( /images/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/6 discard.png)
+
+### Job监控
+在`Jobs`页面,点击job详情按钮查看显示于右侧的详细信息。
+
+![]( /images/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/7 job-steps.png)
+
+job详细信息为跟踪一个job提供了它的每一步记录。你可以将光标停放在一个步骤状态图标上查看基本状态和信息。
+
+![]( /images/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/8 hover-step.png)
+
+点击每个步骤显示的图标按钮查看详情:`Parameters`、`Log`、`MRJob`、`EagleMonitoring`。
+
+* Parameters
+
+   ![]( /images/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/9 parameters.png)
+
+   ![]( /images/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/9 parameters-d.png)
+
+* Log
+        
+   ![]( /images/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/9 log.png)
+
+   ![]( /images/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/9 log-d.png)
+
+* MRJob(MapReduce Job)
+
+   ![]( /images/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/9 mrjob.png)
+
+   ![]( /images/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/9 mrjob-d.png)

http://git-wip-us.apache.org/repos/asf/kylin/blob/40a53fe3/website/_docs23/tutorial/cube_build_job.md
----------------------------------------------------------------------
diff --git a/website/_docs23/tutorial/cube_build_job.md b/website/_docs23/tutorial/cube_build_job.md
new file mode 100644
index 0000000..4561a60
--- /dev/null
+++ b/website/_docs23/tutorial/cube_build_job.md
@@ -0,0 +1,67 @@
+---
+layout: docs23
+title:  Cube Build and Job Monitoring
+categories: tutorial
+permalink: /docs23/tutorial/cube_build_job.html
+---
+
+### Cube Build
+First of all, make sure that you have permission on the cube you want to build.
+
+1. In the `Models` page, click the `Action` drop-down button on the right of a cube row and select the `Build` operation.
+
+   ![](/images/tutorial/1.5/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/1 action-build.png)
+
+2. A window pops up after the selection; click the `END DATE` input box to select the end date of this incremental cube build.
+
+   ![](/images/tutorial/1.5/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/3 end-date.png)
+
+4. Click `Submit` to send the build request. After success, you will see the new job in the `Monitor` page.
+
+   ![](/images/tutorial/1.5/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/4 jobs-page.png)
+
+5. The new job is in "pending" status; after a while, it will be started to run and you will see the progress by refresh the web page or click the refresh button.
+
+   ![](/images/tutorial/1.5/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/5 job-progress.png)
+
+
+6. Wait for the job to finish. In the meantime, if you want to discard it, click the `Actions` -> `Discard` button.
+
+   ![](/images/tutorial/1.5/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/6 discard.png)
+
+7. After the job is 100% finished, the cube's status becomes "Ready", which means it is ready to serve SQL queries (a sample query is sketched at the end of this section). In the `Model` tab, find the cube and click the cube name to expand the section; the "HBase" tab lists the cube segments. Each segment has a start/end time, and its underlying HBase table information is also listed.
+
+   ![](/images/tutorial/1.5/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/10 cube-segment.png)
+
+If you have more source data, repeat the steps above to build them into the cube.
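+
+For a quick smoke test of a "Ready" cube (a hypothetical query that assumes the bundled kylin_sales sample model; substitute your own fact table and dimensions), you can run something like the following in Kylin's `Insight` (query) page:
+
+{% highlight Groff markup %}
+SELECT PART_DT, COUNT(*) AS CNT
+FROM KYLIN_SALES
+GROUP BY PART_DT
+{% endhighlight %}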
+
+### Job Monitoring
+In the `Monitor` page, click the job detail button to see the detailed information shown on the right side.
+
+![](/images/tutorial/1.5/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/7 job-steps.png)
+
+The detail information of a job provides a step-by-step record to trace a job. You can hover a step status icon to see the basic status and information.
+
+![](/images/tutorial/1.5/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/8 hover-step.png)
+
+Click the icon buttons showing in each step to see the details: `Parameters`, `Log`, `MRJob`.
+
+* Parameters
+
+   ![](/images/tutorial/1.5/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/9 parameters.png)
+
+   ![](/images/tutorial/1.5/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/9 parameters-d.png)
+
+* Log
+        
+   ![](/images/tutorial/1.5/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/9 log.png)
+
+   ![](/images/tutorial/1.5/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/9 log-d.png)
+
+* MRJob(MapReduce Job)
+
+   ![](/images/tutorial/1.5/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/9 mrjob.png)
+
+   ![](/images/tutorial/1.5/Kylin-Cube-Build-and-Job-Monitoring-Tutorial/9 mrjob-d.png)
+
+

http://git-wip-us.apache.org/repos/asf/kylin/blob/40a53fe3/website/_docs23/tutorial/cube_build_performance.md
----------------------------------------------------------------------
diff --git a/website/_docs23/tutorial/cube_build_performance.md b/website/_docs23/tutorial/cube_build_performance.md
new file mode 100755
index 0000000..85ea1e6
--- /dev/null
+++ b/website/_docs23/tutorial/cube_build_performance.md
@@ -0,0 +1,266 @@
+---
+layout: docs23
+title: Cube Build Tuning
+categories: tutorial
+permalink: /docs23/tutorial/cube_build_performance.html
+---
+ *This tutorial is a step-by-step example of how to optimize a cube build.* 
+ 
+In this scenario we're trying to optimize a very simple Cube, with 1 fact table and 1 lookup table (Date Dimension). Before doing real tuning, please get an overall understanding of the Cube build process from [Optimize Cube Build](/docs20/howto/howto_optimize_build.html)
+
+![]( /images/tutorial/2.0/cube_build_performance/01.png)
+
+The baseline is:
+
+* One measure: Balance, always calculating Max, Min and Count
+* All Dim_date (10 items) will be used as dimensions 
+* Input is a Hive CSV external table 
+* Output is a Cube in HBase without compression 
+
+With this configuration, the results are: 13 min to build a cube of 20 Mb  (Cube_01)
+
+### Cube_02: Reduce combinations
+To make the first improvement, use Joint and Hierarchy dimensions to reduce the combinations (the number of cuboids).
+
+Put together all the ID and Text columns of Month, Week, Weekday and Quarter using a Joint dimension.
+
+![]( /images/tutorial/2.0/cube_build_performance/02.png)
+
+	
+Define Id_date and Year as a Hierarchy dimension.
+
+This reduces the size down to 0.72 MB and the build time to 5 min. (Roughly speaking, grouping 8 of the 10 columns into one joint and the remaining 2 into a hierarchy cuts the cuboid count from 2^10 = 1024 down to about 2 × 3 = 6.)
+
+Per [KYLIN-2149](https://issues.apache.org/jira/browse/KYLIN-2149), ideally these Hierarchies could also be defined:
+* Id_weekday > Id_date
+* Id_Month > Id_date
+* Id_Quarter > Id_date
+* Id_week > Id_date
+
+But for now, it is impossible to use Joint and Hierarchy together on the same dimension.
+
+
+### Cube_03: Compress output
+To make the next improvement, compress HBase Cube with Snappy:
+
+![alt text](/images/tutorial/2.0/cube_build_performance/03.png)
+
+Another option is Gzip:
+
+![alt text](/images/tutorial/2.0/cube_build_performance/04.png)
+
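+For reference (the property name has changed across Kylin releases, so please check your version's configuration documentation), the HBase storage compression codec is typically switched in kylin.properties, for example:
+
+{% highlight Groff markup %}
+# Kylin 2.x style name; older releases used kylin.hbase.default.compression.codec
+kylin.storage.hbase.compression-codec=snappy
+{% endhighlight %}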
+
+The results of compression output are:
+
+![alt text](/images/tutorial/2.0/cube_build_performance/05.png)
+
+The difference between Snappy and Gzip in time is less than 1%, but in size it is 18%
+
+
+### Cube_04: Compress Hive table
+The time distribution is like this:
+
+![]( /images/tutorial/2.0/cube_build_performance/06.png)
+
+
+Group detailed times by concepts :
+
+![]( /images/tutorial/2.0/cube_build_performance/07.png)
+
+67% of the time is used to build/process the flat table, and roughly 30% to build the cube
+
+A lot of time is spent in the first steps.
+
+This time distribution is typical for a cube with few measures and few dimensions (or a very optimized one)
+
+
+Try using the ORC format and compression (Snappy) on the Hive input table:
+
+![]( /images/tutorial/2.0/cube_build_performance/08.png)
+
+
+The time spent in the first three steps (flat table) has been cut by half.
+
+Other columnar formats can be tested:
+
+![]( /images/tutorial/2.0/cube_build_performance/19.png)
+
+
+* ORC
+* ORC compressed with Snappy
+
+But the results are worse than when using Sequence file.
+
+See comments about this here: [Shaofengshi in MailList](http://apache-kylin.74782.x6.nabble.com/Kylin-Performance-td6713.html#a6767)
+
+The second step is to redistribute the flat Hive table:
+
+![]( /images/tutorial/2.0/cube_build_performance/20.png)
+
+This is a simple row count, and two approximations can be made:
+* If it doesn’t need to be accurate, the rows of the fact table can be counted → this can be performed in parallel with Step 1 (and 99% of the time it will be accurate)
+
+![]( /images/tutorial/2.0/cube_build_performance/21.png)
+
+
+* In future versions ([KYLIN-2165](https://issues.apache.org/jira/browse/KYLIN-2165), targeted for v2.0), this step will be implemented using Hive table statistics.
+
+
+
+### Cube_05: Partition Hive table (fail)
+The distribution of rows is:
+
+Table | Rows
+--- | --- 
+Fact Table | 3.900.00 
+Dim Date | 2.100 
+
+And the query (the simplified version) to build the flat table is:
+{% highlight Groff markup %}
+SELECT
+   DIM_DATE.X
+  ,DIM_DATE.Y
+  ,FACT_POSICIONES.BALANCE
+FROM FACT_POSICIONES INNER JOIN DIM_DATE
+	ON FACT_POSICIONES.ID_FECHA = DIM_DATE.ID_FECHA
+WHERE (ID_DATE >= '2016-12-08' AND ID_DATE < '2016-12-23')
+{% endhighlight %}
+
+The problem here is that Hive is using only one mapper to create the flat table. We need to change this behavior. The solution is to partition the DIM and FACT tables on the same column.
+
+* Option 1: Use id_date as a partition column on the Hive table. This has a big problem: the Hive metastore is meant for a few hundred partitions, not thousands (in [HIVE-9452](https://issues.apache.org/jira/browse/HIVE-9452) there is an idea to solve this, but it isn’t finished yet)
+* Option 2: Generate a new column for this purpose, like Monthslot.
+
+![]( /images/tutorial/2.0/cube_build_performance/09.png)
+
+
+Add the same column to the dim and fact tables; a minimal sketch of this idea in Hive is shown below.
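+
+(Hypothetical DDL, for illustration only: the real tables have more columns, and the slot format is up to you.)
+
+{% highlight Groff markup %}
+-- Rebuild the fact table partitioned on the same MONTHSLOT column as the dimension table
+CREATE TABLE FACT_POSICIONES_PART (
+  ID_FECHA DATE,
+  BALANCE  DOUBLE
+)
+PARTITIONED BY (MONTHSLOT STRING)
+STORED AS SEQUENCEFILE;
+
+SET hive.exec.dynamic.partition=true;
+SET hive.exec.dynamic.partition.mode=nonstrict;
+INSERT OVERWRITE TABLE FACT_POSICIONES_PART PARTITION (MONTHSLOT)
+SELECT ID_FECHA, BALANCE, DATE_FORMAT(ID_FECHA, 'yyyy-MM') AS MONTHSLOT
+FROM FACT_POSICIONES;
+{% endhighlight %}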
+
+Now, update the data model with this new condition to join the tables
+
+![]( /images/tutorial/2.0/cube_build_performance/10.png)
+
+	
+The new query to generate the flat table will be similar to:
+{% highlight Groff markup %}
+SELECT *
+	FROM FACT_POSICIONES INNER JOIN DIM_DATE
+		ON FACT_POSICIONES.ID_FECHA = DIM_DATE.ID_FECHA
+		AND FACT_POSICIONES.MONTHSLOT = DIM_DATE.MONTHSLOT
+{% endhighlight %}
+
+Rebuild the cube with this new data model
+
+As a result, the performance got worse :( . After several attempts, no solution was found
+
+![]( /images/tutorial/2.0/cube_build_performance/11.png)
+
+
+The problem is that the partitions were not used to generate several mappers
+
+![]( /images/tutorial/2.0/cube_build_performance/12.png)
+
+	
+(I checked this issue with ShaoFeng Shi. He thinks the problem is that there are too few rows and that we are not working on a real Hadoop cluster. See this [tech note](http://kylin.apache.org/docs16/howto/howto_optimize_build.html).)
+	
+
+### Summary of results
+
+![]( /images/tutorial/2.0/cube_build_performance/13.png)
+
+
+The tuning process has been:
+* Compress the Hive input tables
+* Compress the HBase output
+* Apply cardinality-reduction techniques (Joint, Derived, Hierarchy and Mandatory dimensions)
+* Choose the best dimension encoding for each dimension and the best order of dimensions in the row key
+
+
+
+Now, there are three types of cubes:
+* Cubes with low cardinality in their dimensions (like cube 4, where most of the time is spent in the flat table steps)
+* Cubes with high cardinality in their dimensions (like cube 6, where most of the time is spent building the cube and the flat table steps take less than 10%)
+* The third type, ultra high cardinality (UHC), which is outside the scope of this article
+
+
+### Cube 6: Cube with high cardinality Dimensions
+
+![]( /images/tutorial/2.0/cube_build_performance/22.png)
+
+In this case, **72%** of the time is used to build the cube
+
+This step is a MapReduce task, you can see the YARN log of these steps on ![alt text](/images/tutorial/2.0/cube_build_performance/23.png) > ![alt text](/images/tutorial/2.0/cube_build_performance/24.png) 
+
+How can the performance of MapReduce be improved? The easy way is to increase the number of mappers and reducers (i.e., increase parallelism).
+
+
+![]( /images/tutorial/2.0/cube_build_performance/25.png)
+
+
+**NOTE:** YARN / MapReduce have a lot of parameters to configure and adapt to your system. The focus here is only on a small subset. 
+
+(In my system I can assign 12 – 14 GB and 8 cores to YARN resources):
+
+* yarn.nodemanager.resource.memory-mb = 15 GB
+* yarn.scheduler.maximum-allocation-mb = 8 GB
+* yarn.nodemanager.resource.cpu-vcores = 8 cores
+With this config, our maximum theoretical degree of parallelism is 8. However, this has a problem: “Timed out after 3600 secs”
+
+![]( /images/tutorial/2.0/cube_build_performance/26.png)
+
+
+The parameter mapreduce.task.timeout (1 hour by default) defines the maximum time that the Application Master (AM) will wait for a YARN container without an ACK. Once this time passes, the AM kills the container and retries, 4 times in total (with the same result)
+
+Where is the problem? 4 mappers were started, but each mapper needed more than 4 GB to finish
+
+* Solution 1: add more RAM to YARN 
+* Solution 2: increase the number of vCores used in the mapper step to reduce the RAM used
+* Solution 3: play with the max RAM for YARN per node (yarn.nodemanager.resource.memory-mb) and experiment with the minimum RAM per container (yarn.scheduler.minimum-allocation-mb). If you increase the minimum RAM per container, YARN will reduce the number of mappers     
+
+![]( /images/tutorial/2.0/cube_build_performance/27.png)
+
+
+In the last two cases the result is the same: the level of parallelism is reduced ==> 
+* Now only 3 mappers start at the same time; the fourth must wait for a free slot
+* The first three mappers spread the RAM among themselves, and as a result they have enough RAM to finish the task
+
+During a normal “Build Cube” step you will see similar messages in the YARN log:
+
+![]( /images/tutorial/2.0/cube_build_performance/28.png)
+
+
+If you don’t see these periodically, perhaps you have a memory bottleneck.
+
+
+
+### Cube 7: Improve cube response time
+We can try to use different aggregation groups to improve the query performance of some very important dimensions or dimensions with high cardinality.
+
+In our case we define 3 aggregation groups: 
+1. “Normal cube”
+2. Cube with Date Dim and Currency (as mandatory)
+3. Cube with Date Dim and Carteras_Desc (as mandatory)
+
+![]( /images/tutorial/2.0/cube_build_performance/29.png)
+
+
+![]( /images/tutorial/2.0/cube_build_performance/30.png)
+
+
+![]( /images/tutorial/2.0/cube_build_performance/31.png)
+
+
+
+Compare without / with AGGs:
+
+![]( /images/tutorial/2.0/cube_build_performance/32.png)
+
+
+Now it takes 3% more time to build the cube and 0.6% more space, but queries by Currency or Carteras_Desc will be much faster.
+
+
+
+

http://git-wip-us.apache.org/repos/asf/kylin/blob/40a53fe3/website/_docs23/tutorial/cube_spark.md
----------------------------------------------------------------------
diff --git a/website/_docs23/tutorial/cube_spark.md b/website/_docs23/tutorial/cube_spark.md
new file mode 100644
index 0000000..4ca288e
--- /dev/null
+++ b/website/_docs23/tutorial/cube_spark.md
@@ -0,0 +1,169 @@
+---
+layout: docs23
+title:  Build Cube with Spark
+categories: tutorial
+permalink: /docs23/tutorial/cube_spark.html
+---
+Kylin v2.0 introduces the Spark cube engine, which uses Apache Spark to replace MapReduce in the cube build step; you can check [this blog](/blog/2017/02/23/by-layer-spark-cubing/) for an overall picture. This document uses the sample cube to demonstrate how to try the new engine.
+
+
+## Preparation
+To finish this tutorial, you need a Hadoop environment which has Kylin v2.1.0 or above installed. Here we will use the Hortonworks HDP 2.4 Sandbox VM; the Hadoop components as well as Hive/HBase have already been started. 
+
+## Install Kylin v2.1.0 or above
+
+Download Kylin v2.1.0 for HBase 1.x from Kylin's download page, and then uncompress the tarball into the */usr/local/* folder:
+
+{% highlight Groff markup %}
+
+wget http://www-us.apache.org/dist/kylin/apache-kylin-2.1.0/apache-kylin-2.1.0-bin-hbase1x.tar.gz -P /tmp
+
+tar -zxvf /tmp/apache-kylin-2.1.0-bin-hbase1x.tar.gz -C /usr/local/
+
+export KYLIN_HOME=/usr/local/apache-kylin-2.1.0-bin-hbase1x
+{% endhighlight %}
+
+## Prepare "kylin.env.hadoop-conf-dir"
+
+To run Spark on YARN, you need to specify the **HADOOP_CONF_DIR** environment variable, which is the directory that contains the (client side) configuration files for Hadoop. In many Hadoop distributions the directory is "/etc/hadoop/conf"; but Kylin needs to access not only HDFS, YARN and Hive, but also HBase, so the default directory might not have all the necessary files. In this case, you need to create a new directory and then copy or link those client files (core-site.xml, hdfs-site.xml, yarn-site.xml, hive-site.xml and hbase-site.xml) there. In HDP 2.4, there is a conflict between hive-tez and Spark, so you need to change the default engine from "tez" to "mr" when copying the file for Kylin.
+
+{% highlight Groff markup %}
+
+mkdir $KYLIN_HOME/hadoop-conf
+ln -s /etc/hadoop/conf/core-site.xml $KYLIN_HOME/hadoop-conf/core-site.xml 
+ln -s /etc/hadoop/conf/hdfs-site.xml $KYLIN_HOME/hadoop-conf/hdfs-site.xml 
+ln -s /etc/hadoop/conf/yarn-site.xml $KYLIN_HOME/hadoop-conf/yarn-site.xml 
+ln -s /etc/hbase/2.4.0.0-169/0/hbase-site.xml $KYLIN_HOME/hadoop-conf/hbase-site.xml 
+cp /etc/hive/2.4.0.0-169/0/hive-site.xml $KYLIN_HOME/hadoop-conf/hive-site.xml 
+vi $KYLIN_HOME/hadoop-conf/hive-site.xml (change "hive.execution.engine" value from "tez" to "mr")
+
+{% endhighlight %}
+
+Now, let Kylin know this directory with property "kylin.env.hadoop-conf-dir" in kylin.properties:
+
+{% highlight Groff markup %}
+kylin.env.hadoop-conf-dir=/usr/local/apache-kylin-2.1.0-bin-hbase1x/hadoop-conf
+{% endhighlight %}
+
+If this property isn't set, Kylin will use the directory that "hive-site.xml" is located in; since that folder may have no "hbase-site.xml", you will get an HBase/ZK connection error in Spark.
+
+## Check Spark configuration
+
+Kylin embeds a Spark binary (v2.1.0) in $KYLIN_HOME/spark; all the Spark configurations can be managed in $KYLIN_HOME/conf/kylin.properties with the prefix *"kylin.engine.spark-conf."*. These properties will be extracted and applied when submitting the Spark job; e.g., if you configure "kylin.engine.spark-conf.spark.executor.memory=4G", Kylin will use "--conf spark.executor.memory=4G" as a parameter when executing "spark-submit".
+
+Before you run Spark cubing, it is suggested that you take a look at these configurations and customize them for your cluster. Below are the default configurations, which are also the minimal config for a sandbox (1 executor with 1 GB memory); usually in a normal cluster, you need many more executors, each with at least 4 GB memory and 2 cores:
+
+{% highlight Groff markup %}
+kylin.engine.spark-conf.spark.master=yarn
+kylin.engine.spark-conf.spark.submit.deployMode=cluster
+kylin.engine.spark-conf.spark.yarn.queue=default
+kylin.engine.spark-conf.spark.executor.memory=1G
+kylin.engine.spark-conf.spark.executor.cores=2
+kylin.engine.spark-conf.spark.executor.instances=1
+kylin.engine.spark-conf.spark.eventLog.enabled=true
+kylin.engine.spark-conf.spark.eventLog.dir=hdfs\:///kylin/spark-history
+kylin.engine.spark-conf.spark.history.fs.logDirectory=hdfs\:///kylin/spark-history
+
+#kylin.engine.spark-conf.spark.io.compression.codec=org.apache.spark.io.SnappyCompressionCodec
+
+## uncomment for HDP
+#kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
+#kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
+#kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
+
+{% endhighlight %}
+
+For running on the Hortonworks platform, you need to specify "hdp.version" as a Java option for the YARN containers, so please uncomment the last three lines in kylin.properties. 
+
+Besides, in order to avoid repeatedly uploading Spark jars to YARN, you can manually upload them once, and then configure the jar's HDFS location; please note, the HDFS location needs to be a fully qualified name.
+
+{% highlight Groff markup %}
+jar cv0f spark-libs.jar -C $KYLIN_HOME/spark/jars/ .
+hadoop fs -mkdir -p /kylin/spark/
+hadoop fs -put spark-libs.jar /kylin/spark/
+{% endhighlight %}
+
+After doing that, the config in kylin.properties will be:
+{% highlight Groff markup %}
+kylin.engine.spark-conf.spark.yarn.archive=hdfs://sandbox.hortonworks.com:8020/kylin/spark/spark-libs.jar
+kylin.engine.spark-conf.spark.driver.extraJavaOptions=-Dhdp.version=current
+kylin.engine.spark-conf.spark.yarn.am.extraJavaOptions=-Dhdp.version=current
+kylin.engine.spark-conf.spark.executor.extraJavaOptions=-Dhdp.version=current
+{% endhighlight %}
+
+All the "kylin.engine.spark-conf.*" parameters can be overwritten at Cube or Project level, this gives more flexibility to the user.
+
+## Create and modify sample cube
+
+Run the sample.sh to create the sample cube, and then start Kylin server:
+
+{% highlight Groff markup %}
+
+$KYLIN_HOME/bin/sample.sh
+$KYLIN_HOME/bin/kylin.sh start
+
+{% endhighlight %}
+
+After Kylin is started, access Kylin web, edit the "kylin_sales" cube, in the "Advanced Setting" page, change the "Cube Engine" from "MapReduce" to "Spark":
+
+
+   ![](/images/tutorial/2.0/Spark-Cubing-Tutorial/1_cube_engine.png)
+
+Click "Next" to the "Configuration Overwrites" page, click "+Property" to add property "kylin.engine.spark.rdd-partition-cut-mb" with value "500" (reasons below):
+
+   ![](/images/tutorial/2.0/Spark-Cubing-Tutorial/2_overwrite_partition.png)
+
+The sample cube has two memory-hungry measures: a "COUNT DISTINCT" and a "TOPN(100)"; their size estimation can be inaccurate when the source data is small: the estimated size is much larger than the real size, which causes many more RDD partitions to be split and slows down the build. 500 is a more reasonable number for this sample. Click "Next" and "Save" to save the cube.
+
+
+## Build Cube with Spark
+
+Click "Build", select current date as the build end date. Kylin generates a build job in the "Monitor" page, in which the 7th step is the Spark cubing. The job engine starts to execute the steps in sequence. 
+
+
+   ![](/images/tutorial/2.0/Spark-Cubing-Tutorial/2_job_with_spark.png)
+
+
+   ![](/images/tutorial/2.0/Spark-Cubing-Tutorial/3_spark_cubing_step.png)
+
+When Kylin executes this step, you can monitor the status in the YARN resource manager. Clicking the "Application Master" link will open the Spark web UI, which shows the progress of each stage and detailed information.
+
+
+   ![](/images/tutorial/2.0/Spark-Cubing-Tutorial/4_job_on_rm.png)
+
+
+   ![](/images/tutorial/2.0/Spark-Cubing-Tutorial/5_spark_web_gui.png)
+
+
+After all steps are successfully executed, the Cube becomes "Ready" and you can query it as normal.
+
+## Troubleshooting
+
+When getting an error, you should check "logs/kylin.log" first. It contains the full Spark command that Kylin executes, e.g.:
+
+{% highlight Groff markup %}
+2017-03-06 14:44:38,574 INFO  [Job 2d5c1178-c6f6-4b50-8937-8e5e3b39227e-306] spark.SparkExecutable:121 : cmd:export HADOOP_CONF_DIR=/usr/local/apache-kylin-2.1.0-bin-hbase1x/hadoop-conf && /usr/local/apache-kylin-2.1.0-bin-hbase1x/spark/bin/spark-submit --class org.apache.kylin.common.util.SparkEntry  --conf spark.executor.instances=1  --conf spark.yarn.queue=default  --conf spark.yarn.am.extraJavaOptions=-Dhdp.version=current  --conf spark.history.fs.logDirectory=hdfs:///kylin/spark-history  --conf spark.driver.extraJavaOptions=-Dhdp.version=current  --conf spark.master=yarn  --conf spark.executor.extraJavaOptions=-Dhdp.version=current  --conf spark.executor.memory=1G  --conf spark.eventLog.enabled=true  --conf spark.eventLog.dir=hdfs:///kylin/spark-history  --conf spark.executor.cores=2  --conf spark.submit.deployMode=cluster --files /etc/hbase/2.4.0.0-169/0/hbase-site.xml --jars /usr/hdp/2.4.0.0-169/hbase/lib/htrace-core-3.1.0-incubating.jar,/usr/hdp/2.4.0.0-169/hbase/lib/hbase-client-1.1.2.2.4.0.0-169.jar,/usr/hdp/2.4.0.0-169/hbase/lib/hbase-common-1.1.2.2.4.0.0-169.jar,/usr/hdp/2.4.0.0-169/hbase/lib/hbase-protocol-1.1.2.2.4.0.0-169.jar,/usr/hdp/2.4.0.0-169/hbase/lib/metrics-core-2.2.0.jar,/usr/hdp/2.4.0.0-169/hbase/lib/guava-12.0.1.jar, /usr/local/apache-kylin-2.1.0-bin-hbase1x/lib/kylin-job-2.1.0.jar -className org.apache.kylin.engine.spark.SparkCubingByLayer -hiveTable kylin_intermediate_kylin_sales_cube_555c4d32_40bb_457d_909a_1bb017bf2d9e -segmentId 555c4d32-40bb-457d-909a-1bb017bf2d9e -confPath /usr/local/apache-kylin-2.1.0-bin-hbase1x/conf -output hdfs:///kylin/kylin_metadata/kylin-2d5c1178-c6f6-4b50-8937-8e5e3b39227e/kylin_sales_cube/cuboid/ -cubename kylin_sales_cube
+
+{% endhighlight %}
+
+You can copy the command and execute it manually in a shell, then tune the parameters quickly; during the execution, you can access the YARN resource manager to check more. If the job has already finished, you can check the history info in the Spark history server. 
+
+By default Kylin outputs the history to "hdfs:///kylin/spark-history"; you need to start the Spark history server on that directory, or change it to use your existing Spark history server's event directory by setting the parameters "kylin.engine.spark-conf.spark.eventLog.dir" and "kylin.engine.spark-conf.spark.history.fs.logDirectory" in conf/kylin.properties.
+
+The following command will start a Spark history server instance on Kylin's output directory; before running it, make sure you have stopped the existing Spark history server in the sandbox:
+
+{% highlight Groff markup %}
+$KYLIN_HOME/spark/sbin/start-history-server.sh hdfs://sandbox.hortonworks.com:8020/kylin/spark-history 
+{% endhighlight %}
+
+In a web browser, access "http://sandbox:18080"; it shows the job history:
+
+   ![](/images/tutorial/2.0/Spark-Cubing-Tutorial/9_spark_history.png)
+
+Click a specific job; there you will see the detailed runtime information, which is very helpful for troubleshooting and performance tuning.
+
+## Go further
+
+If you're a Kylin administrator but new to Spark, it is suggested that you go through the [Spark documents](https://spark.apache.org/docs/2.1.0/), and don't forget to update the configurations accordingly. You can enable Spark [Dynamic Resource Allocation](https://spark.apache.org/docs/2.1.0/job-scheduling.html#dynamic-resource-allocation) so that it can auto scale/shrink for different workloads. Spark's performance relies on the cluster's memory and CPU resources, while Kylin's cube build is a heavy task when there is a complex data model and a huge dataset to build at one time. If your cluster resources are not sufficient, errors like "OutOfMemoryError" will be thrown in the Spark executors, so please use it properly. For a Cube which has UHC dimensions, many combinations (e.g., a full cube with more than 12 dimensions), or memory-hungry measures (Count Distinct, Top-N), it is suggested to use the MapReduce engine. If your Cube model is simple, all measures are SUM/MIN/MAX/COUNT and the source data is of small to medium scale, the Spark engine would be a good choice. Besides, streaming build isn't supported in this engine so far (KYLIN-2484).
+
+If you have any questions, comments, or bug fixes, you are welcome to discuss them on dev@kylin.apache.org.

http://git-wip-us.apache.org/repos/asf/kylin/blob/40a53fe3/website/_docs23/tutorial/cube_streaming.md
----------------------------------------------------------------------
diff --git a/website/_docs23/tutorial/cube_streaming.md b/website/_docs23/tutorial/cube_streaming.md
new file mode 100644
index 0000000..ef6578e
--- /dev/null
+++ b/website/_docs23/tutorial/cube_streaming.md
@@ -0,0 +1,219 @@
+---
+layout: docs23
+title:  Scalable Cubing from Kafka
+categories: tutorial
+permalink: /docs23/tutorial/cube_streaming.html
+---
+Kylin v1.6 released the scalable streaming cubing function. It leverages Hadoop to consume the data from Kafka and build the cube; you can check [this blog](/blog/2016/10/18/new-nrt-streaming/) for the high-level design. This doc is a step-by-step tutorial illustrating how to create and build a sample cube.
+
+## Preparation
+To finish this tutorial, you need a Hadoop environment which has Kylin v1.6.0 or above installed, and a Kafka (v0.10.0 or above) instance running. Previous Kylin versions have a couple of issues, so please upgrade your Kylin instance first.
+
+In this tutorial, we will use Hortonworks HDP 2.2.4 Sandbox VM + Kafka v0.10.0 (Scala 2.10) as the environment.
+
+## Install Kafka 0.10.0.0 and Kylin
+Don't use HDP 2.2.4's built-in Kafka as it is too old; stop it first if it is running.
+{% highlight Groff markup %}
+curl -s http://mirrors.tuna.tsinghua.edu.cn/apache/kafka/0.10.0.0/kafka_2.10-0.10.0.0.tgz | tar -xz -C /usr/local/
+
+cd /usr/local/kafka_2.10-0.10.0.0/
+
+bin/kafka-server-start.sh config/server.properties &
+
+{% endhighlight %}
+
+Download Kylin v1.6 from the download page, and expand the tar ball into the /usr/local/ folder.
+
+## Create sample Kafka topic and populate data
+
+Create a sample topic "kylindemo", with 3 partitions:
+
+{% highlight Groff markup %}
+
+bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic kylindemo
+Created topic "kylindemo".
+{% endhighlight %}
+
+Put sample data into this topic; Kylin has a utility class which can do this:
+
+{% highlight Groff markup %}
+export KAFKA_HOME=/usr/local/kafka_2.10-0.10.0.0
+export KYLIN_HOME=/usr/local/apache-kylin-1.6.0-bin
+
+cd $KYLIN_HOME
+./bin/kylin.sh org.apache.kylin.source.kafka.util.KafkaSampleProducer --topic kylindemo --broker localhost:9092
+{% endhighlight %}
+
+This tool will send 100 records to Kafka every second. Please keep it running during this tutorial. You can check the sample message with kafka-console-consumer.sh now:
+
+{% highlight Groff markup %}
+cd $KAFKA_HOME
+bin/kafka-console-consumer.sh --zookeeper localhost:2181 --bootstrap-server localhost:9092 --topic kylindemo --from-beginning
+{"amount":63.50375137330458,"category":"TOY","order_time":1477415932581,"device":"Other","qty":4,"user":{"id":"bf249f36-f593-4307-b156-240b3094a1c3","age":21,"gender":"Male"},"currency":"USD","country":"CHINA"}
+{"amount":22.806058795736583,"category":"ELECTRONIC","order_time":1477415932591,"device":"Andriod","qty":1,"user":{"id":"00283efe-027e-4ec1-bbed-c2bbda873f1d","age":27,"gender":"Female"},"currency":"USD","country":"INDIA"}
+
+ {% endhighlight %}
+
+## Define a table from streaming
+Start the Kylin server with "$KYLIN_HOME/bin/kylin.sh start", log in to the Kylin Web GUI at http://sandbox:7070/kylin/, and select an existing project or create a new one. Click "Model" -> "Data Source", then click the icon "Add Streaming Table";
+
+   ![](/images/tutorial/1.6/Kylin-Cube-Streaming-Tutorial/1_Add_streaming_table.png)
+
+In the pop-up dialogue, enter a sample record which you got from the kafka-console-consumer, and click the ">>" button; Kylin parses the JSON message and lists all the properties.
+
+You need to give a logical table name for this streaming data source; the name will be used for SQL queries later. Here enter "STREAMING_SALES_TABLE" as an example in the "Table Name" field.
+
+You need to select a timestamp field which will be used to identify the time of a message; Kylin can derive other time values like "year_start" and "quarter_start" from this time column, which gives you more flexibility in building and querying the cube. Here check "order_time". You can deselect those properties which are not needed for the cube; here let's keep all fields.
+
+Notice that Kylin supports structured (or say "embedded") messages from v1.6; it will convert them into a flat table structure. By default "_" is used as the separator of the flattened property names.
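+
+For illustration (the flattened column names below are an assumption, derived from the default "_" separator), the nested "user" object in the sample record would show up as flat columns:
+
+{% highlight Groff markup %}
+-- fragment of the input record
+{"user":{"id":"bf249f36-...","age":21,"gender":"Male"}, "amount":63.50, ...}
+
+-- flattened columns of the streaming table
+USER_ID, USER_AGE, USER_GENDER, AMOUNT, ...
+{% endhighlight %}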
+
+   ![](/images/tutorial/1.6/Kylin-Cube-Streaming-Tutorial/2_Define_streaming_table.png)
+
+
+Click "Next". On this page, provide the Kafka cluster information; Enter "kylindemo" as "Topic" name; The cluster has 1 broker, whose host name is "sandbox", port is "9092", click "Save".
+
+   ![](/images/tutorial/1.6/Kylin-Cube-Streaming-Tutorial/3_Kafka_setting.png)
+
+In "Advanced setting" section, the "timeout" and "buffer size" are the configurations for connecting with Kafka, keep them. 
+
+In "Parser Setting", by default Kylin assumes your message is JSON format, and each record's timestamp column (specified by "tsColName") is a bigint (epoch time) value; in this case, you just need set the "tsColumn" to "order_time"; 
+
+![](/images/tutorial/1.6/Kylin-Cube-Streaming-Tutorial/3_Paser_setting.png)
+
+In a real case, if the timestamp value is a string-valued timestamp like "Jul 20, 2016 9:59:17 AM", you need to specify the parser class with "tsParser" and the time pattern with "tsPattern", like this:
+
+
+![](/images/tutorial/1.6/Kylin-Cube-Streaming-Tutorial/3_Paser_time.png)
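+
+For illustration only (assuming "tsPattern" accepts a Java SimpleDateFormat-style pattern, as the screenshot above suggests), a pattern matching the example string "Jul 20, 2016 9:59:17 AM" would be:
+
+{% highlight Groff markup %}
+tsPattern=MMM dd, yyyy h:mm:ss a
+{% endhighlight %}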
+
+Click "Submit" to save the configurations. Now a "Streaming" table is created.
+
+![](/images/tutorial/1.6/Kylin-Cube-Streaming-Tutorial/4_Streaming_table.png)
+
+## Define data model
+With the table defined in the previous step, we can now create the data model. The steps are almost the same as creating a normal data model, but there are two requirements:
+
+* Streaming Cubes don't support joins with lookup tables; when defining the data model, select the fact table only, no lookup tables;
+* Streaming Cubes must be partitioned; if you're going to build the Cube incrementally at the minute level, select "MINUTE_START" as the cube's partition date column. If at the hour level, select "HOUR_START".
+
+Here we pick 13 dimension columns and 2 measure columns:
+
+![](/images/tutorial/1.6/Kylin-Cube-Streaming-Tutorial/5_Data_model_dimension.png)
+
+![](/images/tutorial/1.6/Kylin-Cube-Streaming-Tutorial/6_Data_model_measure.png)
+Save the data model.
+
+## Create Cube
+
+The streaming Cube is almost the same as a normal cube, but a couple of points need your attention:
+
+* The partition time column should be a dimension of the Cube. In Streaming OLAP the time is always a query condition, and Kylin will leverage this to narrow down the scanned partitions.
+* Don't use "order\_time" as dimension as that is pretty fine-grained; suggest to use "mintue\_start", "hour\_start" or other, depends on how you will inspect the data.
+* Define "year\_start", "quarter\_start", "month\_start", "day\_start", "hour\_start", "minute\_start" as a hierarchy to reduce the combinations to calculate.
+* In the "refersh setting" step, create more merge ranges, like 0.5 hour, 4 hours, 1 day, and then 7 days; This will help to control the cube segment number.
+* In the "rowkeys" section, drag&drop the "minute\_start" to the head position, as for streaming queries, the time condition is always appeared; putting it to head will help to narrow down the scan range.
+
+	![](/images/tutorial/1.6/Kylin-Cube-Streaming-Tutorial/8_Cube_dimension.png)
+
+	![](/images/tutorial/1.6/Kylin-Cube-Streaming-Tutorial/9_Cube_measure.png)
+
+	![](/images/tutorial/1.6/Kylin-Cube-Streaming-Tutorial/10_agg_group.png)
+
+	![](/images/tutorial/1.6/Kylin-Cube-Streaming-Tutorial/11_Rowkey.png)
+
+Save the cube.
+
+## Run a build
+
+You can trigger the build from the web GUI by clicking "Actions" -> "Build", or by sending a request to the Kylin RESTful API with the 'curl' command:
+
+{% highlight Groff markup %}
+curl -X PUT --user ADMIN:KYLIN -H "Content-Type: application/json;charset=utf-8" -d '{ "sourceOffsetStart": 0, "sourceOffsetEnd": 9223372036854775807, "buildType": "BUILD"}' http://localhost:7070/kylin/api/cubes/{your_cube_name}/build2
+{% endhighlight %}
+
+Please note the API endpoint is different from that of a normal cube (this URL ends with "build2").
+
+Here 0 means starting from the last position, and 9223372036854775807 (Long.MAX_VALUE) means building to the end position of the Kafka topic. If it is the first build (no previous segment), Kylin will seek to the beginning of the topic as the start position.
+
+In the "Monitor" page, a new job is generated; Wait it 100% finished.
+
+## Click the "Insight" tab, compose a SQL to run, e.g:
+
+ {% highlight Groff markup %}
+select minute_start, count(*), sum(amount), sum(qty) from streaming_sales_table group by minute_start order by minute_start
+ {% endhighlight %}
+
+The result looks like below.
+![](/images/tutorial/1.6/Kylin-Cube-Streaming-Tutorial/13_Query_result.png)
+
+
+## Automate the build
+
+Once the first build and query have succeeded, you can schedule incremental builds at a certain frequency. Kylin will record the offsets of each build; when it receives a build request, it will start from the last end position and then seek the latest offsets from Kafka. With the REST API you can trigger the build with any scheduler tool, such as Linux cron:
+
+  {% highlight Groff markup %}
+crontab -e
+*/5 * * * * curl -X PUT --user ADMIN:KYLIN -H "Content-Type: application/json;charset=utf-8" -d '{ "sourceOffsetStart": 0, "sourceOffsetEnd": 9223372036854775807, "buildType": "BUILD"}' http://localhost:7070/kylin/api/cubes/{your_cube_name}/build2
+ {% endhighlight %}
+
+Now you can sit back and watch the cube be automatically built from streaming data. When the cube segments accumulate to a bigger time range, Kylin will automatically merge them into a bigger segment.
+
+## Troubleshooting
+
+ * You may encounter the following error when running "kylin.sh":
+{% highlight Groff markup %}
+Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/kafka/clients/producer/Producer
+	at java.lang.Class.getDeclaredMethods0(Native Method)
+	at java.lang.Class.privateGetDeclaredMethods(Class.java:2615)
+	at java.lang.Class.getMethod0(Class.java:2856)
+	at java.lang.Class.getMethod(Class.java:1668)
+	at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
+	at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
+Caused by: java.lang.ClassNotFoundException: org.apache.kafka.clients.producer.Producer
+	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
+	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
+	at java.security.AccessController.doPrivileged(Native Method)
+	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
+	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
+	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
+	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
+	... 6 more
+{% endhighlight %}
+
+The reason is that Kylin wasn't able to find the proper Kafka client jars; make sure you have set the "KAFKA_HOME" environment variable properly.
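+
+For example (reusing the Kafka installation path from earlier in this tutorial):
+
+{% highlight Groff markup %}
+export KAFKA_HOME=/usr/local/kafka_2.10-0.10.0.0
+{% endhighlight %}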
+
+ * Get "killed by admin" error in the "Build Cube" step
+
+ Within a Sandbox VM, YARN may not allocate the requested memory resources to the MR job, as the "inmem" cubing algorithm requests more memory. You can bypass this by requesting less memory: edit "conf/kylin_job_conf_inmem.xml" and change the following two parameters like this:
+
+ {% highlight Groff markup %}
+    <property>
+        <name>mapreduce.map.memory.mb</name>
+        <value>1072</value>
+        <description></description>
+    </property>
+
+    <property>
+        <name>mapreduce.map.java.opts</name>
+        <value>-Xmx800m</value>
+        <description></description>
+    </property>
+ {% endhighlight %}
+
+ * If there are already a bunch of history messages in Kafka and you don't want to build from the very beginning, you can trigger a call to set the current end position as the start position for the cube:
+
+{% highlight Groff markup %}
+curl -X PUT --user ADMIN:KYLIN -H "Content-Type: application/json;charset=utf-8" -d '{ "sourceOffsetStart": 0, "sourceOffsetEnd": 9223372036854775807, "buildType": "BUILD"}' http://localhost:7070/kylin/api/cubes/{your_cube_name}/init_start_offsets
+{% endhighlight %}
+
+ * If a build job gets an error and you discard it, there will be a hole (or say gap) left in the Cube. Since Kylin builds from the last position each time, you can't expect the hole to be filled by normal builds. Kylin provides an API to check and fill the holes.
+
+Check holes:
+ {% highlight Groff markup %}
+curl -X GET --user ADMIN:KYLIN -H "Content-Type: application/json;charset=utf-8" http://localhost:7070/kylin/api/cubes/{your_cube_name}/holes
+{% endhighlight %}
+
+If the result is an empty array, it means there is no hole; otherwise, trigger Kylin to fill them:
+ {% highlight Groff markup %}
+curl -X PUT --user ADMIN:KYLIN -H "Content-Type: application/json;charset=utf-8" http://localhost:7070/kylin/api/cubes/{your_cube_name}/holes
+{% endhighlight %}
+

http://git-wip-us.apache.org/repos/asf/kylin/blob/40a53fe3/website/_docs23/tutorial/flink.md
----------------------------------------------------------------------
diff --git a/website/_docs23/tutorial/flink.md b/website/_docs23/tutorial/flink.md
new file mode 100644
index 0000000..fd59c4f
--- /dev/null
+++ b/website/_docs23/tutorial/flink.md
@@ -0,0 +1,249 @@
+---
+layout: docs23
+title:  Apache Flink
+categories: tutorial
+permalink: /docs23/tutorial/flink.html
+---
+
+
+### Introduction
+
+This document describes how to use Kylin as a data source in Apache Flink; 
+
+There were several attempts to do this with Scala and JDBC, but none of them worked:
+
+* [attempt1](http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/JDBCInputFormat-preparation-with-Flink-1-1-SNAPSHOT-and-Scala-2-11-td5371.html)  
+* [attempt2](http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Type-of-TypeVariable-OT-in-class-org-apache-flink-api-common-io-RichInputFormat-could-not-be-determi-td7287.html)  
+* [attempt3](http://stackoverflow.com/questions/36067881/create-dataset-from-jdbc-source-in-flink-using-scala)  
+* [attempt4](https://codegists.com/snippet/scala/jdbcissuescala_zeitgeist_scala); 
+
+We will try to use createInput and [JDBCInputFormat](https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/batch/index.html) in batch mode and access Kylin via JDBC. But it isn't implemented in Scala, only in Java ([MailList](http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/jdbc-JDBCInputFormat-td9393.html)). This doc will go step by step, solving these problems.
+
+### Pre-requisites
+
+* You need an instance of Kylin with a Cube; the [Sample Cube](kylin_sample.html) will be good enough.
+* [Scala](http://www.scala-lang.org/) and [Apache Flink](http://flink.apache.org/) installed
+* [IntelliJ](https://www.jetbrains.com/idea/) installed and configured for Scala/Flink (see the [Flink IDE setup guide](https://ci.apache.org/projects/flink/flink-docs-release-1.1/internals/ide_setup.html))
+
+### Used software:
+
+* [Apache Flink](http://flink.apache.org/downloads.html) v1.2-SNAPSHOT
+* [Apache Kylin](http://kylin.apache.org/download/) v1.5.2 (v1.6.0 also works)
+* [IntelliJ](https://www.jetbrains.com/idea/download/#section=linux)  v2016.2
+* [Scala](downloads.lightbend.com/scala/2.11.8/scala-2.11.8.tgz)  v2.11
+
+### Starting point:
+
+This can be our initial skeleton:
+
+{% highlight Groff markup %}
+import org.apache.flink.api.scala._
+val env = ExecutionEnvironment.getExecutionEnvironment
+val inputFormat = JDBCInputFormat.buildJDBCInputFormat()
+  .setDrivername("org.apache.kylin.jdbc.Driver")
+  .setDBUrl("jdbc:kylin://172.17.0.2:7070/learn_kylin")
+  .setUsername("ADMIN")
+  .setPassword("KYLIN")
+  .setQuery("select count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt")
+  .finish()
+val dataset = env.createInput(inputFormat)
+{% endhighlight %}
+
+The first error is: ![alt text](/images/Flink-Tutorial/02.png)
+
+Add to Scala: 
+{% highlight Groff markup %}
+import org.apache.flink.api.java.io.jdbc.JDBCInputFormat
+{% endhighlight %}
+
+The next error is  ![alt text](/images/Flink-Tutorial/03.png)
+
+We can solve this dependency [(mvn repository: jdbc)](https://mvnrepository.com/artifact/org.apache.flink/flink-jdbc/1.1.2); add this to your pom.xml:
+{% highlight Groff markup %}
+<dependency>
+   <groupId>org.apache.flink</groupId>
+   <artifactId>flink-jdbc</artifactId>
+   <version>${flink.version}</version>
+</dependency>
+{% endhighlight %}
+
+## Solve the dependencies of Row
+
+Similar to the previous point, we need to solve the dependencies of the Row class [(mvn repository: Table)](https://mvnrepository.com/artifact/org.apache.flink/flink-table_2.10/1.1.2):
+
+  ![](/images/Flink-Tutorial/03b.png)
+
+
+* In pom.xml add:
+{% highlight Groff markup %}
+<dependency>
+   <groupId>org.apache.flink</groupId>
+   <artifactId>flink-table_2.10</artifactId>
+   <version>${flink.version}</version>
+</dependency>
+{% endhighlight %}
+
+* In Scala: 
+{% highlight Groff markup %}
+import org.apache.flink.api.table.Row
+{% endhighlight %}
+
+## Solve the RowTypeInfo property (and its new dependencies)
+
+This is the new error to solve:
+
+  ![](/images/Flink-Tutorial/04.png)
+
+
+* If we check the code of [JDBCInputFormat.java](https://github.com/apache/flink/blob/master/flink-batch-connectors/flink-jdbc/src/main/java/org/apache/flink/api/java/io/jdbc/JDBCInputFormat.java#L69), we can see [this new (and mandatory) property](https://github.com/apache/flink/commit/09b428bd65819b946cf82ab1fdee305eb5a941f5#diff-9b49a5041d50d9f9fad3f8060b3d1310R69), added in April 2016 by [FLINK-3750](https://issues.apache.org/jira/browse/FLINK-3750); see the Java manual for [JDBCInputFormat](https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/java/io/jdbc/JDBCInputFormat.html) v1.2
+
+   Add the new Property: **setRowTypeInfo**
+   
+{% highlight Groff markup %}
+val inputFormat = JDBCInputFormat.buildJDBCInputFormat()
+  .setDrivername("org.apache.kylin.jdbc.Driver")
+  .setDBUrl("jdbc:kylin://172.17.0.2:7070/learn_kylin")
+  .setUsername("ADMIN")
+  .setPassword("KYLIN")
+  .setQuery("select count(distinct seller_id) as sellers from kylin_sales group by part_dt order by part_dt")
+  .setRowTypeInfo(DB_ROWTYPE)
+  .finish()
+{% endhighlight %}
+
+* How can we configure this property in Scala? In [Attempt4](https://codegists.com/snippet/scala/jdbcissuescala_zeitgeist_scala) there is an incorrect solution.
+   
+   We can check the types using the intellisense: ![alt text](/images/Flink-Tutorial/05.png)
+   
+   Then we will need to add more dependencies; add to Scala:
+
+{% highlight Groff markup %}
+import org.apache.flink.api.table.typeutils.RowTypeInfo
+import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeInformation}
+{% endhighlight %}
+
+   Create an Array or Seq of TypeInformation[ ]
+
+  ![](/images/Flink-Tutorial/06.png)
+
+
+   Solution:
+   
+{% highlight Groff markup %}
+   val stringColumn: TypeInformation[String] = createTypeInformation[String]
+   val DB_ROWTYPE = new RowTypeInfo(Seq(stringColumn))
+{% endhighlight %}
+
+## Solve ClassNotFoundException
+
+  ![](/images/Flink-Tutorial/07.png)
+
+We need to find the kylin-jdbc-x.x.x.jar and then expose it to Flink:
+
+1. Find the Kylin JDBC jar
+
+   From the Kylin [Download](http://kylin.apache.org/download/) page, choose **Binary** and the **correct version of Kylin and HBase**.
+   
+   Download & Unpack: in ./lib: 
+   
+  ![](/images/Flink-Tutorial/08.png)
+
+
+2. Make this JAR accessible to Flink
+
+   If you execute Flink as a service, you need to put this JAR in your Java class path using your .bashrc (see the sketch at the end of this section).
+
+  ![](/images/Flink-Tutorial/09.png)
+
+
+  Check the actual value: ![alt text](/images/Flink-Tutorial/10.png)
+  
+  Check the permissions on this file (it must be accessible to you):
+
+  ![](/images/Flink-Tutorial/11.png)
+
+ 
+  If you are executing from the IDE, you need to add it to your class path manually:
+  
+  On IntelliJ: ![alt text](/images/Flink-Tutorial/12.png)  > ![alt text](/images/Flink-Tutorial/13.png) > ![alt text](/images/Flink-Tutorial/14.png) > ![alt text](/images/Flink-Tutorial/15.png)
+  
+  The result will be similar to: ![alt text](/images/Flink-Tutorial/16.png)
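+
+As a recap of step 2 above, here is a sketch of what the class path entry in .bashrc could look like; the JAR name and path are assumptions, so adjust them to wherever you unpacked the Kylin binary and to what the screenshots above show for your setup:
+
+{% highlight Groff markup %}
+# hypothetical path - point it at the kylin-jdbc JAR inside Kylin's ./lib directory
+export CLASSPATH=$CLASSPATH:/usr/local/apache-kylin-1.6.0-bin/lib/kylin-jdbc-1.6.0.jar
+{% endhighlight %}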
+  
+## Solve "Couldn’t access resultSet" error
+
+  ![](/images/Flink-Tutorial/17.png)
+
+
+It is related to [FLINK-4108](https://issues.apache.org/jira/browse/FLINK-4108) [(MailList)](http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/jdbc-JDBCInputFormat-td9393.html#a9415), for which Timo Walther [made a PR](https://github.com/apache/flink/pull/2619).
+
+If you are running Flink <= 1.2 you will need to apply this patch and do a clean install.
+
+## Solve the casting error
+
+  ![](/images/Flink-Tutorial/18.png)
+
+The error message itself tells you both the problem and the solution. Nice ;)
+
+## The result
+
+The output should be similar to this, printing the result of the query to standard output:
+
+  ![](/images/Flink-Tutorial/19.png)
+
+
+## Now, more complex
+
+Try a multi-column and multi-type query:
+
+{% highlight Groff markup %}
+select part_dt, sum(price) as total_selled, count(distinct seller_id) as sellers 
+from kylin_sales 
+group by part_dt 
+order by part_dt
+{% endhighlight %}
+
+This requires changes in DB_ROWTYPE:
+
+  ![](/images/Flink-Tutorial/20.png)
+
+
+We also need to import the Java libraries to work with Java data types ![alt text](/images/Flink-Tutorial/21.png)
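+
+A sketch of how DB_ROWTYPE could look for this query (the column types are assumptions inferred from the query: a date for part_dt, a decimal for the sum and a long for the distinct count; the screenshot above shows the actual change):
+
+{% highlight Groff markup %}
+val DB_ROWTYPE = new RowTypeInfo(Seq[TypeInformation[_]](
+  createTypeInformation[java.sql.Date],        // part_dt
+  createTypeInformation[java.math.BigDecimal], // total_selled
+  createTypeInformation[java.lang.Long]        // sellers
+))
+{% endhighlight %}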
+
+The new result will be: 
+
+  ![](/images/Flink-Tutorial/23.png)
+
+
+## Error:  Reused Connection
+
+
+  ![](/images/Flink-Tutorial/24.png)
+
+Check whether your HBase and Kylin are working. You can also use the Kylin UI for this.
+
+
+## Error:  java.lang.AbstractMethodError:  ….Avatica Connection
+
+See [Kylin 1898](https://issues.apache.org/jira/browse/KYLIN-1898) 
+
+It is a problem with the kylin-jdbc-1.x.x JAR; you need Calcite 1.8 or above. The solution is to use Kylin 1.5.4 or above.
+
+  ![](/images/Flink-Tutorial/25.png)
+
+
+
+## Error: can't expand macros compiled by previous versions of scala
+
+This is a problem with Scala versions; check your actual version with "scala -version" and choose the correct POM.
+
+Perhaps you will also need IntelliJ > File > Invalidate Caches > Invalidate and Restart.
+
+I added a POM for Scala 2.11.
+
+
+## Final Words
+
+Now you can read Kylin’s data from Apache Flink, great!
+
+[Full Code Example](https://github.com/albertoRamon/Flink/tree/master/ReadKylinFromFlink/flink-scala-project)
+
+All integration problems have been solved, and the setup has been tested with different data types (Long, BigDecimal and Dates). The patch was committed on 15 Oct and will be part of Flink 1.2.