Posted to commits@kylin.apache.org by xx...@apache.org on 2020/11/12 06:55:42 UTC

[kylin] branch document updated (7476711 -> 0ed60fa)

This is an automated email from the ASF dual-hosted git repository.

xxyu pushed a change to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git.


    from 7476711  add new committer chuxiao and new cve
     new 5a0dfdb  add sample dataset introduction
     new 0ed60fa  update some copywritings

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 website/_data/docs-cn.yml                |   1 +
 website/_data/docs.yml                   |   1 +
 website/_docs/howto/sample_dataset.cn.md |  96 +++++++++++++++++++++++++++++++
 website/_docs/howto/sample_dataset.md    |  82 ++++++++++++++++++++++++++
 website/images/SampleDataset/dataset.png | Bin 0 -> 67621 bytes
 5 files changed, 180 insertions(+)
 create mode 100644 website/_docs/howto/sample_dataset.cn.md
 create mode 100644 website/_docs/howto/sample_dataset.md
 create mode 100644 website/images/SampleDataset/dataset.png


[kylin] 01/02: add sample dataset introduction

xxyu pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git

commit 5a0dfdb95eb892fc9d3f51a2cd33331b40fd24f6
Author: xuekaiqi <ka...@qq.com>
AuthorDate: Wed Nov 11 13:32:54 2020 +0800

    add sample dataset introduction
---
 website/_data/docs-cn.yml                |   1 +
 website/_data/docs.yml                   |   1 +
 website/_docs/howto/sample_dataset.cn.md |  98 +++++++++++++++++++++++++++++++
 website/_docs/howto/sample_dataset.md    |  84 ++++++++++++++++++++++++++
 website/images/SampleDataset/dataset.png | Bin 0 -> 67621 bytes
 5 files changed, 184 insertions(+)

diff --git a/website/_data/docs-cn.yml b/website/_data/docs-cn.yml
index 290f255..764477c 100644
--- a/website/_data/docs-cn.yml
+++ b/website/_data/docs-cn.yml
@@ -73,3 +73,4 @@
   - howto/howto_cleanup_storage
   - howto/howto_use_cli
   - howto/howto_use_hive_mr_dict
+  - howto/sample_dataset
diff --git a/website/_data/docs.yml b/website/_data/docs.yml
index 75ddcee..90af718 100644
--- a/website/_data/docs.yml
+++ b/website/_data/docs.yml
@@ -91,6 +91,7 @@
   - howto/howto_enable_zookeeper_acl
   - howto/howto_use_health_check_cli
   - howto/howto_use_hive_mr_dict
+  - howto/sample_dataset
 
 - title: Security
   docs:
diff --git a/website/_docs/howto/sample_dataset.cn.md b/website/_docs/howto/sample_dataset.cn.md
new file mode 100644
index 0000000..d974d38
--- /dev/null
+++ b/website/_docs/howto/sample_dataset.cn.md
@@ -0,0 +1,98 @@
+---
+layout: docs-cn
+title:  Sample Dataset
+categories: howto
+permalink: /cn/docs/howto/sample_dataset.html
+---
+
+# Sample Dataset
+
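+The Kylin binary package includes a sample dataset of 5 tables, in which the fact table has 10,000 rows. After deploying Kylin, you can use the sample dataset for testing. You can import the built-in sample data into Hive by running a script.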
+
+### Import the Sample Dataset into Hive
+
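+The executable script that imports the sample dataset is **sample.sh**. By default, it is located in the **/bin** directory under the Kylin installation directory: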
+
+```sh
+$KYLIN_HOME/bin/sample.sh
+```
+
+After the script runs successfully, run the **hive** command in a server terminal to enter the Hive CLI, and then run queries to verify the import:
+
+```she
+hive
+```
+
+By default, the 5 tables are imported into Hive's `default` database. You can list the imported tables or query a specific table:
+
+```sql
+hive> use default;
+hive> show tables;
+hive> select count(*) from kylin_sales;
+```
+
+> Tip: To import the tables into a specific Hive database, set the configuration item `kylin.source.hive.database-for-flat-table` in the Kylin configuration file `$KYLIN_HOME/conf/kylin.properties` to that database.
+
+### Table Introduction
+
+Kylin supports both star and snowflake schemas. The sample dataset used in this document is a typical snowflake schema that contains 5 tables:
+
+- **KYLIN_SALES**
+
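+  Fact table that stores the details of sales orders; each row corresponds to one transaction. A transaction record includes the seller, the commodity category, the order amount, the quantity of goods, and so on.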
+
+- **KYLIN_CATEGORY_GROUPINGS**
+
+  Dimension table that stores details of commodity categories, such as the category name.
+
+- **KYLIN_CAL_DT**
+
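+  Dimension table that stores extended date information, such as the beginning date of the year, month, and week that a date falls in, as well as its year and month.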
+
+- **KYLIN_ACCOUNT**
+
+  Dimension table of user accounts; each row is one user. In the fact table, a user can be a buyer or a seller. It links to the BUYER_ID or SELLER_ID of **KYLIN_SALES** through ACCOUNT_ID.
+
+- **KYLIN_COUNTRY**
+
+  Dimension table of the countries where users reside; it links to **KYLIN_ACCOUNT**.
+
+Together, these 5 tables make up the snowflake schema. The following is the entity-relationship (ER) diagram:
+
+![Sample tables](/images/SampleDataset/dataset.png)
+
+### Data Dictionary
+
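+The Hive tables generated by `sample.sh` contain many columns. The following table introduces the key ones, focusing on the columns defined as dimensions in the model `kylin_sales_model` of the sample project `learn_kylin`.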
+
+| Table                    | Field                | Description               |
+| :----------------------- | :------------------- | :------------------------ |
+| KYLIN_SALES              | TRANS_ID             | Order ID                  |
+| KYLIN_SALES              | PART_DT              | Order Date                |
+| KYLIN_SALES              | LEAF_CATEG_ID        | Commodity Category ID     |
+| KYLIN_SALES              | LSTG_SITE_ID         | Site ID                   |
+| KYLIN_SALES              | SELLER_ID            | Seller ID                 |
+| KYLIN_SALES              | BUYER_ID             | Buyer ID                  |
+| KYLIN_SALES              | PRICE                | Order Amount              |
+| KYLIN_SALES              | ITEM_COUNT           | Number of Purchased Goods |
+| KYLIN_SALES              | LSTG_FORMAT_NAME     | Order Transaction Type    |
+| KYLIN_SALES              | OPS_USER_ID          | System User ID            |
+| KYLIN_SALES              | OPS_REGION           | System User Region        |
+| KYLIN_CATEGORY_GROUPINGS | USER_DEFINED_FIELD1  | User Defined Field 1      |
+| KYLIN_CATEGORY_GROUPINGS | USER_DEFINED_FIELD3  | User Defined Field 3      |
+| KYLIN_CATEGORY_GROUPINGS | UPD_DATE             | Update Date               |
+| KYLIN_CATEGORY_GROUPINGS | UPD_USER             | Update User               |
+| KYLIN_CATEGORY_GROUPINGS | META_CATEG_NAME      | Level 1 Category          |
+| KYLIN_CATEGORY_GROUPINGS | CATEG_LVL2_NAME      | Level 2 Category          |
+| KYLIN_CATEGORY_GROUPINGS | CATEG_LVL3_NAME      | Level 3 Category          |
+| KYLIN_CAL_DT             | CAL_DT               | Date                      |
+| KYLIN_CAL_DT             | WEEK_BEG_DT          | Week Beginning Date       |
+| KYLIN_CAL_DT             | MONTH_BEG_DT         | Month Beginning Date      |
+| KYLIN_CAL_DT             | YEAR_BEG_DT          | Year Beginning Date       |
+| KYLIN_ACCOUNT            | ACCOUNT_ID           | Account ID                |
+| KYLIN_ACCOUNT            | ACCOUNT_COUNTRY      | Country ID of Account     |
+| KYLIN_ACCOUNT            | ACCOUNT_BUYER_LEVEL  | Buyer Account Level       |
+| KYLIN_ACCOUNT            | ACCOUNT_SELLER_LEVEL | Seller Account Level      |
+| KYLIN_ACCOUNT            | ACCOUNT_CONTACT      | Account Contact           |
+| KYLIN_COUNTRY            | COUNTRY              | Country ID                |
+| KYLIN_COUNTRY            | NAME                 | Country Name              |
\ No newline at end of file
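The verification step described in the document above is interactive. As an illustrative sketch (not part of the committed docs), the five sample tables can be counted non-interactively in a loop. The function name `verify_sample_tables` is hypothetical, and the sketch assumes the Hive CLI is on `PATH` (`hive -e` runs a single query string and exits). The database argument defaults to `default`, where `sample.sh` imports the tables unless configured otherwise.

```sh
#!/usr/bin/env bash
# Hypothetical helper: print the row count of each sample table.
# Assumes the Hive CLI is on PATH; `hive -e` runs one query string and exits.
verify_sample_tables() {
  # sample.sh imports into `default` unless configured otherwise.
  local db="${1:-default}"
  local table
  for table in kylin_sales kylin_category_groupings kylin_cal_dt \
               kylin_account kylin_country; do
    printf '%s: ' "$table"
    hive -e "SELECT COUNT(*) FROM ${db}.${table};" || return 1
  done
}

# Usage (on a node where Hive is configured):
#   verify_sample_tables           # checks the `default` database
#   verify_sample_tables my_db     # checks another database
```

If the import succeeded, the count reported for `kylin_sales` should match the 10,000 rows stated in the documentation.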
diff --git a/website/_docs/howto/sample_dataset.md b/website/_docs/howto/sample_dataset.md
new file mode 100644
index 0000000..e8726bb
--- /dev/null
+++ b/website/_docs/howto/sample_dataset.md
@@ -0,0 +1,84 @@
+---
+layout: docs
+title:  Sample Dataset
+categories: tutorial
+permalink: /docs/howto/sample_dataset.html
+---
+
+## Sample Dataset
+
+The Kylin binary package contains a sample dataset for testing. It consists of five tables, including a fact table with 10,000 rows. Because the dataset is small, it is convenient for testing in a virtual machine. You can import the built-in sample data into Hive by running a script.
+
+### Import Sample Dataset into Hive
+
+The script is `sample.sh`. By default, it is located in `$KYLIN_HOME/bin`.
+
+```sh
+$KYLIN_HOME/bin/sample.sh
+```
+
+Once the script completes, run the following command to enter the Hive CLI, and then verify that the tables were imported successfully.
+
+```she
+hive
+```
+
+By default, the script imports the 5 tables into Hive's `default` database. You can list the imported tables or query a specific table:
+
+```sql
+hive> use default;
+hive> show tables;
+hive> select count(*) from kylin_sales;
+```
+
+> Tip: To import the tables into a specific Hive database, set the configuration item `kylin.source.hive.database-for-flat-table` in the Kylin configuration file `$KYLIN_HOME/conf/kylin.properties` to that database.
+
+### Table Introduction
+
+Kylin supports both star and snowflake schemas. The sample dataset is a typical snowflake schema that contains five tables:
+
+- **KYLIN_SALES** This is the fact table. It contains the details of sales orders; each row corresponds to one transaction and holds information such as the seller, the commodity category, the order amount, and the quantity of goods.
+- **KYLIN_CATEGORY_GROUPINGS** This is a dimension table that stores details of commodity categories, such as the category name.
+- **KYLIN_CAL_DT** This is a dimension table that extends date information, such as the beginning date of the year, month, and week.
+- **KYLIN_ACCOUNT** This is the user account table. Each row represents a user, who can be the buyer or the seller of a transaction. It links to **KYLIN_SALES** by joining ACCOUNT_ID with BUYER_ID or SELLER_ID.
+- **KYLIN_COUNTRY** This is the country dimension table, which links to **KYLIN_ACCOUNT**.
+
+Together, the five tables make up the snowflake schema. The following entity-relationship (ER) diagram shows them:
+
+![Sample Table](/images/SampleDataset/dataset.png)
+
+### Data Dictionary
+
+The tables generated by `sample.sh` contain many columns. The following table introduces the key columns, focusing on those defined as dimensions in the model `kylin_sales_model` of the sample project `learn_kylin`.
+
+| Table                    | Field                | Description                      |
+| :----------------------- | :------------------- | :------------------------------- |
+| KYLIN_SALES              | TRANS_ID             | Order ID                         |
+| KYLIN_SALES              | PART_DT              | Order Date                       |
+| KYLIN_SALES              | LEAF_CATEG_ID        | Commodity Category ID            |
+| KYLIN_SALES              | LSTG_SITE_ID         | Site ID                          |
+| KYLIN_SALES              | SELLER_ID            | Seller Account ID                |
+| KYLIN_SALES              | BUYER_ID             | Buyer Account ID                 |
+| KYLIN_SALES              | PRICE                | Order Amount                     |
+| KYLIN_SALES              | ITEM_COUNT           | Number of Purchased Goods        |
+| KYLIN_SALES              | LSTG_FORMAT_NAME     | Order Transaction Type           |
+| KYLIN_SALES              | OPS_USER_ID          | System User ID                   |
+| KYLIN_SALES              | OPS_REGION           | System User Region               |
+| KYLIN_CATEGORY_GROUPINGS | USER_DEFINED_FIELD1  | User Defined Field 1             |
+| KYLIN_CATEGORY_GROUPINGS | USER_DEFINED_FIELD3  | User Defined Field 3             |
+| KYLIN_CATEGORY_GROUPINGS | UPD_DATE             | Update Date                      |
+| KYLIN_CATEGORY_GROUPINGS | UPD_USER             | Update User                      |
+| KYLIN_CATEGORY_GROUPINGS | META_CATEG_NAME      | Level 1 Category                 |
+| KYLIN_CATEGORY_GROUPINGS | CATEG_LVL2_NAME      | Level 2 Category                 |
+| KYLIN_CATEGORY_GROUPINGS | CATEG_LVL3_NAME      | Level 3 Category                 |
+| KYLIN_CAL_DT             | CAL_DT               | Date                             |
+| KYLIN_CAL_DT             | WEEK_BEG_DT          | Week Beginning Date              |
+| KYLIN_CAL_DT             | MONTH_BEG_DT         | Month Beginning Date             |
+| KYLIN_CAL_DT             | YEAR_BEG_DT          | Year Beginning Date              |
+| KYLIN_ACCOUNT            | ACCOUNT_ID           | Account ID                       |
+| KYLIN_ACCOUNT            | ACCOUNT_COUNTRY      | Country ID Where Account Resides |
+| KYLIN_ACCOUNT            | ACCOUNT_BUYER_LEVEL  | Buyer Account Level              |
+| KYLIN_ACCOUNT            | ACCOUNT_SELLER_LEVEL | Seller Account Level             |
+| KYLIN_ACCOUNT            | ACCOUNT_CONTACT      | Contact of Account               |
+| KYLIN_COUNTRY            | COUNTRY              | Country ID                       |
+| KYLIN_COUNTRY            | NAME                 | Country Name                     |
\ No newline at end of file
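The snowflake relationships described in the table introduction can be exercised with a single HiveQL query. The sketch below is illustrative only: the helper `sample_join_query` is hypothetical, it follows the buyer path KYLIN_SALES → KYLIN_ACCOUNT → KYLIN_COUNTRY described above, and the `PART_DT = CAL_DT` join key to KYLIN_CAL_DT is an assumption inferred from the column descriptions in the data dictionary.

```sh
#!/usr/bin/env bash
# Hypothetical helper: print a HiveQL query that walks the snowflake schema
# KYLIN_SALES -> KYLIN_ACCOUNT (buyer side) -> KYLIN_COUNTRY, plus KYLIN_CAL_DT.
# Assumption: PART_DT (order date) joins to CAL_DT (date).
sample_join_query() {
  local db="${1:-default}"
  # Unquoted heredoc delimiter, so ${db} is expanded; * is not globbed here.
  cat <<SQL
SELECT c.NAME AS buyer_country,
       d.MONTH_BEG_DT AS month_begin,
       SUM(s.PRICE) AS total_amount,
       COUNT(*) AS order_count
FROM ${db}.KYLIN_SALES s
JOIN ${db}.KYLIN_ACCOUNT a ON s.BUYER_ID = a.ACCOUNT_ID
JOIN ${db}.KYLIN_COUNTRY c ON a.ACCOUNT_COUNTRY = c.COUNTRY
JOIN ${db}.KYLIN_CAL_DT d ON s.PART_DT = d.CAL_DT
GROUP BY c.NAME, d.MONTH_BEG_DT
ORDER BY total_amount DESC
LIMIT 10;
SQL
}

# Usage (assumes the Hive CLI is on PATH):
#   hive -e "$(sample_join_query)"
```

Joining on `SELLER_ID` instead of `BUYER_ID` gives the seller-side view, since KYLIN_ACCOUNT serves both roles.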
diff --git a/website/images/SampleDataset/dataset.png b/website/images/SampleDataset/dataset.png
new file mode 100644
index 0000000..2c20652
Binary files /dev/null and b/website/images/SampleDataset/dataset.png differ
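The tip about `kylin.source.hive.database-for-flat-table` can also be applied from a script before running `sample.sh`. This is a hedged sketch, not a Kylin tool: the helper `set_sample_database` is hypothetical, it edits the file with plain `awk`, and it assumes the property is written as a single `key=value` line with no spaces around `=`. Back up `kylin.properties` before running it.

```sh
#!/usr/bin/env bash
# Hypothetical helper: set kylin.source.hive.database-for-flat-table in kylin.properties.
# Assumes KYLIN_HOME is set and the property uses the plain `key=value` form.
set_sample_database() {
  local db="${1:?database name required}"
  local conf="${KYLIN_HOME:?KYLIN_HOME must be set}/conf/kylin.properties"
  local key="kylin.source.hive.database-for-flat-table"
  local tmp
  tmp="$(mktemp)" || return 1
  # Copy every line except an existing assignment of the key. index() does an
  # exact prefix match, so the dots in the key are not regex wildcards.
  awk -v k="${key}=" 'index($0, k) != 1' "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Append the new assignment.
  printf '%s=%s\n' "$key" "$db" >> "$tmp"
  # Overwrite in place so the original file's permissions are kept.
  cat "$tmp" > "$conf"
  rm -f "$tmp"
}

# Usage (back up kylin.properties first):
#   set_sample_database my_sample_db
#   "$KYLIN_HOME"/bin/sample.sh
```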


[kylin] 02/02: update some copywritings

xxyu pushed a commit to branch document
in repository https://gitbox.apache.org/repos/asf/kylin.git

commit 0ed60fa8a4fd14dd5023c2071bae87bfcce6960f
Author: xuekaiqi <ka...@qq.com>
AuthorDate: Thu Nov 12 14:43:14 2020 +0800

    update some copywritings
---
 website/_docs/howto/sample_dataset.cn.md | 4 +---
 website/_docs/howto/sample_dataset.md    | 8 +++-----
 2 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/website/_docs/howto/sample_dataset.cn.md b/website/_docs/howto/sample_dataset.cn.md
index d974d38..df37cc7 100644
--- a/website/_docs/howto/sample_dataset.cn.md
+++ b/website/_docs/howto/sample_dataset.cn.md
@@ -5,8 +5,6 @@ categories: howto
 permalink: /cn/docs/howto/sample_dataset.html
 ---
 
-# Sample Dataset
-
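 The Kylin binary package includes a sample dataset of 5 tables, in which the fact table has 10,000 rows. After deploying Kylin, you can use the sample dataset for testing. You can import the built-in sample data into Hive by running a script.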
 
 ### Import the Sample Dataset into Hive
@@ -19,7 +17,7 @@ $KYLIN_HOME/bin/sample.sh
 
 After the script runs successfully, run the **hive** command in a server terminal to enter the Hive CLI, and then run queries to verify the import:
 
-```she
+```sh
 hive
 ```
 
diff --git a/website/_docs/howto/sample_dataset.md b/website/_docs/howto/sample_dataset.md
index e8726bb..3c3a51d 100644
--- a/website/_docs/howto/sample_dataset.md
+++ b/website/_docs/howto/sample_dataset.md
@@ -5,13 +5,11 @@ categories: tutorial
 permalink: /docs/howto/sample_dataset.html
 ---
 
-## Sample Dataset
-
 The Kylin binary package contains a sample dataset for testing. It consists of five tables, including a fact table with 10,000 rows. Because the dataset is small, it is convenient for testing in a virtual machine. You can import the built-in sample data into Hive by running a script.
 
 ### Import Sample Dataset into Hive
 
-The script is `sample.sh`. By default, it is located in `$KYLIN_HOME/bin`.
+The script is `sample.sh`. You can find it under `$KYLIN_HOME/bin`.
 
 ```sh
 $KYLIN_HOME/bin/sample.sh
@@ -19,7 +17,7 @@ $KYLIN_HOME/bin/sample.sh
 
 Once the script completes, run the following command to enter the Hive CLI, and then verify that the tables were imported successfully.
 
-```she
+```sh
 hive
 ```
 
@@ -49,7 +47,7 @@ The five tables together constitute the structure of the entire snowflake data m
 
 ### Data Dictionary
 
-The tables generated by `sample.sh` contain many columns. The following table introduces the key columns, focusing on those defined as dimensions in the model `kylin_sales_model` of the sample project `learn_kylin`.
+The generated Hive tables contain many columns, which can be confusing. The following table lists the key columns referenced in Kylin's sample model and cube, and describes their business meaning.
 
 | Table                    | Field                | Description                      |
 | :----------------------- | :------------------- | :------------------------------- |