Posted to commits@kylin.apache.org by ya...@apache.org on 2022/04/02 08:53:56 UTC
[kylin] branch kylin4_on_cloud updated: Modify docs (#1847)
This is an automated email from the ASF dual-hosted git repository.
yaqian pushed a commit to branch kylin4_on_cloud
in repository https://gitbox.apache.org/repos/asf/kylin.git
The following commit(s) were added to refs/heads/kylin4_on_cloud by this push:
new 113e417 Modify docs (#1847)
113e417 is described below
commit 113e417c6096d424c3c1bf8bcdf9ca78abb8013d
Author: Yaqian Zhang <59...@qq.com>
AuthorDate: Sat Apr 2 16:53:51 2022 +0800
Modify docs (#1847)
---
README.md | 5 ++-
instances/aws_instance.py | 2 +-
...vanced_configs.md => advanced_configuration.md} | 10 ++---
readme/commands.md | 43 +++++++++++++++-------
readme/{configs.md => configuration.md} | 16 +++++---
readme/quick_start.md | 27 ++++++++------
readme/quick_start_for_multiple_clusters.md | 4 +-
7 files changed, 68 insertions(+), 39 deletions(-)
diff --git a/README.md b/README.md
index c4704e4..2599bc9 100644
--- a/README.md
+++ b/README.md
@@ -40,7 +40,7 @@ When cluster(s) created, services and nodes will be like below:
1. For more details about `cost` of tool, see document [cost calculation](./readme/cost_calculation.md).
2. For more details about `commands` of tool, see document [commands](./readme/commands.md).
3. For more details about the `prerequisites` of tool, see document [prerequisites](./readme/prerequisites.md).
-4. For more details about `advanced configs` of tool, see document [advanced configs](./readme/advanced_configs.md).
+4. For more details about `advanced configs` of tool, see document [configuration](./readme/configuration.md) and [advanced configuration](./readme/advanced_configuration.md).
5. For more details about `monitor services` supported by tool, see document [monitor](./readme/monitor.md).
6. For more details about `troubleshooting`, see document [troubleshooting](./readme/trouble_shooting.md).
7. The current tool has already opened the public port for some services. You can access the service by `public IP` of related EC2 instances.
@@ -49,6 +49,7 @@ When cluster(s) created, services and nodes will be like below:
3. `Prometheus`: 9090, 9100.
4. `Kylin`: 7070.
5. `Spark`: 8080, 4040.
+ 6. `MDX for Kylin`: 7080.
8. More about cloudformation syntax, please check [aws website](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html).
-9. The current Kylin version is 4.0.0.
+9. The current Kylin version is 4.0.1.
10. The current Spark version is 3.1.1.
diff --git a/instances/aws_instance.py b/instances/aws_instance.py
index f370d95..9260e79 100644
--- a/instances/aws_instance.py
+++ b/instances/aws_instance.py
@@ -1808,7 +1808,7 @@ class AWSInstance:
logger.info(f"Fetching messages successfully ...")
header_msg = '\n=================== List Alive Nodes ===========================\n'
- result = header_msg + f"Stack Name\t\tInstance ID\t\tPrivate Ip\t\tPublic Ip\t\t\n"
+ result = header_msg + f"Node Name\t\tInstance ID\t\tPrivate Ip\t\tPublic Ip\t\t\n"
for msg in msgs:
result += msg + '\n'
result += header_msg
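As a sketch of the change above, the listing output is assembled roughly like this (the sample message below is hypothetical; the real method lives in `AWSInstance` in `instances/aws_instance.py`):

```python
# Minimal sketch of how the node-listing message is built after this change:
# the column header now reads "Node Name" instead of "Stack Name".
def format_alive_nodes(msgs):
    header_msg = '\n=================== List Alive Nodes ===========================\n'
    result = header_msg + "Node Name\t\tInstance ID\t\tPrivate Ip\t\tPublic Ip\t\t\n"
    for msg in msgs:
        result += msg + '\n'
    result += header_msg
    return result

# Hypothetical sample row, for illustration only
print(format_alive_nodes(["kylin-node-1\t\ti-0abc\t\t10.0.0.5\t\t54.1.2.3"]))
```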
diff --git a/readme/advanced_configs.md b/readme/advanced_configuration.md
similarity index 83%
rename from readme/advanced_configs.md
rename to readme/advanced_configuration.md
index cd36d3b..0218650 100644
--- a/readme/advanced_configs.md
+++ b/readme/advanced_configuration.md
@@ -16,7 +16,7 @@ There are `9` modules params for tools. Introductions as below:
- EC2_KYLIN4_PARAMS: These params of the module are for creating a Kylin4.
-- EC2_SPARK_WORKER_PARAMS: These params of the module are for creating **Spark Workers**, the default is **3** spark workers for all clusters.
+- EC2_SPARK_WORKER_PARAMS: These params of the module are for creating **Spark Workers**, the default is **3** spark workers for all of the clusters.
- EC2_KYLIN4_SCALE_PARAMS: these params of the module are for scaling **Kylin4 nodes**, the range of **Kylin4 nodes** is related to `KYLIN_SCALE_UP_NODES` and `KYLIN_SCALE_DOWN_NODES`.
@@ -25,16 +25,16 @@ There are `9` modules params for tools. Introductions as below:
> 1. `KYLIN_SCALE_UP_NODES` is for the range of Kylin nodes to scale up.
> 2. `KYLIN_SCALE_DOWN_NODES` is for the range of Kylin nodes to scale down.
> 3. The range of `KYLIN_SCALE_UP_NODES` must contain the range of `KYLIN_SCALE_DOWN_NODES`.
- > 4. **They are effective to all clusters which are not only `default cluster` but also another cluster whose index is in `${CLUSTER_INDEXES}`.**
+ > 4. **They take effect on all of the clusters: not only the `default cluster` but also any cluster whose index is in `${CLUSTER_INDEXES}`.**
- EC2_SPARK_SCALE_SLAVE_PARAMS: these params of the module are for scaling **Spark workers**, the range of **Spark Workers** is related to `SPARK_WORKER_SCALE_UP_NODES` and `SPARK_WORKER_SCALE_DOWN_NODES`.
> Note:
>
- > 1. `SPARK_WORKER_SCALE_UP_NODES` is for the range for spark workers to scale up. **It's effective to all clusters which are not only `default cluster` but also another cluster whose index is in `${CLUSTER_INDEXES}`.**
- > 2. `SPARK_WORKER_SCALE_DOWN_NODES` is for the range for spark workers to scale down. **It's effective to all clusters which are not only `default cluster` but also another cluster whose index is in `${CLUSTER_INDEXES}`.**
+ > 1. `SPARK_WORKER_SCALE_UP_NODES` sets the range of spark workers to scale up. **It takes effect on all of the clusters: not only the `default cluster` but also any cluster whose index is in `${CLUSTER_INDEXES}`.**
+ > 2. `SPARK_WORKER_SCALE_DOWN_NODES` sets the range of spark workers to scale down. **It takes effect on all of the clusters: not only the `default cluster` but also any cluster whose index is in `${CLUSTER_INDEXES}`.**
> 3. The range of `SPARK_WORKER_SCALE_UP_NODES` must contain the range of `SPARK_WORKER_SCALE_DOWN_NODES`.
- > 4. **They are effective to all clusters which are not only `default cluster` but also another cluster whose index is in `${CLUSTER_INDEXES}`.**
+ > 4. **They take effect on all of the clusters: not only the `default cluster` but also any cluster whose index is in `${CLUSTER_INDEXES}`.**
### Customize Configs
diff --git a/readme/commands.md b/readme/commands.md
index 7bca88f..81dd510 100644
--- a/readme/commands.md
+++ b/readme/commands.md
@@ -1,16 +1,21 @@
## Commands<a name="run"></a>
Command:
+ > Note:
+ >
+ > Options are placed in `[]`, and different options are separated by `|`.
```shell
-python deploy.py --type [deploy|destroy|list|scale] --scale-type [up|down] --node-type [kylin|spark_worker] [--cluster {1..6}|all|default]
+python deploy.py --type [deploy|destroy|destroy-all|list|scale] --mode [all|job|query] --scale-type [up|down] --node-type [kylin|spark_worker] --cluster [{1..6}|all|default]
```
- deploy: create cluster(s).
-- destroy: destroy created cluster(s).
+- destroy: destroy created cluster(s), including the kylin node, spark master node, spark slave nodes, and zookeeper node.
+
+- destroy-all: destroy all of the nodes, including the kylin node, spark master node, spark slave nodes, zookeeper node, rds node, monitor node, and vpc node.
-- list: list alive nodes which are with stack name, instance id, private IP, and public IP.
+- list: list alive nodes with their node name, instance ID, private IP, and public IP.
- scale: Must be used with `--scale-type` and `--node-type`.
@@ -19,14 +24,26 @@ python deploy.py --type [deploy|destroy|list|scale] --scale-type [up|down] --nod
> 1. Currently supports scaling `kylin` or `spark_worker` nodes up/down for a specific cluster.
> 2. Before scaling `kylin` or `spark_worker` nodes up/down, cluster services must be ready.
> 3. If you want to scale a `kylin` or `spark_worker` node to a specified cluster, please add the `--cluster ${cluster ID}` to specify the expected node add to the cluster `${cluster ID}`.
- > 4. For details about the index of the cluster, please check [Indexes of clusters](./configs.md#indexofcluster).
+ > 4. For details about the index of the cluster, please check [Indexes of clusters](./configuration.md#indexofcluster).
### Command for deploy
-- Deploy a default cluster
+- Deploy a cluster whose kylin node mode is `all`
+
+```shell
+$ python deploy.py --type deploy
+```
+
+- Deploy a cluster whose kylin node mode is `job`
+
+```shell
+$ python deploy.py --type deploy --mode job
+```
+
+- Deploy a cluster whose kylin node mode is `query`
```shell
-$ python deploy.py --type deploy [--cluster default]
+$ python deploy.py --type deploy --mode query
```
- Deploy a cluster with a specific cluster index. <a name="deploycluster"></a>
@@ -37,7 +54,7 @@ $ python deploy.py --type deploy --cluster ${cluster ID}
> Note: the `${cluster ID}` must be in the range of `CLUSTER_INDEXES`.
-- Deploy all clusters which contain the default cluster and all clusters whose index is in the range of `CLUSTER_INDEXES`.
+- Deploy all of the clusters: the default cluster and every cluster whose index is in the range of `CLUSTER_INDEXES`.
```shell
$ python deploy.py --type deploy --cluster all
@@ -47,12 +64,12 @@ $ python deploy.py --type deploy --cluster all
> Note:
>
-> Destroy all clusters will not delete vpc, rds, and monitor node. So if user doesn't want to hold the env, please set the `ALWAYS_DESTROY_VPC_RDS_MONITOR` to be `'true'`.
+> By default, the `destroy` command does not delete the vpc, rds, and monitor nodes. So if the user doesn't want to keep the env, please use the `destroy-all` command.
-- Destroy a default cluster
+- Destroy the default cluster
```shell
-$ python deploy.py --type destroy [--cluster default]
+$ python deploy.py --type destroy
```
- Destroy a cluster with a specific cluster index.
@@ -63,7 +80,7 @@ $ python deploy.py --type destroy --cluster ${cluster ID}
> Note: the `${cluster ID}` must be in the range of `CLUSTER_INDEXES`.
-- Destroy all clusters which contain the default cluster and all clusters whose index is in the range of `CLUSTER_INDEXES`.
+- Destroy all of the clusters: the default cluster and every cluster whose index is in the range of `CLUSTER_INDEXES`.
```shell
$ python deploy.py --type destroy --cluster all
@@ -71,7 +88,7 @@ $ python deploy.py --type destroy --cluster all
### Command for list
-- List nodes that are with **stack name**, **instance id**, **private IP,** and **public IP** in **available stacks**.
+- List nodes with their **node name**, **instance ID**, **private IP**, and **public IP** in **available stacks**.
```shell
$ python deploy.py --type list
@@ -83,7 +100,7 @@ $ python deploy.py --type list
>
> 1. Scale command must be used with `--scale-type` and `--node-type`.
> 2. If the scale command does not specify a `cluster ID`, then the scaled node(Kylin or spark worker) will be added to the `default` cluster.
-> 3. Scale command **not support** to **scale** node (kylin or spark worker) to **all clusters** at **one time**. It means that `python ./deploy.py --type scale --scale-type up[|down] --node-type kylin[|spark_worker] --cluster all` is invalid commad.
+> 3. The scale command does **not support** scaling a node (kylin or spark worker) to **all of the clusters** at **one time**; that is, `python ./deploy.py --type scale --scale-type up[|down] --node-type kylin[|spark_worker] --cluster all` is an invalid command.
> 4. The scale params `KYLIN_SCALE_UP_NODES`, `KYLIN_SCALE_DOWN_NODES`, `SPARK_WORKER_SCALE_UP_NODES` and `SPARK_WORKER_SCALE_DOWN_NODES` take effect on all clusters. So if the user wants to scale a node for a specific cluster, modify the scale params before **every run.**
> 5. **(Important!!!)** The current cluster is created with default `3` spark workers and `1` Kylin node. The `3` spark workers can not be scaled down. The `1` Kylin node also can not be scaled down.
> 6. **(Important!!!)** The current cluster can only scale nodes up or down within the ranges given by `KYLIN_SCALE_UP_NODES`, `KYLIN_SCALE_DOWN_NODES`, `SPARK_WORKER_SCALE_UP_NODES`, and `SPARK_WORKER_SCALE_DOWN_NODES`, not the default `3` spark workers and `1` kylin node in a cluster.
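As an illustration of the range constraint described earlier — the scale-up range must contain the scale-down range — here is a minimal sketch. The `(start, end)` tuple encoding and the sample values are assumptions; the real ranges come from `kylin_configs.yaml`.

```python
# Hedged sketch of the containment rule: the scale-up node range must
# contain the scale-down node range. Ranges are (start, end) tuples here,
# which is an assumption about how to model them for illustration.
def contains(outer, inner):
    # True if inner lies entirely within outer
    return outer[0] <= inner[0] and inner[1] <= outer[1]

KYLIN_SCALE_UP_NODES = (2, 6)    # hypothetical values
KYLIN_SCALE_DOWN_NODES = (3, 5)  # hypothetical values

# A valid configuration under the rule above
assert contains(KYLIN_SCALE_UP_NODES, KYLIN_SCALE_DOWN_NODES)
```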
diff --git a/readme/configs.md b/readme/configuration.md
similarity index 87%
rename from readme/configs.md
rename to readme/configuration.md
index b3281f7..52e0559 100644
--- a/readme/configs.md
+++ b/readme/configuration.md
@@ -1,18 +1,24 @@
-## Configs
+## Configuration
#### I. Configure the `kylin_configs.yaml`
**Required parameters**:
-- `AWS_REGION`: Current region for EC2 instances.
+- `AWS_REGION`: Current region for EC2 instances. Default is `cn-northwest-1`.
- `IAMRole`: IAM role which has the access to aws authority. This parameter will be set to the created **name** of the IAM role.
- `S3_URI`: the prefix path of storing `jars/scripts/tar`. For example, this parameter will be set to `s3://.../kylin4-aws-test`.
- `KeyName`: Security key name is a set of security credentials that you use to prove your identity when connecting to an instance. This parameter will be set to the created **name** of key pair`.
- `CIDR_IP`: An inbound rule permits instances to receive traffic from the specified IPv4 or IPv6 CIDR address range, or the instances associated with the specified security group.
+
+**Optional parameters**:
+
- `DB_IDENTIFIER`: this param must be unique in `RDS -> Databases`, and it will be the name of the created RDS database.
-- `DB_PORT`: this param will be the port of created RDS database, default is `3306`.
-- `DB_USER`: this param will be a login ID for the master user of your DB instance, the default is `root`.
-- `DB_PASSWORD`: this param will be the password of `DB_USER` to access the DB instance. default is `123456test`, it's strongly suggested you change it.
+- `DB_PORT`: this param will be the port of created RDS database. The default value is `3306`.
+- `DB_USER`: this param will be a login ID for the master user of your DB instance. The default value is `root`.
+- `DB_PASSWORD`: this param will be the password of `DB_USER` to access the DB instance. The default value is `123456test`; it is strongly suggested that you change it.
+
+- `ENABLE_MDX`: Whether to start the `MDX for Kylin` service when starting the cluster. The default value is `false`.
+- `SUPPORT_GLUE`: Whether to use AWS Glue as the metastore service of the hive data source. The default value is `true`, effective only when deploying a kylin node in `job` mode.
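Taken together, the parameters above can be sketched as a mapping of documented defaults. This is a Python sketch for illustration only; the real file is `kylin_configs.yaml`, and `missing_required` is a hypothetical helper, not part of the tool.

```python
# Sketch of the documented parameters: required keys plus default values,
# as listed above. The exact structure of kylin_configs.yaml may differ.
REQUIRED = ["AWS_REGION", "IAMRole", "S3_URI", "KeyName", "CIDR_IP"]
DEFAULTS = {
    "AWS_REGION": "cn-northwest-1",
    "DB_PORT": 3306,
    "DB_USER": "root",
    "DB_PASSWORD": "123456test",  # strongly suggested to change
    "ENABLE_MDX": False,
    "SUPPORT_GLUE": True,
}

def missing_required(config):
    # Hypothetical helper: report required keys the user has not set
    return [k for k in REQUIRED if not config.get(k)]

print(missing_required({"AWS_REGION": "us-east-1"}))
```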
#### II. Configure the `kylin.properties` in `backup/properties` directories.<a name="cluster"></a>
diff --git a/readme/quick_start.md b/readme/quick_start.md
index 7311c2b..f86f026 100644
--- a/readme/quick_start.md
+++ b/readme/quick_start.md
@@ -15,26 +15,30 @@
git clone https://github.com/apache/kylin.git && cd kylin && git checkout kylin4_on_cloud
```
-3. Modify the `kylin_config.yml`.
+3. Configure the `kylin_config.yaml`.
- 1. Set the `AWS_REGION`, such as us-east-1.
+ 1. Set the `AWS_REGION`, such as `us-east-1`.
- 2. Set the `IAMRole`, please check [Create an IAM role](./prerequisites.md#IAM).
+ 2. Set the `IAMRole`, please check [create an IAM role](./prerequisites.md#IAM).
- 3. Set the `S3_URI`, please check [Create a S3 direcotry](./prerequisites.md#S3).
+ 3. Set the `S3_URI`, please check [create an S3 directory](./prerequisites.md#S3).
- 4. Set the `KeyName`, please check [Create a keypair](./prerequisites.md#keypair).
+ 4. Set the `KeyName`, please check [create a keypair](./prerequisites.md#keypair).
5. Set the `CIDR_IP`, make sure that the `CIDR_IP` match the pattern `xxx.xxx.xxx.xxx/16[|24|32]`.
- > Note:
+ > Note:
>
> 1. this `CIDR_IP` is the specified IPv4 or IPv6 CIDR address range which an inbound rule can permit instances to receive traffic from.
>
> 2. In one word, it will let your mac which IP is in the `CIDR_IP` to access instances.
+ 6. Set the `ENABLE_MDX`. If you want to use `MDX for Kylin`, set this parameter to `true`. For `MDX for Kylin`, please refer to [the manual of MDX for Kylin](https://kyligence.github.io/mdx-kylin/).
+
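The `CIDR_IP` pattern from step 5 can be checked with a minimal sketch. The regex below is an assumption derived from the `xxx.xxx.xxx.xxx/16[|24|32]` pattern stated above; the tool's real validation may differ.

```python
import re

# Hedged sketch: validate CIDR_IP against the xxx.xxx.xxx.xxx/16|24|32
# pattern from step 5. This does not check that each octet is <= 255.
CIDR_RE = re.compile(r"^(\d{1,3}\.){3}\d{1,3}/(16|24|32)$")

def is_valid_cidr_ip(value: str) -> bool:
    return CIDR_RE.match(value) is not None

print(is_valid_cidr_ip("10.1.0.0/16"))  # expect True
```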
4. Init python env.
+> Note: You need to ensure that the local machine has Python 3.6.6 or later installed.
+
```shell
$ bin/init.sh
$ source venv/bin/activate
@@ -46,8 +50,6 @@ Check the python version:
$ python --version
```
-> Note: If Python is already installed locally, you need to ensure that the python version is 3.6.6 or later.
-
5. Execute commands to deploy a cluster quickly.
```shell
@@ -58,8 +60,9 @@ After this cluster is ready, you will see the message `Kylin Cluster already sta
> Note:
>
-> 1. For more details about the properties of kylin4 in a cluster, please check [configure kylin.properties](./configs.md#cluster).
-> 2. For more details about the index of the clusters, please check [Indexes of clusters](./configs.md#indexofcluster).
+> 1. By default, the mode of the kylin node in the deployed cluster is `all`, which supports both `job` and `query`. If you want to deploy a read-write separated cluster, use `python deploy.py --type deploy --mode job` to deploy a `job` cluster and `python deploy.py --type deploy --mode query` to deploy a `query` cluster. AWS Glue is supported by default in a `job` cluster.
+> 2. For more details about the properties of kylin4 in a cluster, please check [configure kylin.properties](./configuration.md#cluster).
+> 3. For more details about the index of the clusters, please check [Indexes of clusters](./configuration.md#indexofcluster).
6. Execute commands to list nodes of the cluster.
@@ -73,6 +76,8 @@ You can access `Kylin` web by `http://{kylin public ip}:7070/kylin`.
![kylin login](../images/kylinlogin.png)
+If you set `ENABLE_MDX` to true, you can access `MDX for Kylin` by `http://{kylin public ip}:7080/kylin`.
+
7. Destroy the cluster quickly.
```shell
@@ -82,5 +87,5 @@ $ python deploy.py --type destroy
> Note:
>
> 1. If you want a quick start for multiple clusters, please refer to the [quick start for multiple clusters](./quick_start_for_multiple_clusters.md).
-> 2. **Current destroy operation will remain some stack which contains `RDS` and so on**. So if user want to destroy clearly, please modify the `ALWAYS_DESTROY_VPC_RDS_MONITOR` in `kylin_configs.yml` to be `true` and re-execute `destroy` command.
+> 2. **The current destroy operation will retain some stacks, such as the `RDS` stack**. So if the user wants to destroy everything, please use `python deploy.py --type destroy-all`.
diff --git a/readme/quick_start_for_multiple_clusters.md b/readme/quick_start_for_multiple_clusters.md
index fd18836..e025941 100644
--- a/readme/quick_start_for_multiple_clusters.md
+++ b/readme/quick_start_for_multiple_clusters.md
@@ -8,9 +8,9 @@
>
> 1. `CLUSTER_INDEXES` means that cluster index is in the range of `CLUSTER_INDEXES`.
> 2. Configs for multiple clusters are also from `kylin_configs.yaml`.
- > 3. For more details about the index of the clusters, please check [Indexes of clusters](./configs.md#indexofcluster).
+ > 3. For more details about the index of the clusters, please check [Indexes of clusters](./configuration.md#indexofcluster).
-2. Copy `kylin.properties.template` for expecting clusters to deploy, please check the [details](./configs.md#cluster).
+2. Copy `kylin.properties.template` for the clusters you expect to deploy; please check the [details](./configuration.md#cluster).
3. Execute commands to deploy all of the clusters.