Posted to commits@iotdb.apache.org by qi...@apache.org on 2022/11/26 03:16:43 UTC

[iotdb] 01/01: fix cluster concept

This is an automated email from the ASF dual-hosted git repository.

qiaojialin pushed a commit to branch fix_doc
in repository https://gitbox.apache.org/repos/asf/iotdb.git

commit 492a888622354b112b46042f753b23c27fc856f0
Author: qiaojialin <64...@qq.com>
AuthorDate: Sat Nov 26 11:16:29 2022 +0800

    fix cluster concept
---
 docs/UserGuide/Cluster/Cluster-Concept.md    | 42 +++++++++++-------------
 docs/zh/UserGuide/Cluster/Cluster-Concept.md | 49 +++++++++++++---------------
 2 files changed, 42 insertions(+), 49 deletions(-)
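
The partitioning strategy documented in this commit maps each series' DeviceId to a SeriesSlot and each timestamp to a time interval. Together, the two values identify a SeriesTimeSlot. The following minimal Python sketch shows that mapping using illustrative values only: the slot count SERIES_SLOT_NUM, the 7-day TIME_PARTITION_INTERVAL_MS, and the BKDR-style hash are hypothetical stand-ins, not IoTDB's actual configuration or code.

```python
from __future__ import annotations

# Illustrative assumptions, not IoTDB's real settings or hash function.
SERIES_SLOT_NUM = 1000                    # fixed number of SeriesSlots
TIME_PARTITION_INTERVAL_MS = 604_800_000  # one time interval: 7 days in ms


def device_id(series_path: str) -> str:
    """DeviceId: the full path from root to the penultimate level."""
    return series_path.rsplit(".", 1)[0]


def series_slot(dev_id: str, slot_num: int = SERIES_SLOT_NUM) -> int:
    """Map a DeviceId to a SeriesSlot with a BKDR-style string hash."""
    h = 0
    for ch in dev_id:
        h = (h * 131 + ord(ch)) & 0x7FFFFFFF  # keep a non-negative 31-bit value
    return h % slot_num


def time_slot(timestamp_ms: int, interval_ms: int = TIME_PARTITION_INTERVAL_MS) -> int:
    """Map a timestamp to the start of the time interval containing it."""
    return (timestamp_ms // interval_ms) * interval_ms


def partition_key(series_path: str, timestamp_ms: int) -> tuple[int, int]:
    """A SeriesTimeSlot is identified by (SeriesSlot, time interval start)."""
    return series_slot(device_id(series_path)), time_slot(timestamp_ms)


# All series of one device share a SeriesSlot.
slot_a, _ = partition_key("root.sg.d1.s1", 1_669_432_589_000)
slot_b, _ = partition_key("root.sg.d1.s2", 1_669_432_589_000)
```

The slot count is fixed, so the number of distinct SeriesSlots stays bounded no matter how many devices exist. This is why the partition information does not grow with the number of time series.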

diff --git a/docs/UserGuide/Cluster/Cluster-Concept.md b/docs/UserGuide/Cluster/Cluster-Concept.md
index b54766df57..39d3865c8d 100644
--- a/docs/UserGuide/Cluster/Cluster-Concept.md
+++ b/docs/UserGuide/Cluster/Cluster-Concept.md
@@ -43,12 +43,12 @@ Clients can only connect to DataNodes to perform operations.
 | DataNode          | node role                        | Data node, which manages data and metadata                                                                                                  |
 | Database          | metadata                         | Database; data in different databases are physically isolated                                                                               |
 | DeviceId          | device id                        | The full path from root to the penultimate level in the metadata tree represents a device id                                                |
-| SeriesSlot        | series partition slot            | Each database has a fixed number of series slots, containing the schemas of series                                                          |
-| SeriesTimeSlot    | a time partition of a SeriesSlot | All series of a time partition in a series slot                                                                                             |
+| SeriesSlot        | schema partition                 | Each database contains multiple SeriesSlots; the partition key is the DeviceId                                                              |
+| SchemaRegion      | schema region                    | A collection of multiple SeriesSlots                                                                                                        |
+| SchemaRegionGroup | logical concept                  | A group contains as many SchemaRegions as the schema replication factor; they manage the same schema data and back each other up            |
+| SeriesTimeSlot    | data partition                   | The data of a SeriesSlot within one time interval; a SeriesSlot contains multiple SeriesTimeSlots, and the partition key is the timestamp    |
 | DataRegion        | data region                      | A collection of multiple SeriesTimeSlots                                                                                                    |
 | DataRegionGroup   | logical concept                  | A group contains as many DataRegions as the data replication factor; they manage the same data and back each other up                       |
-| SchemaRegion      | schema region                    | A collection of multiple SeriesSlots                                                                                                        |
-| SchemaRegionGroup | logical concept                  | A group contains as many SchemaRegions as the schema replication factor; they manage the same schema data and back each other up            |
 
 ## Characteristics of Cluster
 
@@ -70,47 +70,43 @@ Clients can only connect to DataNodes to perform operations.
 
 The partitioning strategy partitions data and schema into different Regions, and allocates Regions to different DataNodes.
 
-It is recommended to set 1 database (there is no need to set the database according to the number of cores as in version 0.13), which is used as the database concept, and the cluster will dynamically allocate resources according to the number of nodes and cores.
+It is recommended to set 1 database; the cluster dynamically allocates resources according to the number of nodes and CPU cores.
 
-The database contains multiple SchemaRegions (schema shards) and DataRegions (data shards), which are managed by DataNodes.
+The database contains multiple SchemaRegions and DataRegions, which are managed by DataNodes.
 
 * Schema partition strategy 
-    * For a time series schema, the ConfigNode maps the device ID (full path from root to the penultimate tier node) into a series\_partition\_slot and assigns this partition slot to a SchemaRegion group.
+    * For a time series schema, the ConfigNode maps the device ID (the full path from root to the penultimate-level node) into a SeriesSlot and allocates this SeriesSlot to a SchemaRegionGroup.
 * Data partition strategy
-    * For a time series data point, the ConfigNode will map to a series\_partition\_slot (vertical partition) according to the device ID, and then map it to a time\_partition\_slot (horizontal partition) according to the data timestamp, and allocate this data partition to a DataRegion group.
+    * For a time series data point, the ConfigNode maps it to a SeriesSlot according to its DeviceId, then maps it to a SeriesTimeSlot according to its timestamp, and allocates this SeriesTimeSlot to a DataRegionGroup.
   
 IoTDB uses a slot-based partitioning strategy, so the size of the partition information is controllable and does not grow infinitely with the number of time series or devices.
 
-Multiple replicas of a Region will be allocated to different DataNodes to avoid single point of failure, and the load balance of different DataNodes will be ensured when Regions are allocated.
+Regions are allocated to different DataNodes to avoid a single point of failure, and Region allocation keeps the load balanced across DataNodes.
 
 ## Replication Strategy
 
 The replication strategy stores data as multiple replicas that are copies of each other. Together, the replicas provide a highly available service and tolerate the failure of some of them.
 
-A region is the basic unit of replication. Multiple replicas of a region construct a high-availability replication group, to support high availability.
+A Region is the basic unit of replication. Multiple replicas of a Region form a RegionGroup, which provides high availability.
 
 * Replication and consensus
-  * Partition information: The cluster has 1 partition information group consisting of all ConfigNodes.
-  * Data: The cluster has multiple DataRegion groups, and each DataRegion group has multiple DataRegions with the same id.
-  * Schema: The cluster has multiple SchemaRegion groups, and each SchemaRegion group has multiple SchemaRegions with the same id.
+  * ConfigNode Group: consists of all ConfigNodes.
+  * SchemaRegionGroup: The cluster has multiple SchemaRegionGroups, and each SchemaRegionGroup has multiple SchemaRegions with the same id.
+  * DataRegionGroup: The cluster has multiple DataRegionGroups, and each DataRegionGroup has multiple DataRegions with the same id.
 
 An illustration of the partition allocation in the cluster:
 
 <img style="width:100%; max-width:500px; max-height:500px; margin-left:auto; margin-right:auto; display:block;" src="https://github.com/apache/iotdb-bin-resources/blob/main/docs/UserGuide/Cluster/Data-Partition.png?raw=true">
 
-The figure contains 1 SchemaRegion group, and the schema_replication_factor is 3, so the 3 white SchemaRegion-0s form a replication group, and the Raft protocol is used to ensure data consistency.
+The figure contains 1 SchemaRegionGroup, and the schema_replication_factor is 3, so the 3 white SchemaRegion-0s form a replication group.
 
-The figure contains 3 DataRegion groups, and the data_replication_factor is 3, so there are 9 DataRegions in total.
+The figure contains 3 DataRegionGroups, and the data_replication_factor is 3, so there are 9 DataRegions in total.
 
 ## Consensus Protocol (Consistency Protocol)
 
-Among multiple replicas of each region group, data consistency is guaranteed through a consensus protocol, which routes read and write requests to multiple replicas.
+Within each RegionGroup, consistency among its Regions is guaranteed by a consensus protocol, which routes read and write requests to multiple replicas.
 
 * Currently supported consensus protocols
-  * Standalone: could only be used when the replica count is 1; an empty implementation of the consensus protocol.
-  * MultiLeader: could be used with any number of replicas, only for DataRegions; writes can be applied on each replica and replicated asynchronously to other replicas.
-  * Ratis: the Raft consensus protocol; could be used with any number of replicas and for any region group.
-  
-## 0.14.0-preview1 Function Map
-
-<img style="width:100%; max-width:800px; max-height:1000px; margin-left:auto; margin-right:auto; display:block;" src="https://github.com/apache/iotdb-bin-resources/blob/main/docs/UserGuide/Cluster/Preview1-Function.png?raw=true">
+  * SimpleConsensus: provides strong consistency; can only be used when the replica count is 1; it is an empty implementation of the consensus protocol.
+  * IoTConsensus: provides eventual consistency; can be used with any number of replicas (2 replicas avoid a single point of failure); only for DataRegions; writes can be applied on any replica and are replicated asynchronously to the other replicas.
+  * RatisConsensus: provides strong consistency using the Raft consensus protocol; can be used with any number of replicas and for any RegionGroup.
\ No newline at end of file
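
The replication rules in the English page can be illustrated with a simple greedy allocator. Those rules are: a RegionGroup holds as many Regions as the replication factor, its replicas sit on different DataNodes, and allocation keeps DataNodes balanced. This is a hypothetical Python sketch, not IoTDB's allocation algorithm. `allocate_region_groups` and the DataNode names are invented for illustration, and the example assumes 3 DataNodes.

```python
from __future__ import annotations


def allocate_region_groups(
    data_nodes: list[str], group_count: int, replication_factor: int
) -> dict[int, list[str]]:
    """Hypothetical greedy allocator: each RegionGroup gets `replication_factor`
    Regions, placed on distinct DataNodes that currently host the fewest Regions."""
    if replication_factor > len(data_nodes):
        raise ValueError("replication_factor exceeds the number of DataNodes")
    load = {node: 0 for node in data_nodes}  # Regions hosted per DataNode
    groups: dict[int, list[str]] = {}
    for group_id in range(group_count):
        # The least-loaded nodes (ties broken by name) are distinct, so no two
        # replicas of the same group land on one DataNode.
        chosen = sorted(data_nodes, key=lambda n: (load[n], n))[:replication_factor]
        for node in chosen:
            load[node] += 1
        groups[group_id] = chosen
    return groups


# The figure's case: 3 DataRegionGroups with data_replication_factor = 3.
layout = allocate_region_groups(["dn1", "dn2", "dn3"], group_count=3, replication_factor=3)
total_regions = sum(len(replicas) for replicas in layout.values())  # 3 x 3 = 9
```

Reproducing the figure's case yields 9 DataRegions, with each DataNode hosting 3. Picking the least-loaded distinct nodes is just one easy rule that satisfies both constraints the page states.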
diff --git a/docs/zh/UserGuide/Cluster/Cluster-Concept.md b/docs/zh/UserGuide/Cluster/Cluster-Concept.md
index 81c0db45ee..9e781fb7d9 100644
--- a/docs/zh/UserGuide/Cluster/Cluster-Concept.md
+++ b/docs/zh/UserGuide/Cluster/Cluster-Concept.md
@@ -39,16 +39,16 @@ Clients can only read and write data through DataNodes.
 
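 | Term              | Type                                   | Description                                                                                   |
 |:------------------|:--------------|:-------------------------------------|
-| ConfigNode        | node role                               | Config node, which manages cluster node information and partition information, monitors cluster status, and controls load balancing |
-| DataNode          | node role                               | Data node, which manages data and metadata                                                    |
+| ConfigNode        | node role                              | Config node, which manages cluster node information and partition information, monitors cluster status, and controls load balancing |
+| DataNode          | node role                              | Data node, which manages data and metadata                                                    |
 | Database          | metadata                               | Database; data in different databases are physically isolated                                 |
 | DeviceId          | device name                            | The full path from root to the penultimate level in the metadata tree represents a device name |
-| SeriesSlot        | series partition slot                  | Each Database corresponds to a fixed number of series slots, which contain the metadata of their series |
-| SeriesTimeSlot    | a time partition slot of a series slot | The data of all series in a SeriesSlot within one time partition                              |
-| DataRegion        | a group of data partitions             | A collection of multiple SeriesTimeSlots                                                      |
-| DataRegionGroup   | logical concept                        | Contains as many DataRegions as the data replication factor; they manage the same data and back each other up |
-| SchemaRegion      | a group of schema partitions           | A collection of multiple SeriesSlots                                                          |
+| SeriesSlot        | schema partition                       | Each Database contains multiple schema partitions, partitioned by device name                 |
+| SchemaRegion      | a group of schema partitions           | A collection of multiple SeriesSlots                                                          |
 | SchemaRegionGroup | logical concept                        | Contains as many SchemaRegions as the schema replication factor; they manage the same metadata and back each other up |
+| SeriesTimeSlot    | data partition                         | The data of one schema partition over a time range forms one data partition; each schema partition corresponds to multiple data partitions, partitioned by time range |
+| DataRegion        | a group of data partitions             | A collection of multiple SeriesTimeSlots                                                      |
+| DataRegionGroup   | logical concept                        | Contains as many DataRegions as the data replication factor; they manage the same data and back each other up |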
 
 ## Characteristics of Cluster
 
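@@ -60,7 +60,7 @@ Clients can only read and write data through DataNodes.
 * Massively parallel processing (MPP) architecture
     * Uses the MPP architecture and the volcano model for data processing, providing high scalability.
 * Different consensus protocols can be chosen for different scenarios
-    * Data replica groups and schema replica groups can each use one of Standalone, multi-leader replication, or Raft.
+    * Data replica groups and schema replica groups can use different consensus protocols.
 * Extensible partitioning strategy
     * The cluster manages data and schema partitions with a partition table and supports flexible, customizable allocation strategies.
 * Built-in monitoring framework
@@ -68,21 +68,21 @@ Clients can only read and write data through DataNodes.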
 
 ## Partitioning Strategy
 
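-The partitioning strategy divides data and schema into different Regions, and allocates Regions to different DataNodes.
+The partitioning strategy divides data and schema into different RegionGroups, and allocates the Regions of each RegionGroup to different DataNodes.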
 
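-It is recommended to set 1 database (unlike version 0.13, there is no need to set storage groups according to the number of cores), used as the database concept; the cluster dynamically allocates resources according to the number of nodes and cores.
+It is recommended to set 1 database; the cluster dynamically allocates resources according to the number of nodes and cores.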
 
-A Database contains multiple SchemaRegions (schema shards) and DataRegions (data shards), which are managed by DataNodes.
+A Database contains multiple SchemaRegions and DataRegions, which are managed by DataNodes.
 
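 * Schema partitioning strategy
-    * For the schema of a time series that does not use a template, the ConfigNode maps it into a series partition slot according to the device ID (the full path from root to the penultimate-level node), and assigns this partition slot to a SchemaRegion group.
+    * For the schema of a time series that does not use a template, the ConfigNode maps it to a series partition according to the device ID (the full path from root to the penultimate-level node), and allocates this series partition to a group of SchemaRegions.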
 
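 * Data partitioning strategy
-    * For a time series data point, the ConfigNode maps it to a series partition slot (vertical partition) according to the device ID, then to a time partition slot (horizontal partition) according to the data timestamp, and allocates this time partition slot under the series partition slot to a DataRegion group.
+    * For a time series data point, the ConfigNode maps it to a series partition (vertical partition) according to the device ID, then to a series time partition (horizontal partition) according to the timestamp, and allocates this series time partition to a group of DataRegions.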
 
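 IoTDB uses a slot-based partitioning strategy, so the size of the partition information is controllable and does not grow without bound as the number of time series or devices increases.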
 
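-Multiple replicas of a Region are allocated to different DataNodes to avoid a single point of failure, and load balance across DataNodes is ensured when Regions are allocated.
+Regions are allocated to different DataNodes, and load balance across DataNodes is ensured when Regions are allocated.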
 
 ## Replication Strategy
 
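@@ -91,27 +91,24 @@ Multiple replicas of a Region are allocated to different DataNodes to avoid a single point of failure,
 A Region is the basic unit of data replication; multiple replicas of a Region form a high-availability replication group whose data back each other up.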
 
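 * Replica groups in the cluster
-    * Partition information: the cluster has 1 partition-information replica group, consisting of all ConfigNodes.
-    * Data: the cluster has multiple DataRegion replica groups, and each DataRegion replica group contains multiple DataRegions with the same id.
-    * Schema: the cluster has multiple SchemaRegion replica groups, and each SchemaRegion replica group contains multiple SchemaRegions with the same id.
+    * ConfigNodeGroup: consists of all ConfigNodes.
+    * SchemaRegionGroup: the cluster has multiple schema groups, and each SchemaRegionGroup contains multiple SchemaRegions with the same ID.
+    * DataRegionGroup: the cluster has multiple data groups, and each DataRegionGroup contains multiple DataRegions with the same ID.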
+    
 
 An illustration of the complete cluster partitioning and replication:
 
 <img style="width:100%; max-width:500px; max-height:500px; margin-left:auto; margin-right:auto; display:block;" src="https://github.com/apache/iotdb-bin-resources/blob/main/docs/UserGuide/Cluster/Data-Partition.png?raw=true">
 
-The figure contains 1 SchemaRegion group; schema uses 3 replicas, so the 3 white SchemaRegion-0s form a replica group.
+The figure contains 1 SchemaRegionGroup; schema uses 3 replicas, so the 3 white SchemaRegion-0s form a replica group.
 
-The figure contains 3 DataRegion groups; data uses 3 replicas, so there are 9 DataRegions in total.
+The figure contains 3 DataRegionGroups; data uses 3 replicas, so there are 9 DataRegions in total.
 
 ## Consensus Protocol (Consistency Protocol)
 
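 Among the replicas of each replica group, data consistency is guaranteed by a specific consensus protocol, which applies read and write requests to multiple replicas.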
 
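 * Existing consensus protocols
-    * Standalone: available only with a single replica; an empty implementation of the consensus protocol, with the highest efficiency.
-    * MultiLeader: available with any number of replicas; currently usable only for DataRegion replicas; writes can be performed on any replica and are replicated asynchronously to the other replicas.
-    * Ratis: an implementation of the Raft protocol; available with any number of replicas; currently usable for any replica group.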
-
-## 0.14.0-Preview1 Function Map
-
-<img style="width:100%; max-width:800px; max-height:1000px; margin-left:auto; margin-right:auto; display:block;" src="https://github.com/apache/iotdb-bin-resources/blob/main/docs/UserGuide/Cluster/Preview1-Function.png?raw=true">
\ No newline at end of file
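+    * SimpleConsensus: provides strong consistency; available only with a single replica; a minimal implementation of the consensus protocol, with the highest efficiency.
+    * IoTConsensus: provides eventual consistency; available with any number of replicas, and with 2 replicas it tolerates the failure of 1 node; currently usable only for DataRegion replicas; writes can be performed on any replica and are replicated asynchronously to the other replicas.
+    * RatisConsensus: provides strong consistency; an implementation of the Raft protocol; available with any number of replicas; currently usable for any replica group.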
\ No newline at end of file
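
The consensus protocol options differ mainly in how many replica failures a RegionGroup can survive while still accepting writes. The following minimal Python sketch models that trade-off. It assumes Raft's majority-quorum rule for RatisConsensus, and it generalizes the documented 2-replica case of IoTConsensus to `replicas - 1`, which is an assumption. The function `tolerated_failures` is hypothetical and not part of IoTDB.

```python
# Hypothetical model of the consensus options described in Cluster-Concept.md.
ALLOWED_REGION_TYPES = {
    "SimpleConsensus": {"SchemaRegion", "DataRegion"},  # only limited to 1 replica
    "IoTConsensus": {"DataRegion"},                     # documented as DataRegion-only
    "RatisConsensus": {"SchemaRegion", "DataRegion"},
}


def tolerated_failures(protocol: str, region_type: str, replicas: int) -> int:
    """Replica failures a RegionGroup survives while still accepting writes."""
    if protocol not in ALLOWED_REGION_TYPES:
        raise ValueError(f"unknown protocol: {protocol}")
    if region_type not in ALLOWED_REGION_TYPES[protocol]:
        raise ValueError(f"{protocol} cannot be used for {region_type}")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if protocol == "SimpleConsensus":
        if replicas != 1:
            raise ValueError("SimpleConsensus supports exactly 1 replica")
        return 0
    if protocol == "RatisConsensus":
        # Raft commits only while a majority quorum (replicas // 2 + 1) is alive.
        return replicas - (replicas // 2 + 1)
    # IoTConsensus (assumption): any live replica accepts writes and syncs the
    # others asynchronously, so writes continue until the last replica is lost.
    return replicas - 1
```

With 2 replicas, the sketch gives 0 for RatisConsensus and 1 for IoTConsensus. This matches the page's statement that IoTConsensus with 2 replicas tolerates 1 node failure. The cost is eventual rather than strong consistency.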