You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@inlong.apache.org by al...@apache.org on 2021/11/22 12:21:48 UTC

[incubator-inlong-website] branch master updated: [INLONG-1825] Add the principle introduction document of client partition assign (#195)

This is an automated email from the ASF dual-hosted git repository.

aloyszhang pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-inlong-website.git


The following commit(s) were added to refs/heads/master by this push:
     new 330035f  [INLONG-1825] Add the principle introduction document of client partition assign (#195)
330035f is described below

commit 330035fd330ea094da1a483874e04c0e65b0143c
Author: gosonzhang <46...@qq.com>
AuthorDate: Mon Nov 22 20:21:34 2021 +0800

    [INLONG-1825] Add the principle introduction document of client partition assign (#195)
---
 .../tubemq/client_partition_assign_introduction.md |  46 +++++++++++++++++++++
 .../tubemq/img/partition_assign/example.png        | Bin 0 -> 91627 bytes
 .../tubemq/img/partition_assign/flow_diagram.png   | Bin 0 -> 65305 bytes
 .../tubemq/img/partition_assign/topic_assign.png   | Bin 0 -> 85393 bytes
 .../tubemq/client_partition_assign_introduction.md |  44 ++++++++++++++++++++
 .../tubemq/img/partition_assign/example.png        | Bin 0 -> 102944 bytes
 .../tubemq/img/partition_assign/flow_diagram.png   | Bin 0 -> 60472 bytes
 .../tubemq/img/partition_assign/topic_assign.png   | Bin 0 -> 85395 bytes
 8 files changed, 90 insertions(+)

diff --git a/docs/modules/tubemq/client_partition_assign_introduction.md b/docs/modules/tubemq/client_partition_assign_introduction.md
new file mode 100644
index 0000000..449863b
--- /dev/null
+++ b/docs/modules/tubemq/client_partition_assign_introduction.md
@@ -0,0 +1,46 @@
+---
+title: Introduction to client partition assign
+---
+
+## 1 Preface
+Before version 0.12.0, the partition allocation of TubeMQ was controlled by the server-side. The advantage of this solution is that the client is simple to implement, after the client registers, it only needs to wait for the server-side to distribute the partition and register and consume the distributed partition. But its shortcomings are also more obvious:
+1. Data consumption waiting time is too long: the client has a relatively long time from startup to consumption to the first piece of data. In the best case, the client needs to wait for an allocation period (configurable, 30 seconds by default) to obtain the partition to be consumed and in extreme cases, it may exceed a few minutes. So the user is not satisfied with the waiting time;
+2. The partition allocation plan is not rich enough: the current service-side partition allocation plan is based on the total set of Topic partitions subscribed by the client, and the total number of clients in the distribution of this consumer group is distributed in a Hash modulo mode, and when the business needs a special distribution plan is adopted, the current distribution plan on the server-side is not friendly enough and cannot be changed at any time according to business needs;
+3. Does not support specified partition consumption: during the use of the version, the business feedback that the current server-side management and control is not flexible enough, for example, when the client needs to bind the consumption relationship between the consumer and the partition, or when you only want to consume certain partitions at a certain startup, The current service-side consumption control is not supported.
+In response to these problems, the 0.12.0 version launched a new client partition allocation management and control consumption model, combined with the current consumption lag situation awareness function of the partition, allowing the business to autonomously control the distribution and consumption of the partition.
+
+## 2 Usage Demo
+The client of partition assign is defined based on the ClientBalanceConsumer interface class, including 17 APIs in total. For related demo codes, please refer to the [ClientBalanceConsumerExample.java](https://github.com/apache/incubator-inlong/blob/master/inlong- tubemq/tubemq-example/src/main/java/org/apache/inlong/tubemq/example/ClientBalanceConsumerExample.java). 
+
+The overall code implementation logic is as follows:
+![](img/partition_assign/example.png)
+
+## 3 Implementation details
+### 3.1 The general idea
+According to business needs and analysis of the implementation of similar MQs, we added the ClientBalanceConsumer class on the TubeMQ consumer end. Through the API provided by the SDK, the business can periodically query the partition set information corresponding to the topic to be consumed; and the business can specify the partition and initial The offset is used to register and to cancel the registered partition; at the same time, the server does not control the partition allocation o [...]
+
+### 3.2 Field Definition
+Before using this type of consumer, the business needs to pay attention to the following two field definition information:
+- PartitionKey: Partition Key, String type, ID used to uniquely identify a partition in TubeMQ, globally unique within the cluster, in the format of "BrokerId:TopicName:PartitionId", the business query partition metadata information will return the result as PartitionKey gather;
+- bootstrapOffset: reset Offset, long type, the initial consumption value provided by the business when registering consumption on the specified partition, the effective value is [0, +value); when this interface is called, the field is set to -1, indicating that the business is stored on the service side Offset position continues consumption data
+
+### 4.3 Interactive Introduction
+#### 4.3.1 Core Ideas
+![](img/partition_assign/topic_assign.png)
+As shown above, the logic behind the client load balancing operation is mainly to deal with the partition set. The client must periodically obtain the subscribable partition set, and obtain the current consuming partition set of each client according to the allocation algorithm; the current consuming set is the same as The client is currently consuming the set of partitions to take the intersection to obtain the partitions that need to be released and newly registered; for the partitions [...]
+
+The following two issues need to be paid attention to during the client partition allocation and use business:
+- How to reduce the impact of partition expansion and contraction: TubeMQ will expand and contract at any time, such as abnormal Broker offline, operation and maintenance blacklist operation, operation and maintenance expansion topic partition, etc. In order to deal with this problem, business pull The partition metadata information obtained is the configuration information and the subscribed status of the partition; it is recommended that the business be distributed according to the com [...]
+- How to deal with the expansion and contraction on the client-side: By default, we believe that the business will use the number of partitions and the number of clients in the consumer group to allocate partitions based on the modulus. Therefore, we added sourceCount (consumer group) to the start() function of TubeMQ. The total number of nodes) and nodeId (the ID number of the current node) are two fields to tell the service side how many clients the consumer group will start, and what  [...]
+
+#### 4.3.2 Interactive Process
+The interaction between each node is as follows:
+![](img/partition_assign/flow_diagram.png)
+- The Master does not execute the balancing process on the Consumer controlled by the client. After the Master receives the consumer group registered by this type of client, it does not control partition assign, which is completely controlled by the client;
+- Consumer provides a partition query API for businesses to periodically query the partition set information corresponding to the Topic set to be consumed;
+- Consumer provides partition registration and deregistration APIs for the business to deregister the partitions that the client has registered and needs to be deregistered, register the designated unregistered partitions through the registration interface, and support the initial offset of the designated registration of the business during registration;
+- Consumers regularly report status and partition registration information, so that the Master side can perceive the current partition assign and registration status of each SDK so that the server can obtain the partition allocation information of the entire group;
+- Master provides a query API and supports operation and maintenance to query the partition allocation status of each node in the specified partition allocation consumer group through the API query interface.
+
+---
+<a href="#top">Back to top</a>
\ No newline at end of file
diff --git a/docs/modules/tubemq/img/partition_assign/example.png b/docs/modules/tubemq/img/partition_assign/example.png
new file mode 100644
index 0000000..78964fb
Binary files /dev/null and b/docs/modules/tubemq/img/partition_assign/example.png differ
diff --git a/docs/modules/tubemq/img/partition_assign/flow_diagram.png b/docs/modules/tubemq/img/partition_assign/flow_diagram.png
new file mode 100644
index 0000000..7d351d4
Binary files /dev/null and b/docs/modules/tubemq/img/partition_assign/flow_diagram.png differ
diff --git a/docs/modules/tubemq/img/partition_assign/topic_assign.png b/docs/modules/tubemq/img/partition_assign/topic_assign.png
new file mode 100644
index 0000000..65c2ee4
Binary files /dev/null and b/docs/modules/tubemq/img/partition_assign/topic_assign.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/client_partition_assign_introduction.md b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/client_partition_assign_introduction.md
new file mode 100644
index 0000000..a276cfd
--- /dev/null
+++ b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/client_partition_assign_introduction.md
@@ -0,0 +1,44 @@
+---
+title: 客户端分区分配介绍
+---
+
+## 1 前言
+在0.12.0版本以前,TubeMQ的分区分配都是由服务侧进行管控,这种方案的优势在于客户端实现简单,客户端注册后只需要等待服务侧派发分区,并对派发的分区进行注册消费即可,其缺点也比较明显:
+1. 数据消费等待时间过长: 客户端从启动到消费到第一条数据的时间比较长,主要原因在于服务侧是按照固定周期进行消费分区的任务分配,且过程中涉及到客户端对已分配分区的资源释放,客户端在最好的情况下都需要等待一个分配周期(可配置,默认30秒)才能获取到待消费分区,在极端情况下有可能超过几分钟,对业务及时消费到数据不满足;
+2. 分区分配方案不够丰富: 当前服务侧分区分配方案是按照客户端订阅的Topic分区总集合,与这个消费组分区分配时的总客户端个数进行Hash取模的方式进行分配,而业务需要采用特别的分配方案时,服务侧目前分配方案则显得不够友好,不能随业务需要随时变更;
+3. 不支持指定分区消费: 在版本使用过程中业务反馈当前服务侧管控不够灵活,比如客户端需要绑定消费者与分区的消费关系,或者某次启动只想消费其中某几个分区时,当前服务侧消费管控不支持。
+针对这些问题,3.9.1版本上线了新的客户端分区分配管控消费模式,结合分区当前消费滞后情况感知功能,让业务自主控制分区的分配和消费。
+
+## 2 使用示例
+客户端分区分配基于ClientBalanceConsumer接口类进行定义,一共17个API,相关的使用示例代码参考示例代码[ClientBalanceConsumerExample.java](https://github.com/apache/incubator-inlong/blob/master/inlong-tubemq/tubemq-example/src/main/java/org/apache/inlong/tubemq/example/ClientBalanceConsumerExample.java) 。总的代码实现逻辑如下图示:
+![](img/partition_assign/example.png)
+
+## 3 实现方案
+### 3.1 总的思路
+根据业务需求以及同类MQ的实现分析,我们在TubeMQ消费端增加ClientBalanceConsumer类,通过该SDK提供的API,业务可以定期查询待消费的Topic对应的分区集合信息;并且业务可以通过API对指定分区以及初始offset进行注册,以及注销之前已经注册的分区;同时服务端不管理该类消费组的分区分配,完全由客户端控制客户端与分区的分配、释放关系。
+
+### 3.2 字段定义
+在使用该类消费者前,业务需要注意如下2个字段信息定义:
+- PartitionKey: 分区Key,String类型,TubeMQ里用来唯一标识一个分区的ID,集群内全局唯一,格式为“brokerId:Topic名:partitionId”样式,业务查询分区元数据信息时将返回结果为PartitionKey的集合;
+- bootstrapOffset: 重置Offset,long类型,业务对指定分区进行注册消费时提供的初始消费值,有效值为[0, +value);调用该接口时该字段设置为-1表示业务按照服务侧存储的Offset位置接续消费数据
+
+### 4.3 交互介绍
+#### 4.3.1 核心思路
+![](img/partition_assign/topic_assign.png)
+如上图示,客户端负载均衡操作背后的逻辑主要是处理分区集合,客户端要定期获取可订阅分区集合,根据分配算法来获取每个客户端当前可消费的分区集合;当前可消费的集合与客户端当前在消费的分区集合取交集,获得需要释放和需要新注册的分区;对于需要新注册的分区,支持客户端指定初始消费的offset值。
+
+客户端分区分配使用中业务需要关注如下2个问题:
+- 如何减小分区扩缩容带来的影响: TubeMQ会随时进行扩缩容,比如Broker异常下线、运维进行黑名单操作、运维扩容Topic的分区等,为了应对这个问题,业务拉取到的分区元数据信息为配置信息,以及分区的可订阅状态;建议业务按照配置全集进行分配,然后针对元数据状态进行注销、注册处理(示例代码里有示例),这样可以避免因为Broker异常上下线、黑名单、临时不可订阅等操作带来的频繁释放和加入处理。
+- 如何应对客户端侧的扩缩容: 我们缺省认为业务会采用按照分区数与消费组的客户端数进行取模分配分区,因此,我们在TubeMQ的start()函数里增加了sourceCount(消费组的总节点数),nodeId(当前节点的ID号)两个字段,来告诉服务侧该消费组会启动多少客户端,每个客户端的ID号是多少,来保证取模分配的一致性;业务使用消费组时需要指定上述2个参数,sourceCount要确保同一个组里所有的消费者提供的值相同,nodeId要确保同一个组里所有消费者使用的ID是唯一的。通过这个方式确保消费组如果使用取模方案,对应的基础参数是没有冲突的。业务有可能不选取模方案,这个时候只需要设置sourceCount为无效值(小于0),则可关闭该缺省的参数要求。
+
+#### 4.3.2 交互流程
+各个节点间的交互如下:
+![](img/partition_assign/flow_diagram.png)
+- Master不对客户端管控的Consumer做负载均衡处理:Master收到这类客户端注册的消费组后,不进行负载均衡操作,完全由客户端自己控制;
+- Consumer提供分区查询接口,供业务定期查询待消费的Topic集合对应的分区集合信息;
+- Consumer提供分区注册、注销接口,供业务对该客户端已注册需注销的分区进行注销操作,通过注册接口对指定未注册的分区进行注册,注册的时候支持业务指定注册的初始offset;
+- Consumer定期上报状态及分区注册信息,让Master侧感知各个SDK当前分区分配及注册情况,便于服务端获取整个组的分区分配信息;
+- Master提供查询API,支持运维通过API查询接口查询指定分区分配消费组各节点的分区分配情况。
+
+---
+<a href="#top">Back to top</a>
\ No newline at end of file
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/img/partition_assign/example.png b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/img/partition_assign/example.png
new file mode 100644
index 0000000..fd19f0c
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/img/partition_assign/example.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/img/partition_assign/flow_diagram.png b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/img/partition_assign/flow_diagram.png
new file mode 100644
index 0000000..77bdabb
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/img/partition_assign/flow_diagram.png differ
diff --git a/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/img/partition_assign/topic_assign.png b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/img/partition_assign/topic_assign.png
new file mode 100644
index 0000000..37257f5
Binary files /dev/null and b/i18n/zh-CN/docusaurus-plugin-content-docs/current/modules/tubemq/img/partition_assign/topic_assign.png differ