Posted to commits@rocketmq.apache.org by "TheR1sing3un (via GitHub)" <gi...@apache.org> on 2023/02/06 05:10:54 UTC

[GitHub] [rocketmq] TheR1sing3un opened a new issue, #5989: The persistence broker id is used in place of address as the unique identification

TheR1sing3un opened a new issue, #5989:
URL: https://github.com/apache/rocketmq/issues/5989

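   ## Current problems
   Currently, brokerAddr is used as the unique identifier of a broker, while brokerId is treated as an identifier that may be lost. This causes problems in the following scenarios:
   
   - In a container environment, every broker restart changes its IP, so the records left under the previous brokerAddr (syncStateSet, for example) can no longer be associated with the restarted broker.
   - Temporarily adding a VIP, or changing the VIP, also changes brokerAddr.
   ## Proposed improvement
   On the Controller side, use ClusterName:BrokerName:BrokerId as the unique identifier instead of BrokerAddr, and persist the BrokerId. Because ClusterName and BrokerName are both set in the configuration file at startup, only the allocation and persistence of BrokerId need to be handled.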
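   
   As a rough illustration of the identifier described above, the sketch below builds a ClusterName:BrokerName:BrokerId key. The class `BrokerIdentity` and its field names are hypothetical and not taken from the RocketMQ code base; the point is only that the key contains no address, so it survives an IP or VIP change.
   
   ```java
   /** Hypothetical identity holder; names are illustrative, not RocketMQ's API. */
   final class BrokerIdentity {
       final String clusterName;
       final String brokerName;
       final long brokerId;

       BrokerIdentity(String clusterName, String brokerName, long brokerId) {
           this.clusterName = clusterName;
           this.brokerName = brokerName;
           this.brokerId = brokerId;
       }

       /** Renders the identity as ClusterName:BrokerName:BrokerId. */
       String key() {
           return clusterName + ":" + brokerName + ":" + brokerId;
       }
   }
   ```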
   ### Going-online process
   ![](https://cdn.nlark.com/yuque/0/2023/jpeg/22446956/1673354680590-264b2e09-3cc8-4cfe-a269-11469e39b685.jpeg)
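   #### Going online for the first time
   ##### 1. GetNextBrokerIdReq
   When a broker comes online for the first time, it only has the ClusterName and BrokerName from its configuration file, plus its own BrokerAddr. It therefore has to negotiate with the Controller an identifier that stays unique for the whole lifetime of the cluster: the BrokerId. BrokerIds start from 1. When a broker is elected Master, it uses BrokerId 0 for the duration of its term; when it becomes a Slave again, it reverts to its original BrokerId.
   At this point the broker sends a GetNextBrokerId request to the Controller to obtain the next BrokerId to be allocated (allocation starts from 1).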
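   
   The Master/Slave rule for the id can be sketched as follows. `BrokerRole` and `reportedId` are hypothetical names used only for illustration; the sketch assumes the persisted BrokerId is kept unchanged on disk and only the reported value switches to 0 during a Master term.
   
   ```java
   /** Hypothetical helper; not part of RocketMQ. */
   final class BrokerRole {
       static final long MASTER_ID = 0L;

       /** Returns 0 while the broker is Master, otherwise its persisted BrokerId. */
       static long reportedId(long persistedBrokerId, boolean isMaster) {
           if (persistedBrokerId < 1) {
               throw new IllegalArgumentException("persisted BrokerId starts from 1");
           }
           return isMaster ? MASTER_ID : persistedBrokerId;
       }
   }
   ```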
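   ##### 1.1 ReadFromDLedger
   The Controller receives the request and reads the state machine's nextBrokerId through DLedger.
   ##### 2. GetNextBrokerIdResp
   The Controller returns nextBrokerId to the broker.
   ##### 2.1, 2.2 CreateTempMetaFile
   After receiving nextBrokerId, the broker creates a temporary file, `.broker.meta.temp`, which records nextBrokerId (the brokerId it expects to apply) together with a Code that the broker generates itself.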
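   
   A minimal sketch of this step is shown below. The issue does not specify the file layout or how the Code is generated, so the `key=value` format, the use of a random UUID as the Code, and the class name `TempBrokerMeta` are all assumptions.
   
   ```java
   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.util.UUID;

   /** Hypothetical writer for .broker.meta.temp; the file format is assumed. */
   final class TempBrokerMeta {
       static final String TEMP_FILE = ".broker.meta.temp";

       /** Persists the expected BrokerId plus a freshly generated Code; returns the Code. */
       static String write(Path storeDir, long expectedBrokerId) throws IOException {
           String code = UUID.randomUUID().toString(); // assumed Code format
           String body = "brokerId=" + expectedBrokerId + "\n" + "code=" + code + "\n";
           Files.writeString(storeDir.resolve(TEMP_FILE), body);
           return code;
       }
   }
   ```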
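   ##### 3. ApplyBrokerIdReq
   The broker sends an ApplyBrokerId request to the Controller, carrying its basic data (ClusterName, BrokerName and BrokerAddress) together with the BrokerId it expects to apply and the Code.
   ##### 3.1 CASApplyBrokerId
   The Controller appends this event through DLedger consensus. When the event (log entry) is applied to the state machine, the state machine checks whether the brokerId can be applied at that moment (the apply fails if the BrokerId has already been allocated). It also records the mapping between the BrokerId and the Code.
   ##### 4. ApplyBrokerIdResp
   If the previous step applied the BrokerId successfully, the Controller returns success to the broker; if it failed, the Controller returns the current nextBrokerId.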
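   
   The apply rule can be sketched with an in-memory stand-in for the state machine. This is not the DLedger-backed implementation: `BrokerIdStateMachine` and `ApplyResult` are hypothetical names, the sketch tracks a single ClusterName:BrokerName pair, and it assumes that only the current nextBrokerId may be claimed. It also accepts a repeated request whose Code matches the recorded one, which is the retry rule this issue describes in its recovery section.
   
   ```java
   import java.util.HashMap;
   import java.util.Map;

   /** In-memory stand-in for the Controller state machine (illustrative only). */
   final class BrokerIdStateMachine {
       /** On rejection, nextBrokerId tells the broker which id to try next. */
       static final class ApplyResult {
           final boolean success;
           final long nextBrokerId;

           ApplyResult(boolean success, long nextBrokerId) {
               this.success = success;
               this.nextBrokerId = nextBrokerId;
           }
       }

       private final Map<Long, String> codeByBrokerId = new HashMap<>();
       private long nextBrokerId = 1L; // allocation starts from 1

       long nextBrokerId() {
           return nextBrokerId;
       }

       /** Runs when the ApplyBrokerId log entry is applied: compare-and-set on the id. */
       ApplyResult apply(long brokerId, String code) {
           String existing = codeByBrokerId.get(brokerId);
           if (existing == null && brokerId == nextBrokerId) {
               codeByBrokerId.put(brokerId, code); // record which Code owns this id
               nextBrokerId++;
               return new ApplyResult(true, nextBrokerId);
           }
           // Same id and same Code: a retry of an apply that already succeeded.
           boolean retry = code.equals(existing);
           return new ApplyResult(retry, nextBrokerId);
       }
   }
   ```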
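   ##### 4.1, 4.2 CreateMetaFileFromTemp
   If the previous step applied the BrokerId successfully, the BrokerId can be considered allocated from the broker's point of view, so its information now has to be persisted for good: atomically delete `.broker.meta.temp` and create `.broker.meta`. The deletion and the creation must together form a single atomic operation.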
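   
   One possible way to get "delete the temp file and create the final file" as a single atomic step is a rename. The sketch below assumes both files share the same content format, so the temp file can simply be renamed; `BrokerMetaPromoter` is a hypothetical name, and whether the actual implementation uses a rename is not stated in this issue.
   
   ```java
   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.nio.file.StandardCopyOption;

   /** Hypothetical helper that promotes the temp meta file to the final one. */
   final class BrokerMetaPromoter {
       /** Renames .broker.meta.temp to .broker.meta in one atomic file-system operation. */
       static Path promote(Path storeDir) throws IOException {
           Path temp = storeDir.resolve(".broker.meta.temp");
           Path meta = storeDir.resolve(".broker.meta");
           // With ATOMIC_MOVE, an AtomicMoveNotSupportedException is thrown if the
           // file system cannot do this atomically; there is no copy-and-delete fallback.
           return Files.move(temp, meta, StandardCopyOption.ATOMIC_MOVE);
       }
   }
   ```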
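   > After the flow above, a broker coming online for the first time and the Controller have negotiated a brokerId that both sides agree on, and it has been persisted.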
   
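   #### Going online after a normal restart
   After a normal restart, the two sides have already negotiated a unique BrokerId and the broker has it stored locally in `.broker.meta`, so the registration flow is not needed and the broker continues directly with the subsequent steps.
   > If the broker crashes at some point during the normal going-online flow, the following procedures ensure that the BrokerId is still allocated correctly.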
   
   #### CreateTempMetaFile fails
   ![image.png](https://cdn.nlark.com/yuque/0/2023/png/22446956/1673356532517-9ce5c276-7d0a-4b18-96da-c4ddfee63906.png#averageHue=%23fcfbfb&clientId=u4ca8d54b-13ae-4&from=paste&height=419&id=u405eb7eb&name=image.png&originHeight=838&originWidth=1486&originalType=binary&ratio=1&rotation=0&showTitle=false&size=94541&status=done&style=none&taskId=ud2309f56-32f4-44f2-86b1-dfa20de096a&title=&width=743)
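   If the failure happens in the part of the flow shown above, then after the broker restarts, the Controller's state machine has not allocated any BrokerId and the broker itself has not saved any data. The broker therefore simply starts the flow above again from the beginning.
   #### CreateTempMetaFile succeeds, ApplyBrokerId does not
   If the Controller considers the ApplyBrokerId request invalid (it asks for a BrokerId that has already been allocated, or the Code does not match) and returns the current NextBrokerId to the broker, the broker deletes the `.broker.meta.temp` file and goes back to step 2, repeating that step and everything after it.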
   ![image.png](https://cdn.nlark.com/yuque/0/2023/png/22446956/1673357744821-dca9d10a-53c3-4f4e-941f-61d637d4211e.png#averageHue=%23fbf7f6&clientId=u4ca8d54b-13ae-4&from=paste&height=166&id=ud515fbe1&name=image.png&originHeight=332&originWidth=640&originalType=binary&ratio=1&rotation=0&showTitle=false&size=21614&status=done&style=none&taskId=u75f79cae-d197-4a9c-80fc-80445751798&title=&width=320)
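   
   The reject-and-retry path might look roughly like the loop below. Everything here is an assumption made for illustration: the `Controller` interface, the convention that `applyBrokerId` returns -1 on success and the current nextBrokerId otherwise, the `key=value` temp file format, the UUID Code, and the `maxAttempts` bound.
   
   ```java
   import java.io.IOException;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.util.UUID;

   /** Hypothetical broker-side negotiation loop (illustrative only). */
   final class BrokerIdNegotiator {
       /** Hypothetical view of the two Controller requests used during registration. */
       interface Controller {
           long getNextBrokerId();

           /** Assumed convention: -1 on success, otherwise the current nextBrokerId. */
           long applyBrokerId(long brokerId, String code);
       }

       /** Repeats steps 2 to 4 until an id is applied; the temp file is left for step 4.1. */
       static long negotiate(Controller controller, Path storeDir, int maxAttempts)
               throws IOException {
           Path temp = storeDir.resolve(".broker.meta.temp");
           long candidate = controller.getNextBrokerId();
           for (int attempt = 0; attempt < maxAttempts; attempt++) {
               String code = UUID.randomUUID().toString();
               Files.writeString(temp, "brokerId=" + candidate + "\ncode=" + code + "\n");
               long next = controller.applyBrokerId(candidate, code);
               if (next < 0) {
                   return candidate; // applied: caller can now promote the temp file
               }
               Files.delete(temp); // rejected: discard the stale temp file
               candidate = next;   // retry with the id the Controller returned
           }
           throw new IllegalStateException("no BrokerId applied after " + maxAttempts + " attempts");
       }
   }
   ```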
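   #### ApplyBrokerId succeeds, CreateMetaFileFromTemp does not
   This can happen in two cases: the ApplyResult is lost, or the CAS step that deletes the temp file and creates `.broker.meta` fails.
   After the restart, the Controller already considers the apply successful and has updated the BrokerId allocation in its state machine, so the broker resumes directly from step 3, that is, it sends the ApplyBrokerId request again.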
   ![image.png](https://cdn.nlark.com/yuque/0/2023/png/22446956/1673356873639-f66751a8-e0f6-426e-822a-1974f2ad70c2.png#averageHue=%23fdfdfd&clientId=u4ca8d54b-13ae-4&from=paste&height=69&id=u9f55e8f4&name=image.png&originHeight=138&originWidth=610&originalType=binary&ratio=1&rotation=0&showTitle=false&size=8975&status=done&style=none&taskId=u31d49c2b-647d-4f8f-928b-4a5dbaea417&title=&width=305)
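   Because the `.broker.meta.temp` file still exists, the broker can read from it the BrokerId and Code that were previously applied on the Controller side and send them to the Controller. If the Controller has that BrokerId and its Code equals the Code in the request, the request is treated as successful.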
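   
   Taken together, the restart cases in this issue reduce to a check of which file is present. The sketch below is one hedged reading of those cases; `RegistrationRecovery` and `ResumeFrom` are hypothetical names, and a partially written temp file is not handled.
   
   ```java
   import java.nio.file.Files;
   import java.nio.file.Path;

   /** Hypothetical startup check that picks the step to resume from. */
   final class RegistrationRecovery {
       enum ResumeFrom { SKIP_REGISTRATION, APPLY_BROKER_ID, GET_NEXT_BROKER_ID }

       static ResumeFrom decide(Path storeDir) {
           if (Files.exists(storeDir.resolve(".broker.meta"))) {
               return ResumeFrom.SKIP_REGISTRATION; // id already negotiated and persisted
           }
           if (Files.exists(storeDir.resolve(".broker.meta.temp"))) {
               return ResumeFrom.APPLY_BROKER_ID; // resend the saved id and Code (step 3)
           }
           return ResumeFrom.GET_NEXT_BROKER_ID; // nothing saved: start from step 1
       }
   }
   ```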
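   ### Using BrokerId as the unique identifier after going online
   Once a broker has come online correctly, its subsequent requests and state records all use brokerId as the unique identifier. Heartbeat and similar records are also keyed by brokerId.
   The Controller also records the current address of each brokerId, which it uses to notify brokers of the master's address, for example during a master/slave switch.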
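   
   A small sketch of what keying by brokerId could look like on the Controller side follows. `BrokerAddressTable` and its methods are hypothetical; the sketch only shows that a heartbeat from a new address updates the existing entry instead of creating a second broker.
   
   ```java
   import java.util.HashMap;
   import java.util.Map;

   /** Hypothetical Controller-side table: identity key to latest known address. */
   final class BrokerAddressTable {
       private final Map<String, String> addressByIdentity = new HashMap<>();

       private static String key(String clusterName, String brokerName, long brokerId) {
           return clusterName + ":" + brokerName + ":" + brokerId;
       }

       /** Records the address carried by the latest heartbeat for this identity. */
       void onHeartbeat(String clusterName, String brokerName, long brokerId, String brokerAddr) {
           addressByIdentity.put(key(clusterName, brokerName, brokerId), brokerAddr);
       }

       /** Returns the last known address, or null if the identity is unknown. */
       String addressOf(String clusterName, String brokerName, long brokerId) {
           return addressByIdentity.get(key(clusterName, brokerName, brokerId));
       }

       int size() {
           return addressByIdentity.size();
       }
   }
   ```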
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@rocketmq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [rocketmq] TheR1sing3un closed issue #5989: The persistence broker id is used in place of address as the unique identification

Posted by "TheR1sing3un (via GitHub)" <gi...@apache.org>.
TheR1sing3un closed issue #5989: The persistence broker id is used in place of address as the unique identification
URL: https://github.com/apache/rocketmq/issues/5989




[GitHub] [rocketmq] TheR1sing3un commented on issue #5989: The persistence broker id is used in place of address as the unique identification

Posted by "TheR1sing3un (via GitHub)" <gi...@apache.org>.
TheR1sing3un commented on issue #5989:
URL: https://github.com/apache/rocketmq/issues/5989#issuecomment-1419417802

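   > Isn't there already logic that registers a broker when it first comes online, allocates a BrokerId and applies it to the state machine? It just isn't persisted to broker.conf when it changes.
   
   The BrokerId allocated today cannot be used as a unique broker identifier, because the current registration protocol is one-way: the broker only fetches an allocated id from the Controller, and the Controller does not know whether the id is actually in use or whether the broker has persisted it successfully.
   The main goal of the improvement is to negotiate a brokerId that is unique and persisted from the point of view of both the broker and the Controller, so that it can then be used reliably as an identifier.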




[GitHub] [rocketmq] echooymxq commented on issue #5989: The persistence broker id is used in place of address as the unique identification

Posted by "echooymxq (via GitHub)" <gi...@apache.org>.
echooymxq commented on issue #5989:
URL: https://github.com/apache/rocketmq/issues/5989#issuecomment-1418579381

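   Isn't there already logic that registers a broker when it first comes online, allocates a BrokerId and applies it to the state machine? It just isn't persisted to broker.conf when it changes.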

