You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@rocketmq.apache.org by GitBox <gi...@apache.org> on 2022/08/04 08:07:07 UTC

[GitHub] [rocketmq] lizhimins opened a new issue, #4778: Design of optimize log transfer protocol

lizhimins opened a new issue, #4778:
URL: https://github.com/apache/rocketmq/issues/4778

   # Design of log replication transfer protocol
   
   In a distributed system, broker or namesrv nodes may go down, so we introduce a dynamic multiple replica architecture to achieve high availability at the application service level. In the traditional active-standby mode, the two replicas cannot achieve majority election and do not have the ability to switch by itself. However, the cost of three replicas required by Raft is relatively high, the storage layer does not reuse the persistence model of RocketMQ's native implementation, and there are some problems such as inflexible message confirmation. In the design of “RIP 44”, it has been proposed to implement the DLedger Controller consistency agreement on NameServer https://shimo.im/docs/N2A1Mz9QZltQZoAD.
   
   The proposal has the following core elements:
   
   1. Multiple DLedger Controllers elect the master through consensus protocol.
   
      1. The assignment of the identity of a single broker replica group is completed by the master controller (BrokerId = 0 means the master)
      2. For the non-switching version, you only need to control the controller not to elect a new master. It is controlled by the parameter allowElectMaster, which realizes the dynamic switching between the selected master and the non-selected master version.
   
   2. The Broker obtains the replica group information of the BrokerMemberGroup from the Controller by means of event notification, polling and redirection, and modifies its own identity.
   
   3. In the traditional architecture, the lease mechanism is generally used to avoid multi-master situation.
   
      Considering the network partition, the Controller is unavailable but not actively downgraded. In this design, without lease mechanism is not adopted, and dual masters cannot be avoided in the case of network partitions.
   
      For example, if the old master is isolated by the network, and allAckInSyncStateSet / minInSyncReplica is not configured, the message will still be written successfully, and the master and slave are not synchronized.
   
   4. Use Epoch-StartOffset and MaxOffset to determine lastConsistentPoint (Offset) and maintain CommitLog and EpochMap
   
   ## Transfer frame format
   
   The newly implemented log service protocol communicates based on frames in the format of
   
   frame length (int) + message type (int) + write timestamp (long) + epoch (long) + payload (byte[], length = total-24) 
   
   ![image-20220804143508950](https://user-images.githubusercontent.com/22487634/182796635-e1c1280a-2d5f-456f-baa7-b829417d63c5.png)
   
   ## Communication protocol and behavior
   
   ### 1. Five states of the state machine
   
   They are Ready - Handshake - Transfer - Suspend - Shutdown
   
   **Ready phase**: After recover the commitLog, consumeQueue, TimerLog and other states, broker will enter the ready state.
   
   At this time, the broker obtains the BrokerMemberGroup from controller, and will establish a TCP connection to the master, and the protocol negotiation will be completed at this stage.
   
   **HandShake phase**: The master and slave determine the fork point through the Epoch-StartOffset mechanism, and determine the replication start point according to the configuration to ensure data consistency between the master and the slave.
   
   **Transfer stage**: In the normal data transfer stage, the master pushes the part commitlog (block data) to the slave, and the slave will reply the current processing point to the master to calculate confirm offset
   
   **Suspend phase**: The master and slave are working online at the same time, and the heartbeat is maintained in the connection.
   
   **Shutdown phase**: Once process shutdown, HA-related services (GroupTransfer HaNotified, etc.) will graceful shutdown;
   
   ![image-20220804143508950](https://user-images.githubusercontent.com/22487634/182796695-7545495e-a01d-4cda-8b57-93cdddce3f5c.png)
   
   ### 2. Communication Protocol
   
   Divided into 3 groups, HA negotiation, offset negotiation and transmission, a total of 7 Request Response Code
   
   #### 1. HA negotiation phase
   
   Code: HANDSHAKE_SLAVE - HANDSHAKE_MASTER
   
   The standby initiates a HANDSHAKE_SLAVE request to the master to add itself as the slave to obtain the master's data.
   
   1. In the non-elect version, the brokerId will not unchanged, and the slave brokerPerm is read-only.
   
   2. In selecting the master version, the brokerId will not be repeated for the master because the controller assigns the brokerId.
   
   CheckResult indicates whether the master accepts the slave to establish a connection, which is an enumerated value.
   
   If master accept to establish a connection, it is Success. Or the reason for disagreement may be ha protocol not supported, identity error such as incorrect brokerName. 
   
   #### 2. Offset Negotiation Phase
   
   Code: QUERY_EPOCH - RETURN_EPOCH - CONFIRM_TRUNCATE
   
   Content: The slave broker queries the epoch to the master, the master broker returns Map<epoch, start offset>, and the slave confirms to the primary after completing the data correction
   
   #### 3. Log copy and transfer phase
   
   Code: TRANSFER_DATA - TRANSFER_ACK
   
   Content: currentBlockEpoch, currentEpochStartOffset, currentBlockStartOffset, BlockStartOffset, payload
   
   The master sends the epoch information of the current block, the starting point, the current confirmation point and other information to the slave, and the slave returns the currently received point after receiving it.
   
   The master will calculate the confirm offset of the replica group according to the ack information of the standby node and the configuration of message confirmation (minInSyncReplica).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@rocketmq.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [rocketmq] lizhimins closed issue #4778: Design of optimize log transfer protocol

Posted by "lizhimins (via GitHub)" <gi...@apache.org>.
lizhimins closed issue #4778: Design of optimize log transfer protocol
URL: https://github.com/apache/rocketmq/issues/4778


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@rocketmq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org