You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/11/01 22:54:00 UTC

[jira] [Commented] (KAFKA-7537) Only include live brokers in the UpdateMetadataRequest sent to existing brokers if there is no change in the partition states

    [ https://issues.apache.org/jira/browse/KAFKA-7537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16672309#comment-16672309 ] 

ASF GitHub Bot commented on KAFKA-7537:
---------------------------------------

hzxa21 opened a new pull request #5869:  KAFKA-7537: Avoid sending full UpdateMetadataRequest to existing brokers in the cluster on broker changes to reduce controller memory footprint
URL: https://github.com/apache/kafka/pull/5869
 
 
   Currently when brokers join/leave the cluster without any partition states changes, controller will send out UpdateMetadataRequests containing the states of all partitions to all brokers. But for existing brokers in the cluster, the metadata diff between controller and the broker should only be the "live_brokers"/"offline_replicas" info. Only the brokers with empty metadata cache need the full UpdateMetadataRequest. Sending the full UpdateMetadataRequest to all brokers can place nonnegligible memory pressure on the controller side, especially for large clusters with many brokers and a large number of partitions.
   
   Let's say in total we have N brokers, M partitions in the cluster and we want to add 1 brand new broker in the cluster. With RF=2, the memory footprint per partition in the UpdateMetadataRequest is ~200 Bytes. In the current controller implementation, if each of the N RequestSendThreads serializes and sends out the UpdateMetadataRequest at roughly the same time (which is very likely the case), we will end up using (N+1)*M*200B memory. However, we only need to send out one full UpdateMetadataReuqest in this case. More detail can be found in the jira ticket KAFKA-7537.
   
   This PR avoids sending out full UpdateMetadataReuqest in the following scenarios:
   1. On broker startup, send out full UpdateMetadataRequest to newly added brokers and only send out UpdateMetadataReuqest with empty partition states to existing brokers.
   2. On broker failure, if it doesn't require leader election, only include the states of partitions that are hosted by the dead broker(s) in the UpdateMetadataReuqest instead of including all partition states.
   
   _Note that after partition state change and replica state change, UpdateMetadataReuqests still need to be sent and this behavior remains unchanged after this PR._
   
   This PR also introduces a minor optimization in the MetadataCache update to avoid copying the previous partition states upon receiving UpdateMetadataRequest with no partition states.
   
   
   
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Only include live brokers in the UpdateMetadataRequest sent to existing brokers if there is no change in the partition states
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7537
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7537
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller
>            Reporter: Zhanxiang (Patrick) Huang
>            Assignee: Zhanxiang (Patrick) Huang
>            Priority: Major
>
> Currently if when brokers join/leave the cluster without any partition states changes, controller will send out UpdateMetadataRequests containing the states of all partitions to all brokers. But for existing brokers in the cluster, the metadata diff between controller and the broker should only be the "live_brokers" info. Only the brokers with empty metadata cache need the full UpdateMetadataRequest. Sending the full UpdateMetadataRequest to all brokers can place nonnegligible memory pressure on the controller side.
> Let's say in total we have N brokers, M partitions in the cluster and we want to add 1 brand new broker in the cluster. With RF=2, the memory footprint per partition in the UpdateMetadataRequest is ~200 Bytes. In the current controller implementation, if each of the N RequestSendThreads serializes and sends out the UpdateMetadataRequest at roughly the same time (which is very likely the case), we will end up using *(N+1)*M*200B*. In a large kafka cluster, we can have:
> {noformat}
> N=99
> M=100k
> Memory usage to send out UpdateMetadataRequest to all brokers:
> 100 * 100K * 200B = 2G
> However, we only need to send out full UpdateMetadataRequest to the newly added broker. We only need to include live broker ids (4B * 100 brokers) in the UpdateMetadataRequest sent to the existing 99 brokers. So the amount of data that is actully needed will be:
> 1 * 100K * 200B + 99 * (100 * 4B) = ~21M
> We will can potentially reduce 2G / 21M = ~95x memory footprint as well as the data tranferred in the network.{noformat}
>  
> This issue kind of hurts the scalability of a kafka cluster. KIP-380 and KAFKA-7186 also help to further reduce the controller memory footprint.
>  
> In terms of implementation, we can keep some in-memory state in the controller side to differentiate existing brokers and uninitialized brokers (e.g. brand new brokers) so that if there is no change in partition states, we only send out live brokers info to existing brokers.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)