Posted to jira@kafka.apache.org by "GEORGE LI (JIRA)" <ji...@apache.org> on 2019/02/22 21:22:00 UTC

[jira] [Comment Edited] (KAFKA-6794) Support for incremental replica reassignment

    [ https://issues.apache.org/jira/browse/KAFKA-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775598#comment-16775598 ] 

GEORGE LI edited comment on KAFKA-6794 at 2/22/19 9:21 PM:
-----------------------------------------------------------

I have also seen this issue. When more than one new broker is in the New Replicas of a reassignment and the topic is big, then even with a throttle the leader has to work hard to sync all the extra followers, which can cause a latency jump.

 

One solution is to execute the reassignment plans in an "optimal" way: submit them in batches, making sure that in each batch the topic/partition has only one extra new broker in its New Replicas, wait until that batch completes, then submit the next one. E.g., if the reassignment is (1,2,3,4) => (5,6,7,8), split it into 4 batches (buckets), each introducing only 1 new replica (a scripted sketch follows the batch list):

 

{{Batch 1:  (1,2,3,5)}}

{{Batch 2:  (1,2,5,6)}}

{{Batch 3:  (1,5,6,7)}}

{{Batch 4:  (5,6,7,8)}}
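
To make the batching concrete, here is a minimal Python sketch that reproduces the four batches above. {{incremental_batches}} is a hypothetical helper, not part of any Kafka tooling:

{code:python}
# Sketch only: derive the per-batch replica lists for one partition,
# given the current assignment and the target assignment.

def incremental_batches(current, target):
    """Return replica lists that swap in one target broker per batch."""
    replicas = list(current)
    to_add = [b for b in target if b not in current]
    to_remove = [b for b in current if b not in target]
    batches = []
    for new, old in zip(to_add, reversed(to_remove)):
        replicas[replicas.index(old)] = new
        # sorted() only to match the sets shown above; in a real plan the
        # replica order matters (the first replica is the preferred leader).
        batches.append(sorted(replicas))
    return batches

batches = incremental_batches([1, 2, 3, 4], [5, 6, 7, 8])
print(batches)  # [[1, 2, 3, 5], [1, 2, 5, 6], [1, 5, 6, 7], [5, 6, 7, 8]]
{code}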

 

Between each batch, check whether the ZK node /admin/reassign_partitions exists; if it does, sleep and check again; if it does not, submit the next batch.
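
A sketch of that wait-and-submit loop, assuming the kazoo ZooKeeper client; {{submit_reassignment}} is a hypothetical placeholder for writing the batch's JSON plan and running kafka-reassign-partitions.sh --execute:

{code:python}
import time
from kazoo.client import KazooClient

def wait_for_reassignment(zk, poll_secs=30):
    # /admin/reassign_partitions exists only while a reassignment is in flight.
    while zk.exists("/admin/reassign_partitions"):
        time.sleep(poll_secs)

zk = KazooClient(hosts="zk1:2181")  # assumption: your ZK connect string
zk.start()
for batch in batches:               # batches from the sketch above
    submit_reassignment(batch)      # hypothetical: execute this batch's plan
    wait_for_reassignment(zk)       # block until the ZK node disappears
zk.stop()
{code}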

 

 



> Support for incremental replica reassignment
> --------------------------------------------
>
>                 Key: KAFKA-6794
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6794
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jason Gustafson
>            Assignee: Viktor Somogyi-Vass
>            Priority: Major
>
> Say you have a replication factor of 4 and you trigger a reassignment which moves all replicas to new brokers. Now 8 replicas are fetching at the same time which means you need to account for 8 times the current producer load plus the catch-up replication. To make matters worse, the replicas won't all become in-sync at the same time; in the worst case, you could have 7 replicas in-sync while one is still catching up. Currently, the old replicas won't be disabled until all new replicas are in-sync. This makes configuring the throttle tricky since ISR traffic is not subject to it.
> Rather than trying to bring all 4 new replicas online at the same time, a friendlier approach would be to do it incrementally: bring one replica online, bring it in-sync, then remove one of the old replicas. Repeat until all replicas have been changed. This would reduce the impact of a reassignment and make configuring the throttle easier at the cost of a slower overall reassignment.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)