Posted to jira@kafka.apache.org by "Jason Gustafson (JIRA)" <ji...@apache.org> on 2019/05/20 23:37:00 UTC

[jira] [Comment Edited] (KAFKA-6029) Controller should wait for the leader migration to finish before ack a ControlledShutdownRequest

    [ https://issues.apache.org/jira/browse/KAFKA-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844388#comment-16844388 ] 

Jason Gustafson edited comment on KAFKA-6029 at 5/20/19 11:36 PM:
------------------------------------------------------------------

I think we can actually resolve this as an unintended benefit of [KIP-320|https://cwiki.apache.org/confluence/display/KAFKA/KIP-320%3A+Allow+fetchers+to+detect+and+handle+log+truncation]. When the controller shrinks the ISR, it bumps the epoch. The bumped epoch prevents the shutting down follower from being added back to the ISR. The controller may still send a LeaderAndIsr request to the shutting down broker with the updated epoch, but the shutting down broker will not restart the fetcher.


was (Author: hachikuji):
I think we can actually resolve this as a unintended benefit of [KIP-320|https://cwiki.apache.org/confluence/display/KAFKA/KIP-320%3A+Allow+fetchers+to+detect+and+handle+log+truncation]. When the controller shrinks the ISR, it bumps the epoch. The bumped epoch prevents the shutting down follower from being added back to the ISR. The controller may still send a LeaderAndIsr request to the shutting down broker with the updated epoch, but the shutting down broker will not restart the fetcher.
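
The epoch check described in the comment above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class and method names (EpochFencingSketch, PartitionState, shrinkIsr, handleFetch) are hypothetical and are not Kafka's actual classes, and the model reduces the KIP-320 behavior to a single rule, namely that a fetch carrying an epoch older than the leader's current epoch is rejected.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, heavily simplified model of epoch fencing; not Kafka code.
class EpochFencingSketch {

    static final class PartitionState {
        int leaderEpoch;
        final Set<Integer> isr;

        PartitionState(int leaderEpoch, Set<Integer> isr) {
            this.leaderEpoch = leaderEpoch;
            this.isr = new HashSet<>(isr);
        }

        // Models the controller removing a broker from the ISR and bumping the epoch.
        void shrinkIsr(int brokerId) {
            isr.remove(brokerId);
            leaderEpoch++;
        }

        // Models the leader handling a follower fetch that carries the follower's epoch.
        // A stale epoch is fenced, so the follower cannot be added back to the ISR.
        boolean handleFetch(int brokerId, int fetchEpoch, boolean caughtUp) {
            if (fetchEpoch < leaderEpoch) {
                return false; // fenced: follower has not seen the bumped epoch
            }
            if (caughtUp) {
                isr.add(brokerId);
            }
            return true;
        }
    }

    public static void main(String[] args) {
        PartitionState partition = new PartitionState(5, Set.of(1, 2, 3));
        partition.shrinkIsr(3); // broker 3 is shutting down
        boolean accepted = partition.handleFetch(3, 5, true); // broker 3 still uses the old epoch
        System.out.println("fetch accepted: " + accepted + ", isr: " + partition.isr);
    }
}
```

In this model the shutting-down broker could only rejoin the ISR by fetching with the new epoch, which it would learn only if it restarted its fetcher after the LeaderAndIsr request; per the comment, it does not.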

> Controller should wait for the leader migration to finish before ack a ControlledShutdownRequest
> ------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-6029
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6029
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: controller, core
>    Affects Versions: 1.0.0
>            Reporter: Jiangjie Qin
>            Assignee: Zhanxiang (Patrick) Huang
>            Priority: Major
>
> In the controlled shutdown process, the controller returns the ControlledShutdownResponse immediately after the state machine is updated. Because the LeaderAndIsrRequests and UpdateMetadataRequests may not yet have been processed by the brokers, the leader migration and active ISR shrink may not have finished when the shutting-down broker proceeds to shut down. As a result, some of the leaders can take up to replica.lag.time.max.ms to remove the broker from the ISR. Meanwhile, the produce purgatory size will grow.
> Ideally, the controller should wait until all the LeaderAndIsrRequests and UpdateMetadataRequests have been acked before sending back the ControlledShutdownResponse.
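
The "wait until all requests have been acked" idea from the description can be sketched with standard Java futures. This is an illustration only, not the controller's implementation: ControlledShutdownAckSketch and whenAllAcked are hypothetical names, and each pending LeaderAndIsrRequest or UpdateMetadataRequest is assumed to be represented by a CompletableFuture that completes when the target broker acks it.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch; the real controller does not expose this API.
class ControlledShutdownAckSketch {

    // Returns a future that completes only after every pending request has been acked.
    // The caller would send the ControlledShutdownResponse when this future completes.
    static CompletableFuture<Void> whenAllAcked(List<CompletableFuture<Void>> pendingAcks) {
        return CompletableFuture.allOf(pendingAcks.toArray(new CompletableFuture<?>[0]));
    }

    public static void main(String[] args) {
        CompletableFuture<Void> leaderAndIsrAck = new CompletableFuture<>();
        CompletableFuture<Void> updateMetadataAck = new CompletableFuture<>();

        whenAllAcked(List.of(leaderAndIsrAck, updateMetadataAck))
            .thenRun(() -> System.out.println("all acked, send ControlledShutdownResponse"));

        // Simulate the brokers acking both requests.
        leaderAndIsrAck.complete(null);
        updateMetadataAck.complete(null);
    }
}
```

A real implementation would also need a timeout or failure path so that an unresponsive broker cannot block the shutdown indefinitely; that is omitted here.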


