Posted to jira@kafka.apache.org by "Ben Kirwin (JIRA)" <ji...@apache.org> on 2018/05/10 15:15:00 UTC

[jira] [Commented] (KAFKA-661) Prevent a shutting down broker from re-entering the ISR

    [ https://issues.apache.org/jira/browse/KAFKA-661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16470529#comment-16470529 ] 

Ben Kirwin commented on KAFKA-661:
----------------------------------

{quote}The leader that is being shut down receives a leaderAndIsrRequest informing it that it is no longer the leader, and thus starts up a follower, which starts issuing fetch requests to the new leader. We then shrink the ISR and send a StopReplicaRequest to the shutting-down broker. However, the new leader, upon receiving the fetch request, expands the ISR again.
{quote}
This seems to happen when the dying broker is a follower as well, for similar reasons: it can send a fetch request after the controlled shutdown request is complete, which re-expands the ISR to include the dying broker.
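Stripped of the RPC details, both cases come down to the order of two events at the partition leader: the ISR shrink issued during controlled shutdown, and a fetch from the shutting-down broker whose fetcher is still running. The toy model below is a minimal sketch, not Kafka code. `Partition`, `shrink_isr`, `handle_fetch` and `run_shutdown` are hypothetical names, and it assumes the leader re-admits any follower whose fetch offset has reached the log end offset, which on a low-volume topic is true almost immediately.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Partition:
    """Toy view of one partition at its leader (hypothetical, not Kafka's classes)."""
    leader: int
    log_end_offset: int
    isr: Set[int] = field(default_factory=set)

    def shrink_isr(self, broker: int) -> None:
        # Controlled shutdown: the shutting-down broker is removed from the ISR.
        self.isr.discard(broker)

    def handle_fetch(self, follower: int, fetch_offset: int) -> None:
        # A follower whose fetch offset has reached the log end offset counts as
        # caught up and is (re-)added to the ISR. On a low-volume topic the log
        # barely moves, so a stale follower qualifies almost at once.
        if fetch_offset >= self.log_end_offset:
            self.isr.add(follower)


def run_shutdown(partition: Partition, dying: int, late_fetch: bool) -> Set[int]:
    partition.shrink_isr(dying)
    if late_fetch:
        # The dying broker's fetcher is still running (its StopReplicaRequest has
        # not been processed yet), so one more fetch lands after the shrink.
        partition.handle_fetch(dying, partition.log_end_offset)
    return partition.isr
```

With `late_fetch=True` the dying broker ends up back in the ISR; with `late_fetch=False` it stays out. The problem is the ordering of the late fetch relative to the shrink, not the shrink itself.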

I'm having a look at what it will take to use the stop-replica callbacks to implement this suggestion. Hopefully not too complicated!

> Prevent a shutting down broker from re-entering the ISR
> -------------------------------------------------------
>
>                 Key: KAFKA-661
>                 URL: https://issues.apache.org/jira/browse/KAFKA-661
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.0, 0.8.1
>            Reporter: Joel Koshy
>            Priority: Major
>
> There is a timing issue in controlled shutdown that affects low-volume topics. The leader that is being shut down receives a leaderAndIsrRequest informing it that it is no longer the leader, and thus starts up a follower, which starts issuing fetch requests to the new leader. We then shrink the ISR and send a StopReplicaRequest to the shutting-down broker. However, the new leader, upon receiving the fetch request, expands the ISR again.
> The impact is not critical, although it can cause producers to that topic to time out; in practice there are probably very few or no produce requests coming in, as the issue primarily affects low-volume topics. The shutdown logic itself seems to be working correctly, in that the leader has been successfully moved.
> One possible approach would be to use the callback feature in the ControllerBrokerRequestBatch to wait until the StopReplicaRequest has been processed by the shutting-down broker before shrinking the ISR. There are probably other ways as well.
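The callback-based approach can be sketched in the same toy style. Here `RequestBatch` is a hypothetical stand-in for `ControllerBrokerRequestBatch` (its real methods and callback signatures are not reproduced), and `Broker`, `Leader` and `controlled_shutdown` are likewise illustrative names. The sketch assumes that once a broker has processed a StopReplicaRequest its fetcher is stopped and sends no further fetches. Deferring the ISR shrink to the StopReplicaRequest callback means a fetch that slips in beforehand is undone by the shrink, and no fetch is sent afterwards.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Set


@dataclass
class Broker:
    """Toy broker; `fetching` is True while its replica fetcher is running."""
    broker_id: int
    fetching: bool = True


@dataclass
class Leader:
    """Toy partition leader holding the ISR (hypothetical, not Kafka's Partition class)."""
    isr: Set[int] = field(default_factory=set)

    def handle_fetch(self, follower: Broker) -> None:
        # Only a running fetcher sends requests; on a quiet topic the follower is
        # already caught up, so the leader (re-)adds it to the ISR.
        if follower.fetching:
            self.isr.add(follower.broker_id)


class RequestBatch:
    """Hypothetical stand-in for a controller request batch with response callbacks."""

    def __init__(self) -> None:
        self.pending: Deque[Callable[[], None]] = deque()

    def send_stop_replica(self, broker: Broker, on_response: Callable[[], None]) -> None:
        def deliver() -> None:
            broker.fetching = False  # broker processes StopReplicaRequest: fetcher stops
            on_response()            # response reaches the controller: callback runs
        self.pending.append(deliver)

    def deliver_all(self) -> None:
        while self.pending:
            self.pending.popleft()()


def controlled_shutdown(leader: Leader, dying: Broker, batch: RequestBatch,
                        wait_for_stop_replica: bool) -> Set[int]:
    if wait_for_stop_replica:
        # Proposed order: shrink the ISR only once the StopReplicaRequest is processed.
        batch.send_stop_replica(dying, on_response=lambda: leader.isr.discard(dying.broker_id))
    else:
        # Racy order: shrink first, then ask the broker to stop replicating.
        leader.isr.discard(dying.broker_id)
        batch.send_stop_replica(dying, on_response=lambda: None)
    leader.handle_fetch(dying)  # a late fetch arrives before StopReplicaRequest is processed
    batch.deliver_all()
    leader.handle_fetch(dying)  # ignored: the dying broker's fetcher has stopped
    return leader.isr
```

Calling `controlled_shutdown(..., wait_for_stop_replica=False)` reproduces the re-expanded ISR, while `True` leaves the dying broker out. The sketch ignores fetch requests that are already in flight when the fetcher stops; a real fix would also have to account for those.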



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)