Posted to jira@kafka.apache.org by "Dhruvil Shah (Jira)" <ji...@apache.org> on 2020/05/06 04:48:00 UTC
[jira] [Updated] (KAFKA-9961) Brokers may be left in an inconsistent state after reassignment

     [ https://issues.apache.org/jira/browse/KAFKA-9961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dhruvil Shah updated KAFKA-9961:
--------------------------------
    Description: 
When completing a reassignment, the controller sends StopReplicaRequest to replicas that are not in the target assignment and removes them from the assignment in ZK. We do not have any retry mechanism to ensure that the broker is able to process the StopReplicaRequest successfully. Under certain circumstances, this could leave brokers in an inconsistent state, where they continue being the follower for this partition and end up with an inconsistent metadata cache.

We have seen messages like the following being spammed in the broker logs when we get into this situation:
{code:java}
While recording the replica LEO, the partition topic-1 hasn't been created.
{code}
This happens because the broker has neither received an updated LeaderAndIsrRequest for the new leader nor a StopReplicaRequest from the controller when the replica was removed from the assignment.

Note that we would require a restart of the affected broker to fix this situation. A controller failover would not fix it as the broker could continue being a replica for the partition until it receives a StopReplicaRequest, which would never happen in this case.

There seem to be a couple of problems we should address:
 # We need a mechanism to retry replica deletions after partition reassignment is complete. The main challenge here is to be able to deal with cases where a broker has been decommissioned and may never come back up.
 # We could perhaps consider a mechanism to reconcile replica states across brokers, something similar to the solution proposed in [https://cwiki.apache.org/confluence/display/KAFKA/KIP-550%3A+Mechanism+to+Delete+Stray+Partitions+on+Broker].
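
As a rough illustration of the first item, the sketch below tracks StopReplicaRequests that have not yet been acknowledged and retries them a bounded number of times, so that a decommissioned broker does not keep an entry pending forever. It is a minimal sketch under stated assumptions, not controller code: the class name PendingStopReplicaTracker, its methods, the "brokerId:topic-partition" key format, and the max-attempts policy are all hypothetical and do not exist in Kafka.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch: none of these names are part of Kafka.
final class PendingStopReplicaTracker {
    private final int maxAttempts;
    // Key "brokerId:topic-partition" -> number of failed attempts so far.
    private final Map<String, Integer> failedAttempts = new LinkedHashMap<>();

    PendingStopReplicaTracker(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    // Record that a replica removal on this broker still needs to be acknowledged.
    void enqueue(int brokerId, String topicPartition) {
        failedAttempts.putIfAbsent(brokerId + ":" + topicPartition, 0);
    }

    // One retry pass. The sender stands in for "send StopReplicaRequest and
    // wait for a successful response"; it returns true on acknowledgement.
    // Returns the entries abandoned in this pass because they hit maxAttempts
    // (for example, because the broker was decommissioned).
    List<String> retryOnce(BiPredicate<Integer, String> sender) {
        List<String> abandoned = new ArrayList<>();
        Iterator<Map.Entry<String, Integer>> it = failedAttempts.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            String[] parts = entry.getKey().split(":", 2);
            int brokerId = Integer.parseInt(parts[0]);
            if (sender.test(brokerId, parts[1])) {
                it.remove(); // acknowledged, nothing left to retry
            } else {
                int failures = entry.getValue() + 1;
                if (failures >= maxAttempts) {
                    abandoned.add(entry.getKey());
                    it.remove(); // give up; leave cleanup to an operator or a reconciliation pass
                } else {
                    entry.setValue(failures);
                }
            }
        }
        return abandoned;
    }

    int pendingCount() {
        return failedAttempts.size();
    }
}
```

Bounding the attempts is only one possible policy; a real implementation would more likely consult broker liveness or persist the pending set so that it survives a controller failover.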

  was:
When completing a reassignment, the controller sends StopReplicaRequest to replicas that are not in the target assignment and removes them from the assignment in ZK. We do not have any retry mechanism to ensure that the broker is able to process the StopReplicaRequest successfully. Under certain circumstances, this could leave brokers in an inconsistent state, where they continue being the follower for this partition and end up with an inconsistent metadata cache.

We have seen messages like the following being spammed in the broker logs when we get into this situation:
{code:java}
While recording the replica LEO, the partition topic-1 hasn't been created.
{code}
This happens because the broker has not an updated LeaderAndIsrRequest for the new leader nor a StopReplicaRequest from the controller when the replica was removed from the assignment.

Note that we would require a restart of the affected broker to fix this situation. A controller failover would not fix it as the broker could continue being a replica for the partition until it receives a StopReplicaRequest, which would never happen in this case.

There seem to be couple of problems we should address:
 # We need a mechanism to retry replica deletions after partition reassignment is complete. The main challenge here is to be able to deal with cases where a broker has been decommissioned and may never come back up.
 # We could perhaps consider a mechanism to reconcile replica states across brokers, something similar to the solution proposed in [https://cwiki.apache.org/confluence/display/KAFKA/KIP-550%3A+Mechanism+to+Delete+Stray+Partitions+on+Broker].


> Brokers may be left in an inconsistent state after reassignment
> ---------------------------------------------------------------
>
>                 Key: KAFKA-9961
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9961
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Dhruvil Shah
>            Priority: Major
>
> When completing a reassignment, the controller sends StopReplicaRequest to replicas that are not in the target assignment and removes them from the assignment in ZK. We do not have any retry mechanism to ensure that the broker is able to process the StopReplicaRequest successfully. Under certain circumstances, this could leave brokers in an inconsistent state, where they continue being the follower for this partition and end up with an inconsistent metadata cache.
> We have seen messages like the following being spammed in the broker logs when we get into this situation:
> {code:java}
> While recording the replica LEO, the partition topic-1 hasn't been created.
> {code}
> This happens because the broker has neither received an updated LeaderAndIsrRequest for the new leader nor a StopReplicaRequest from the controller when the replica was removed from the assignment.
> Note that we would require a restart of the affected broker to fix this situation. A controller failover would not fix it as the broker could continue being a replica for the partition until it receives a StopReplicaRequest, which would never happen in this case.
> There seem to be a couple of problems we should address:
>  # We need a mechanism to retry replica deletions after partition reassignment is complete. The main challenge here is to be able to deal with cases where a broker has been decommissioned and may never come back up.
>  # We could perhaps consider a mechanism to reconcile replica states across brokers, something similar to the solution proposed in [https://cwiki.apache.org/confluence/display/KAFKA/KIP-550%3A+Mechanism+to+Delete+Stray+Partitions+on+Broker].
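
The reconciliation idea in the second item can be illustrated with a small comparison between what a broker hosts and what the controller believes is assigned. This is a minimal sketch under stated assumptions: StrayReplicaFinder and findStray are hypothetical names, partitions are represented as plain "topic-partition" strings, and how the two inputs would be obtained (and what is done with the result) is left open; it is not the mechanism proposed in KIP-550.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these names are part of Kafka.
final class StrayReplicaFinder {
    private StrayReplicaFinder() {}

    // Returns the partitions that the broker hosts locally but that the
    // current assignment does not place on it (or does not know at all).
    static Set<String> findStray(int brokerId,
                                 Set<String> hostedPartitions,
                                 Map<String, Set<Integer>> assignment) {
        Set<String> stray = new TreeSet<>();
        for (String topicPartition : hostedPartitions) {
            Set<Integer> replicas = assignment.get(topicPartition);
            if (replicas == null || !replicas.contains(brokerId)) {
                stray.add(topicPartition);
            }
        }
        return stray;
    }
}
```

For a broker that missed its StopReplicaRequest, the partition in question would show up in the returned set, which could then drive a follow-up StopReplicaRequest or a local cleanup.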



--
This message was sent by Atlassian Jira
(v8.3.4#803005)