Posted to dev@kafka.apache.org by "Neha Narkhede (JIRA)" <ji...@apache.org> on 2013/04/05 01:14:16 UTC

[jira] [Updated] (KAFKA-849) Bug in controller's startup/failover logic fails to update in memory leader and isr cache causing other state changes to work incorrectly

     [ https://issues.apache.org/jira/browse/KAFKA-849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-849:
--------------------------------

    Attachment: kafka-849-v1.patch

Fixed the bug so that the leader and isr cache is updated whether or not the leader is alive. This is the right thing to do, since the purpose of the cache is to record the last leadership decision made. On controller failover, this is the value read from zookeeper.
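As a rough illustration of the change (a simplified Python model, not the controller code in the patch; `bootstrap_cache`, the `LeaderAndIsr` tuple, the topic name and the broker ids are all hypothetical), the cache can be rebuilt from the zookeeper data with or without filtering on leader liveness:

```python
from typing import Dict, NamedTuple, Set, Tuple


class LeaderAndIsr(NamedTuple):
    """Last elected leader and isr for one partition (hypothetical model)."""
    leader: int
    isr: Tuple[int, ...]


Partition = Tuple[str, int]  # (topic, partition id)


def bootstrap_cache(
    zk_state: Dict[Partition, LeaderAndIsr],
    live_brokers: Set[int],
    only_live_leaders: bool,
) -> Dict[Partition, LeaderAndIsr]:
    """Rebuild the in-memory cache from the state stored in zookeeper.

    only_live_leaders=True models the old behaviour, where partitions
    whose leader is down are skipped. False models the fixed behaviour,
    where every partition's last decision is recorded.
    """
    cache = {}
    for partition, info in zk_state.items():
        if only_live_leaders and info.leader not in live_brokers:
            continue  # old behaviour: this partition never reaches the cache
        cache[partition] = info
    return cache


# Assumed example state: broker 3 is down but still the last elected leader.
zk_state = {
    ("events", 0): LeaderAndIsr(leader=1, isr=(1, 2)),
    ("events", 1): LeaderAndIsr(leader=3, isr=(3,)),
}
live_brokers = {1, 2}

buggy = bootstrap_cache(zk_state, live_brokers, only_live_leaders=True)
fixed = bootstrap_cache(zk_state, live_brokers, only_live_leaders=False)
```

In this model, `buggy` is missing `("events", 1)`, so any later state change that consults the cache for that partition has nothing to work with, while `fixed` mirrors what zookeeper holds.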

Other than that, fixed a couple of other issues:

1. Changed the list topics tool to also print whether or not each partition is under replicated. This makes it very easy to script the output of list topics to show only the partitions that are under replicated.
2. Reduced the noise in the logs due to failed metadata requests. There is not much value in logging these as errors: when some brokers are down, the stack trace just complains that those brokers are down. We still return the correct error code to the client, so this error message is now logged at debug level.
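For the first item, a script along the following lines could pick out the under replicated partitions. This is only a sketch: the `key=value` line format and the `underReplicated` field name are made up for illustration and would need to be adapted to whatever the list topics tool actually prints.

```python
def under_replicated(lines):
    """Return (topic, partition) pairs flagged as under replicated.

    Assumes one partition per line, as whitespace-separated key=value
    pairs. This format is hypothetical, not the tool's real output.
    """
    result = []
    for line in lines:
        fields = dict(pair.split("=", 1) for pair in line.split())
        if fields.get("underReplicated") == "true":
            result.append((fields["topic"], int(fields["partition"])))
    return result


# Hypothetical sample output, for illustration only.
sample = [
    "topic=events partition=0 leader=1 replicas=1,2 isr=1,2 underReplicated=false",
    "topic=events partition=1 leader=3 replicas=3,1 isr=3 underReplicated=true",
]
print(under_replicated(sample))  # [('events', 1)]
```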
                
> Bug in controller's startup/failover logic fails to update in memory leader and isr cache causing other state changes to work incorrectly
> -----------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-849
>                 URL: https://issues.apache.org/jira/browse/KAFKA-849
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Blocker
>              Labels: kafka-0.8, p1
>         Attachments: kafka-849-v1.patch
>
>
> partitionLeadershipInfo is the in memory cache of the controller that keeps track of every partition's "last elected" leader and isr. On controller startup/failover, this cache is bootstrapped only with those partitions whose leader is alive. As a result, the leader and isr cache is initialized incorrectly, which breaks other state transitions, such as those for new broker startup and existing broker failure. For instance, it does not allow the controller to send the list of *all* replicas that exist on a broker to it during startup.
> Another bug during controller startup is that it invokes the OnlinePartition state change before the OnlineReplica state change. This also breaks the guarantee that the controller sends a full list of replicas to a broker on startup.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira