Posted to dev@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/09/14 15:52:46 UTC

[jira] [Commented] (KAFKA-2300) Error in controller log when broker tries to rejoin cluster

    [ https://issues.apache.org/jira/browse/KAFKA-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743533#comment-14743533 ] 

ASF GitHub Bot commented on KAFKA-2300:
---------------------------------------

GitHub user fpj opened a pull request:

    https://github.com/apache/kafka/pull/212

    KAFKA-2300: Error in controller log when broker tries to rejoin cluster

    I have reopened this issue because the controller isn't cleaning up the state upon an exception and the test case was legitimately failing for me every now and then. I'm proposing a change to fix this.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fpj/kafka 2300

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/kafka/pull/212.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #212
    
----
commit dbd1bf3a91c3e15ed2d14bf941c41c87b8116608
Author: flavio junqueira <fp...@apache.org>
Date:   2015-07-29T17:07:51Z

    KAFKA-2300: Error in controller log when broker tries to rejoin cluster

commit 9b6390ae1c474b90689ff53036120b4be44a3f8f
Author: flavio junqueira <fp...@apache.org>
Date:   2015-07-29T22:36:16Z

    Updated package name and removed unnecessary imports.

commit f1261b15b007d08e87d0ed56f7ec3fecbeddc276
Author: flavio junqueira <fp...@apache.org>
Date:   2015-07-30T09:57:34Z

    Fixed some style issues.

commit aa6ec90b15ac6d0e0f9e5a58d4fed7b1909d50c2
Author: flavio junqueira <fp...@apache.org>
Date:   2015-08-12T16:37:07Z

    KAFKA-2300: Wrapped all occurrences of sendRequestToBrokers with try/catch
    and fixed string typo.

commit 7bd2edb83054a9be72dda3425930a68ea3ad494b
Author: flavio junqueira <fp...@apache.org>
Date:   2015-08-12T16:40:13Z

    KAFKA-2300: Removed unnecessary s" occurrences.

commit d5cfba343dac5967733c9415d4574256efdd764a
Author: fpj <fp...@apache.org>
Date:   2015-09-14T13:00:15Z

    Merge remote-tracking branch 'upstream/trunk' into 2300

commit 742519349463c879d8413aee2b3f12b2ae8888a8
Author: fpj <fp...@apache.org>
Date:   2015-09-14T13:47:50Z

    KAFKA-2300: Cleaning the state of broker request batch upon an exception.

----


> Error in controller log when broker tries to rejoin cluster
> -----------------------------------------------------------
>
>                 Key: KAFKA-2300
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2300
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.2.1
>            Reporter: Johnny Brown
>            Assignee: Flavio Junqueira
>             Fix For: 0.9.0.0
>
>         Attachments: KAFKA-2300-controller-logs.tar.gz, KAFKA-2300-repro.patch, KAFKA-2300.patch, KAFKA-2300.patch
>
>
> Hello Kafka folks,
> We are having an issue where a broker attempts to join the cluster after being restarted, but is never added to the ISR for its assigned partitions. This is a three-node cluster, and the controller is broker 2.
> When broker 1 starts, we see the following message in broker 2's controller.log.
> {{
> [2015-06-23 13:57:16,535] ERROR [BrokerChangeListener on Controller 2]: Error while handling broker changes (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> java.lang.IllegalStateException: Controller to broker state change requests batch is not empty while creating a new one. Some UpdateMetadata state changes Map(2 -> Map([prod-sver-end,1] -> (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1)), 1 -> Map([prod-sver-end,1] -> (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1)), 3 -> Map([prod-sver-end,1] -> (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1))) might be lost 
>   at kafka.controller.ControllerBrokerRequestBatch.newBatch(ControllerChannelManager.scala:202)
>   at kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:974)
>   at kafka.controller.KafkaController.onBrokerStartup(KafkaController.scala:399)
>   at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:371)
>   at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>   at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
>   at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
>   at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
>   at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>   at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
>   at kafka.utils.Utils$.inLock(Utils.scala:535)
>   at kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
>   at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
>   at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> }}
> {{prod-sver-end}} is a topic we previously deleted. It seems some remnant of it persists in the controller's memory, causing an exception which interrupts the state change triggered by the broker startup.
> Has anyone seen something like this? Any idea what's happening here? Any information would be greatly appreciated.
> Thanks,
> Johnny
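
For readers following along, here is a minimal sketch of the failure mode described in the report and of the kind of cleanup the pull request title refers to. It is not the actual patch: RequestBatch, cleanupOnFailure and canStartNewBatchAfterFailure are hypothetical names made up for illustration, and the real ControllerBrokerRequestBatch tracks more state than this. The assumption is simply that a send which throws leaves pending requests behind, so the next newBatch call hits the "not empty" check.

```scala
import scala.collection.mutable
import scala.util.Try

object BatchSketch {
  // Hypothetical, simplified stand-in for the controller's request batch.
  // It does not mirror Kafka's real class or method signatures.
  final class RequestBatch(send: (Int, String) => Unit, cleanupOnFailure: Boolean) {
    // brokerId -> state changes still waiting to be sent
    private val pending = mutable.Map.empty[Int, mutable.Buffer[String]]

    // Refuses to start a batch while an earlier one still holds requests.
    def newBatch(): Unit =
      if (pending.nonEmpty)
        throw new IllegalStateException(
          s"Request batch is not empty while creating a new one: $pending might be lost")

    def add(brokerId: Int, change: String): Unit =
      pending.getOrElseUpdate(brokerId, mutable.Buffer.empty[String]) += change

    // Sends every pending change. On failure, the leftover state is dropped
    // only when cleanupOnFailure is set; the exception is rethrown either way.
    def sendAll(): Unit =
      try {
        pending.foreach { case (broker, changes) => changes.foreach(c => send(broker, c)) }
        pending.clear()
      } catch {
        case e: Exception =>
          if (cleanupOnFailure) pending.clear()
          throw e
      }
  }

  // Returns true if a second batch can be started after a failed send.
  def canStartNewBatchAfterFailure(cleanupOnFailure: Boolean): Boolean = {
    val failingSend: (Int, String) => Unit =
      (_, _) => throw new RuntimeException("simulated send failure")
    val batch = new RequestBatch(failingSend, cleanupOnFailure)
    batch.newBatch()
    batch.add(1, "UpdateMetadata for [some-topic,1]")
    Try(batch.sendAll()) // the send fails; the caller swallows the exception
    Try(batch.newBatch()).isSuccess
  }

  def main(args: Array[String]): Unit = {
    println(canStartNewBatchAfterFailure(cleanupOnFailure = false))
    println(canStartNewBatchAfterFailure(cleanupOnFailure = true))
  }
}
```

Without the cleanup, the second newBatch call throws the same kind of IllegalStateException as in the log above; with it, the batch starts empty again. Whether dropping the leftover requests is acceptable depends on what the caller does next, which this sketch does not model.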


