Posted to dev@kafka.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/09/14 15:52:46 UTC
[jira] [Commented] (KAFKA-2300) Error in controller log when broker tries to rejoin cluster
[ https://issues.apache.org/jira/browse/KAFKA-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743533#comment-14743533 ]
ASF GitHub Bot commented on KAFKA-2300:
---------------------------------------
GitHub user fpj opened a pull request:
https://github.com/apache/kafka/pull/212
KAFKA-2300: Error in controller log when broker tries to rejoin cluster
I have reopened this issue because the controller does not clean up the state of its request batch when an exception occurs, and the test case was legitimately failing for me intermittently. I'm proposing a change to fix this.
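The failure mode can be illustrated with a small sketch. This is not the code from the pull request and not Kafka's actual class: `RequestBatchSketch`, `Sender`, and all method names except `newBatch` are hypothetical stand-ins. Only the exception message is taken from the log quoted in the issue. The sketch assumes the fix amounts to clearing pending requests even when sending fails, so that the next `newBatch` call does not find leftovers.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for a controller request batch. */
public class RequestBatchSketch {
    /** Hypothetical transport; in this sketch it may throw on send. */
    public interface Sender {
        void send(int brokerId, String request);
    }

    // Pending requests keyed by broker id (payload simplified to a string).
    private final Map<Integer, String> pending = new HashMap<>();

    /** Starts a new batch; fails if an earlier batch left requests behind. */
    public void newBatch() {
        if (!pending.isEmpty()) {
            throw new IllegalStateException(
                "Controller to broker state change requests batch is not empty "
                    + "while creating a new one.");
        }
    }

    /** Queues a request for one broker. */
    public void add(int brokerId, String request) {
        pending.put(brokerId, request);
    }

    /** Sends all queued requests and clears the batch, even if a send throws. */
    public void sendRequests(Sender sender) {
        try {
            for (Map.Entry<Integer, String> entry : pending.entrySet()) {
                sender.send(entry.getKey(), entry.getValue());
            }
        } finally {
            // Without this cleanup, a failed send would leave stale entries
            // and every later newBatch() call would throw.
            pending.clear();
        }
    }

    public boolean isEmpty() {
        return pending.isEmpty();
    }
}
```

With the `finally` block in place, a send that throws still leaves the batch empty, so a later broker startup can open a new batch instead of hitting the `IllegalStateException` shown in the log.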
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/fpj/kafka 2300
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/kafka/pull/212.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #212
----
commit dbd1bf3a91c3e15ed2d14bf941c41c87b8116608
Author: flavio junqueira <fp...@apache.org>
Date: 2015-07-29T17:07:51Z
KAFKA-2300: Error in controller log when broker tries to rejoin cluster
commit 9b6390ae1c474b90689ff53036120b4be44a3f8f
Author: flavio junqueira <fp...@apache.org>
Date: 2015-07-29T22:36:16Z
Updated package name and removed unnecessary imports.
commit f1261b15b007d08e87d0ed56f7ec3fecbeddc276
Author: flavio junqueira <fp...@apache.org>
Date: 2015-07-30T09:57:34Z
Fixed some style issues.
commit aa6ec90b15ac6d0e0f9e5a58d4fed7b1909d50c2
Author: flavio junqueira <fp...@apache.org>
Date: 2015-08-12T16:37:07Z
KAFKA-2300: Wrapped all occurrences of sendRequestToBrokers with try/catch
and fixed string typo.
commit 7bd2edb83054a9be72dda3425930a68ea3ad494b
Author: flavio junqueira <fp...@apache.org>
Date: 2015-08-12T16:40:13Z
KAFKA-2300: Removed unnecessary s" occurrences.
commit d5cfba343dac5967733c9415d4574256efdd764a
Author: fpj <fp...@apache.org>
Date: 2015-09-14T13:00:15Z
Merge remote-tracking branch 'upstream/trunk' into 2300
commit 742519349463c879d8413aee2b3f12b2ae8888a8
Author: fpj <fp...@apache.org>
Date: 2015-09-14T13:47:50Z
KAFKA-2300: Cleaning the state of broker request batch upon an exception.
----
> Error in controller log when broker tries to rejoin cluster
> -----------------------------------------------------------
>
> Key: KAFKA-2300
> URL: https://issues.apache.org/jira/browse/KAFKA-2300
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8.2.1
> Reporter: Johnny Brown
> Assignee: Flavio Junqueira
> Fix For: 0.9.0.0
>
> Attachments: KAFKA-2300-controller-logs.tar.gz, KAFKA-2300-repro.patch, KAFKA-2300.patch, KAFKA-2300.patch
>
>
> Hello Kafka folks,
> We are having an issue where a broker attempts to join the cluster after being restarted, but is never added to the ISR for its assigned partitions. This is a three-node cluster, and the controller is broker 2.
> When broker 1 starts, we see the following message in broker 2's controller.log.
> {{
> [2015-06-23 13:57:16,535] ERROR [BrokerChangeListener on Controller 2]: Error while handling broker changes (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> java.lang.IllegalStateException: Controller to broker state change requests batch is not empty while creating a new one. Some UpdateMetadata state changes Map(2 -> Map([prod-sver-end,1] -> (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1)), 1 -> Map([prod-sver-end,1] -> (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1)), 3 -> Map([prod-sver-end,1] -> (LeaderAndIsrInfo:(Leader:-2,ISR:1,LeaderEpoch:0,ControllerEpoch:165),ReplicationFactor:1),AllReplicas:1))) might be lost
> at kafka.controller.ControllerBrokerRequestBatch.newBatch(ControllerChannelManager.scala:202)
> at kafka.controller.KafkaController.sendUpdateMetadataRequest(KafkaController.scala:974)
> at kafka.controller.KafkaController.onBrokerStartup(KafkaController.scala:399)
> at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ReplicaStateMachine.scala:371)
> at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
> at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1$$anonfun$apply$mcV$sp$1.apply(ReplicaStateMachine.scala:359)
> at kafka.metrics.KafkaTimer.time(KafkaTimer.scala:33)
> at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply$mcV$sp(ReplicaStateMachine.scala:358)
> at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
> at kafka.controller.ReplicaStateMachine$BrokerChangeListener$$anonfun$handleChildChange$1.apply(ReplicaStateMachine.scala:357)
> at kafka.utils.Utils$.inLock(Utils.scala:535)
> at kafka.controller.ReplicaStateMachine$BrokerChangeListener.handleChildChange(ReplicaStateMachine.scala:356)
> at org.I0Itec.zkclient.ZkClient$7.run(ZkClient.java:568)
> at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> }}
> {{prod-sver-end}} is a topic we previously deleted. It seems some remnant of it persists in the controller's memory, causing an exception which interrupts the state change triggered by the broker startup.
> Has anyone seen something like this? Any idea what's happening here? Any information would be greatly appreciated.
> Thanks,
> Johnny