You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Xavier Lange (JIRA)" <ji...@apache.org> on 2016/11/29 19:17:58 UTC

[jira] [Updated] (KAFKA-4464) Clean shutdown of broker fails due to controller error

     [ https://issues.apache.org/jira/browse/KAFKA-4464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xavier Lange updated KAFKA-4464:
--------------------------------
    Affects Version/s: 0.10.1.0

> Clean shutdown of broker fails due to controller error
> ------------------------------------------------------
>
>                 Key: KAFKA-4464
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4464
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.10.1.0
>         Environment: kafka@86a156fd9dda:~$ java -version
> java version "1.8.0_60"
> Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
> Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode)
> kafka@86a156fd9dda:~$ uname -a
> Linux 86a156fd9dda 4.7.3-coreos-r2 #1 SMP Tue Nov 1 01:38:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
> kafka@86a156fd9dda:~$ ps alx | grep java
> 4  1000     1     0  20   0 75887304 3820220 futex_ Ssl ?     9379:49 /usr/lib/jvm/java-8-oracle/bin/java -Xmx3G -Xms3G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/kafka/bin/../logs/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=10.60.68.122 -Dcom.sun.management.jmxremote.rmi.port=7203 -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7203 -Dkafka.logs.dir=/kafka/bin/../logs -Dlog4j.configuration=file:/kafka/bin/../config/log4j.properties -cp :/kafka/bin/../libs/aopalliance-repackaged-2.4.0-b34.jar:/kafka/bin/../libs/argparse4j-0.5.0.jar:/kafka/bin/../libs/connect-api-0.10.1.0.jar:/kafka/bin/../libs/connect-file-0.10.1.0.jar:/kafka/bin/../libs/connect-json-0.10.1.0.jar:/kafka/bin/../libs/connect-runtime-0.10.1.0.jar:/kafka/bin/../libs/guava-18.0.jar:/kafka/bin/../libs/hk2-api-2.4.0-b34.jar:/kafka/bin/../libs/hk2-locator-2.4.0-b34.jar:/kafka/bin/../libs/hk2-utils-2.4.0-b34.jar:/kafka/bin/../libs/jackson-annotations-2.6.0.jar:/kafka/bin/../libs/jackson-core-2.6.3.jar:/kafka/bin/../libs/jackson-databind-2.6.3.jar:/kafka/bin/../libs/jackson-jaxrs-base-2.6.3.jar:/kafka/bin/../libs/jackson-jaxrs-json-provider-2.6.3.jar:/kafka/bin/../libs/jackson-module-jaxb-annotations-2.6.3.jar:/kafka/bin/../libs/javassist-3.18.2-GA.jar:/kafka/bin/../libs/javax.annotation-api-1.2.jar:/kafka/bin/../libs/javax.inject-1.jar:/kafka/bin/../libs/javax.inject-2.4.0-b34.jar:/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/kafka/bin/../libs/javax.ws.rs-api-2.0.1.jar:/kafka/bin/../libs/jersey-client-2.22.2.jar:/kafka/bin/../libs/jersey-common-2.22.2.jar:/kafka/bin/../libs/jersey-container-servlet-2.22.2.jar:/kafka/bin/../libs/jersey-container-servlet-core-2.22.2.jar:/kafka/bin/../libs/jersey-guava-2.22.2.jar:/kafka/bin/../libs/jersey-media-jaxb-2.22.2.jar:/kafka/bin/../libs/jersey-server-2.22.2.jar:/kafka/bin/../libs/jetty-continuation-9.2.15.v20160210.jar:/kafka/bin/../libs/jetty-http-9.2.15.v20160210.jar:/kafka/bin/../libs/jetty-io-9.2.15.v20160210.jar:/kafka/bin/../libs/jetty-security-9.2.15.v20160210.jar:/kafka/bin/../libs/jetty-server-9.2.15.v20160210.jar:/kafka/bin/../libs/jetty-servlet-9.2.15.v20160210.jar:/kafka/bin/../libs/jetty-servlets-9.2.15.v20160210.jar:/kafka/bin/../libs/jetty-util-9.2.15.v20160210.jar:/kafka/bin/../libs/jopt-simple-4.9.jar:/kafka/bin/../libs/kafka-clients-0.10.1.0.jar:/kafka/bin/../libs/kafka-log4j-appender-0.10.1.0.jar:/kafka/bin/../libs/kafka-streams-0.10.1.0.jar:/kafka/bin/../libs/kafka-streams-examples-0.10.1.0.jar:/kafka/bin/../libs/kafka-tools-0.10.1.0.jar:/kafka/bin/../libs/kafka_2.11-0.10.1.0-sources.jar:/kafka/bin/../libs/kafka_2.11-0.10.1.0-test-sources.jar:/kafka/bin/../libs/kafka_2.11-0.10.1.0.jar:/kafka/bin/../libs/log4j-1.2.17.jar:/kafka/bin/../libs/lz4-1.3.0.jar:/kafka/bin/../libs/metrics-core-2.2.0.jar:/kafka/bin/../libs/osgi-resource-locator-1.0.1.jar:/kafka/bin/../libs/reflections-0.9.10.jar:/kafka/bin/../libs/rocksdbjni-4.9.0.jar:/kafka/bin/../libs/scala-library-2.11.8.jar:/kafka/bin/../libs/scala-parser-combinators_2.11-1.0.4.jar:/kafka/bin/../libs/slf4j-api-1.7.21.jar:/kafka/bin/../libs/slf4j-log4j12-1.7.21.jar:/kafka/bin/../libs/snappy-java-1.1.2.6.jar:/kafka/bin/../libs/validation-api-1.1.0.Final.jar:/kafka/bin/../libs/zkclient-0.9.jar:/kafka/bin/../libs/zookeeper-3.4.8.jar kafka.Kafka /kafka/config/server.properties
> This is running inside a docker container.
>            Reporter: Xavier Lange
>
> My cluster is unable to communicate to one of my brokers (Broker 1 in this case) and is spinning on logs:
> {code}
> [2016-11-29 19:05:08,659] WARN [ReplicaFetcherThread-0-1], Error in fetch kafka.server.ReplicaFetcherThread$FetchRequest@27aeb5f4 (kafka.server.ReplicaFetcherThread)
> java.io.IOException: Connection to 10.60.68.122:9092 (id: 1 rack: null) failed
> 	at kafka.utils.NetworkClientBlockingOps$.awaitReady$1(NetworkClientBlockingOps.scala:83)
> 	at kafka.utils.NetworkClientBlockingOps$.blockingReady$extension(NetworkClientBlockingOps.scala:93)
> 	at kafka.server.ReplicaFetcherThread.sendRequest(ReplicaFetcherThread.scala:248)
> 	at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:238)
> 	at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
> 	at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:118)
> 	at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:103)
> 	at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:63)
> {code}
> and on Broker 1 I tried issuing a shutdown and I get this:
> {code}
> [2016-11-29 18:33:16,152] INFO [Kafka Server 1], shutting down (kafka.server.KafkaServer)
> [2016-11-29 18:33:16,163] INFO [Kafka Server 1], Starting controlled shutdown (kafka.server.KafkaServer)
> [2016-11-29 18:33:38,347] INFO [Kafka Server 1], Remaining partitions to move: LONG_LIST_OF_TOPICS
> [2016-11-29 18:33:38,350] INFO [Kafka Server 1], Error code from controller: 0 (kafka.server.KafkaServer)
> [2016-11-29 18:33:43,356] WARN [Kafka Server 1], Retrying controlled shutdown after the previous attempt failed... (kafka.server.KafkaServer)
> [2016-11-29 18:34:04,053] INFO [Kafka Server 1], Remaining partitions to move: SAME_LONG_LIST_OF_TOPICS_AGAIN
> [2016-11-29 18:34:04,053] INFO [Kafka Server 1], Error code from controller: 0 (kafka.server.KafkaServer)
> [2016-11-29 18:34:09,054] WARN [Kafka Server 1], Retrying controlled shutdown after the previous attempt failed... (kafka.server.KafkaServer)
> [2016-11-29 18:34:32,577] INFO [Kafka Server 1], Remaining partitions to move: SAM_LONG_LIST_OF_TOPICS_AGAIN_AGAIN
> [2016-11-29 18:34:32,578] INFO [Kafka Server 1], Error code from controller: 0 (kafka.server.KafkaServer)
> [2016-11-29 18:34:37,579] WARN [Kafka Server 1], Retrying controlled shutdown after the previous attempt failed... (kafka.server.KafkaServer)
> [2016-11-29 18:34:37,586] WARN [Kafka Server 1], Proceeding to do an unclean shutdown as all the controlled shutdown attempts failed (kafka.server.KafkaServer)
> [2016-11-29 18:34:37,612] INFO [Socket Server on Broker 1], Shutting down (kafka.network.SocketServer)
> [2016-11-29 18:42:36,940] INFO Rolled new log segment for '__consumer_offsets-30' in 6 ms. (kafka.log.Log)
> [2016-11-29 18:43:52,440] INFO Deleting segment 71712593 from log __consumer_offsets-30. (kafka.log.Log)
> [2016-11-29 18:43:52,440] INFO Deleting segment 0 from log __consumer_offsets-30. (kafka.log.Log)
> [2016-11-29 18:43:52,492] INFO Deleting index /data/__consumer_offsets-30/00000000000071712593.index.deleted (kafka.log.OffsetIndex)
> [2016-11-29 18:43:52,532] INFO Deleting index /data/__consumer_offsets-30/00000000000000000000.index.deleted (kafka.log.OffsetIndex)
> [2016-11-29 18:43:52,532] INFO Deleting index /data/__consumer_offsets-30/00000000000000000000.timeindex.deleted (kafka.log.TimeIndex)
> [2016-11-29 18:43:52,549] INFO Deleting index /data/__consumer_offsets-30/00000000000071712593.timeindex.deleted (kafka.log.TimeIndex)
> [2016-11-29 18:43:53,370] INFO Deleting segment 72483593 from log __consumer_offsets-30. (kafka.log.Log)
> [2016-11-29 18:43:53,478] INFO Deleting index /data/__consumer_offsets-30/00000000000072483593.index.deleted (kafka.log.OffsetIndex)
> [2016-11-29 18:43:53,479] INFO Deleting index /data/__consumer_offsets-30/00000000000072483593.timeindex.deleted (kafka.log.TimeIndex)
> {code}
> so it says it's doing an unclean shutdown but then it refuses to stop the process. now I have this sort of zombie process and the other brokers are spinning even faster on trying to connect to it.
> What other logs can I provide to help debug this broker's failure?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)