You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Michał Woś (JIRA)" <ji...@apache.org> on 2017/02/18 03:41:44 UTC

[jira] [Comment Edited] (KAFKA-2319) After controlled shutdown: IllegalStateException: Kafka scheduler has not been started

    [ https://issues.apache.org/jira/browse/KAFKA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872967#comment-15872967 ] 

Michał Woś edited comment on KAFKA-2319 at 2/18/17 3:41 AM:
------------------------------------------------------------

I've got the same problem after SIGTERM. Moreover: each time I restart gracefully brokers I'm loosing data. I'm pretty sure not in the producer side, I see from producer metrics that there are retries during initializing of broker stop, but no records dropped recorded. Producer thinks that all records were stored. However they are missing on the server side.
Moreover between "kafka.log.LogManager: Shutting down." and "INFO kafka.log.LogManager: Shutdown complete." there is 4 minute pause (nothing is printed in logs). So yes - stopping broker takes >4 minutes.
I'm running kafka from cloudera which is based on 0.8.2.0 + (I belive) 127 commits. To be exact, Cloudera version is named: 	0.8.2.0+kafka1.4.0+127. Each broker leads ~200 topics.

Full log:
{code}
...
2017-02-17 17:55:54,059 INFO kafka.server.ReplicaFetcherThread: [ReplicaFetcherThread-0-2032], Shutting down
2017-02-17 17:55:54,061 INFO kafka.server.ReplicaFetcherThread: [ReplicaFetcherThread-0-2032], Stopped 
2017-02-17 17:55:54,061 INFO kafka.server.ReplicaFetcherThread: [ReplicaFetcherThread-0-2032], Shutdown completed
2017-02-17 17:55:54,062 INFO kafka.server.ReplicaFetcherManager: [ReplicaFetcherManager on broker 721] shutdown completed
2017-02-17 17:55:54,090 INFO kafka.server.ReplicaManager: [Replica Manager on Broker 721]: Shut down completely
2017-02-17 17:55:54,091 INFO kafka.log.LogManager: Shutting down.
// note it is still not killed, 4 minutes break and finally:
2017-02-17 17:59:56,174 INFO kafka.log.LogManager: Shutdown complete.
2017-02-17 17:59:56,179 WARN kafka.utils.Utils$: Kafka scheduler has not been started
java.lang.IllegalStateException: Kafka scheduler has not been started
	at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114)
	at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86)
	at kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:354)
	at kafka.controller.KafkaController.shutdown(KafkaController.scala:677)
	at kafka.server.KafkaServer$$anonfun$shutdownapply$mcV$sp(KafkaServer.scala:285)
	at kafka.utils.Utils$.swallow(Utils.scala:172)
	at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
	at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
	at kafka.utils.Logging$class.swallow(Logging.scala:94)
	at kafka.utils.Utils$.swallow(Utils.scala:45)
	at kafka.server.KafkaServer.shutdown(KafkaServer.scala:285)
	at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
	at kafka.Kafka$$anonrun(Kafka.scala:42)
2017-02-17 17:59:56,179 INFO org.I0Itec.zkclient.ZkEventThread: Terminate ZkClient event thread.
2017-02-17 17:59:56,190 INFO org.apache.zookeeper.ZooKeeper: Session: 0x45a1d0f178a1267 closed
2017-02-17 17:59:56,190 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2017-02-17 17:59:56,199 INFO kafka.server.KafkaServer: [Kafka Server 721], shut down completed
{code}

Can someone confirm that kafka 0.9.0.0 resolves that issue for sure? [~hachikuji] Did you upgrade? Did it help?



was (Author: wosiu):
I've got the same problem after SIGTERM. Moreover: each time I restart gracefully brokers I'm loosing data. I'm pretty sure not in the producer side, I see from producer metrics that there are retries during initializing of broker stop, but no records dropped recorded. Producer thinks that all records were stored. However they are missing on the server side.
Moreover between "kafka.log.LogManager: Shutting down." and "INFO kafka.log.LogManager: Shutdown complete." there is 4 minute pause (nothing is printed in logs). So yes - stopping broker takes ~4 minutes.
I'm running kafka from cloudera which is based on 0.8.2.0 + (I belive) 127 commits. To be exact, Cloudera version is named: 	0.8.2.0+kafka1.4.0+127. Each broker leads ~200 topics.

Full log:
{code}
...
2017-02-17 17:55:54,059 INFO kafka.server.ReplicaFetcherThread: [ReplicaFetcherThread-0-2032], Shutting down
2017-02-17 17:55:54,061 INFO kafka.server.ReplicaFetcherThread: [ReplicaFetcherThread-0-2032], Stopped 
2017-02-17 17:55:54,061 INFO kafka.server.ReplicaFetcherThread: [ReplicaFetcherThread-0-2032], Shutdown completed
2017-02-17 17:55:54,062 INFO kafka.server.ReplicaFetcherManager: [ReplicaFetcherManager on broker 721] shutdown completed
2017-02-17 17:55:54,090 INFO kafka.server.ReplicaManager: [Replica Manager on Broker 721]: Shut down completely
2017-02-17 17:55:54,091 INFO kafka.log.LogManager: Shutting down.
// note it is still not killed, 4 minutes break and finally:
2017-02-17 17:59:56,174 INFO kafka.log.LogManager: Shutdown complete.
2017-02-17 17:59:56,179 WARN kafka.utils.Utils$: Kafka scheduler has not been started
java.lang.IllegalStateException: Kafka scheduler has not been started
	at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114)
	at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86)
	at kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:354)
	at kafka.controller.KafkaController.shutdown(KafkaController.scala:677)
	at kafka.server.KafkaServer$$anonfun$shutdownapply$mcV$sp(KafkaServer.scala:285)
	at kafka.utils.Utils$.swallow(Utils.scala:172)
	at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
	at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
	at kafka.utils.Logging$class.swallow(Logging.scala:94)
	at kafka.utils.Utils$.swallow(Utils.scala:45)
	at kafka.server.KafkaServer.shutdown(KafkaServer.scala:285)
	at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:42)
	at kafka.Kafka$$anonrun(Kafka.scala:42)
2017-02-17 17:59:56,179 INFO org.I0Itec.zkclient.ZkEventThread: Terminate ZkClient event thread.
2017-02-17 17:59:56,190 INFO org.apache.zookeeper.ZooKeeper: Session: 0x45a1d0f178a1267 closed
2017-02-17 17:59:56,190 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2017-02-17 17:59:56,199 INFO kafka.server.KafkaServer: [Kafka Server 721], shut down completed
{code}

Can someone confirm that kafka 0.9.0.0 resolves that issue for sure? [~hachikuji] Did you upgrade? Did it help?


> After controlled shutdown: IllegalStateException: Kafka scheduler has not been started
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-2319
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2319
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jason Rosenberg
>
> Running 0.8.2.1, just saw this today at the end of a controlled shutdown.  It doesn't happen every time, but I've seen it several times:
> {code}
> 2015-07-07 18:54:28,424  INFO [Thread-4] server.KafkaServer - [Kafka Server 99], Controlled shutdown succeeded
> 2015-07-07 18:54:28,425  INFO [Thread-4] network.SocketServer - [Socket Server on Broker 99], Shutting down
> 2015-07-07 18:54:28,435  INFO [Thread-4] network.SocketServer - [Socket Server on Broker 99], Shutdown completed
> 2015-07-07 18:54:28,435  INFO [Thread-4] server.KafkaRequestHandlerPool - [Kafka Request Handler on Broker 99], shutting down
> 2015-07-07 18:54:28,444  INFO [Thread-4] server.KafkaRequestHandlerPool - [Kafka Request Handler on Broker 99], shut down completely
> 2015-07-07 18:54:28,649  INFO [Thread-4] server.ReplicaManager - [Replica Manager on Broker 99]: Shut down
> 2015-07-07 18:54:28,649  INFO [Thread-4] server.ReplicaFetcherManager - [ReplicaFetcherManager on broker 99] shutting down
> 2015-07-07 18:54:28,650  INFO [Thread-4] server.ReplicaFetcherThread - [ReplicaFetcherThread-0-95], Shutting down
> 2015-07-07 18:54:28,750  INFO [Thread-4] server.ReplicaFetcherThread - [ReplicaFetcherThread-0-95], Shutdown completed
> 2015-07-07 18:54:28,750  INFO [ReplicaFetcherThread-0-95] server.ReplicaFetcherThread - [ReplicaFetcherThread-0-95], Stopped
> 2015-07-07 18:54:28,750  INFO [Thread-4] server.ReplicaFetcherThread - [ReplicaFetcherThread-0-98], Shutting down
> 2015-07-07 18:54:28,791  INFO [Thread-4] server.ReplicaFetcherThread - [ReplicaFetcherThread-0-98], Shutdown completed
> 2015-07-07 18:54:28,791  INFO [ReplicaFetcherThread-0-98] server.ReplicaFetcherThread - [ReplicaFetcherThread-0-98], Stopped
> 2015-07-07 18:54:28,791  INFO [Thread-4] server.ReplicaFetcherManager - [ReplicaFetcherManager on broker 99] shutdown completed
> 2015-07-07 18:54:28,819  INFO [Thread-4] server.ReplicaManager - [Replica Manager on Broker 99]: Shut down completely
> 2015-07-07 18:54:28,826  INFO [Thread-4] log.LogManager - Shutting down.
> 2015-07-07 18:54:30,459  INFO [Thread-4] log.LogManager - Shutdown complete.
> 2015-07-07 18:54:30,463  WARN [Thread-4] utils.Utils$ - Kafka scheduler has not been started
> java.lang.IllegalStateException: Kafka scheduler has not been started
>         at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114)
>         at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86)
>         at kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350)
>         at kafka.controller.KafkaController.shutdown(KafkaController.scala:664)
>         at kafka.server.KafkaServer$$anonfun$shutdown$8.apply$mcV$sp(KafkaServer.scala:285)
>         at kafka.utils.Utils$.swallow(Utils.scala:172)
>         at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
>         at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
>         at kafka.utils.Logging$class.swallow(Logging.scala:94)
>         at kafka.utils.Utils$.swallow(Utils.scala:45)
>         at kafka.server.KafkaServer.shutdown(KafkaServer.scala:285)
>  ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)