You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Guozhang Wang (JIRA)" <ji...@apache.org> on 2017/12/12 00:52:00 UTC
[jira] [Commented] (KAFKA-2319) After controlled shutdown:
IllegalStateException: Kafka scheduler has not been started
[ https://issues.apache.org/jira/browse/KAFKA-2319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286884#comment-16286884 ]
Guozhang Wang commented on KAFKA-2319:
--------------------------------------
I think this issue still exists, but probably in a slight different manner.
Here is what I observed while shutting down a server:
{code}
[2017-12-11 16:37:32,620] ERROR [kafka-log-cleaner-thread-0]: Error due to (kafka.log.LogCleaner$CleanerThread:73)
java.lang.IllegalStateException: Kafka scheduler is not running.
at kafka.utils.KafkaScheduler.ensureRunning(KafkaScheduler.scala:132)
at kafka.utils.KafkaScheduler.schedule(KafkaScheduler.scala:106)
at kafka.log.Log.kafka$log$Log$$asyncDeleteSegment(Log.scala:1637)
at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1682)
at kafka.log.Log$$anonfun$replaceSegments$1.apply(Log.scala:1677)
at scala.collection.immutable.List.foreach(List.scala:392)
at kafka.log.Log.replaceSegments(Log.scala:1677)
at kafka.log.Cleaner.cleanSegments(LogCleaner.scala:489)
at kafka.log.Cleaner$$anonfun$doClean$4.apply(LogCleaner.scala:400)
at kafka.log.Cleaner$$anonfun$doClean$4.apply(LogCleaner.scala:399)
at scala.collection.immutable.List.foreach(List.scala:392)
at kafka.log.Cleaner.doClean(LogCleaner.scala:399)
at kafka.log.Cleaner.clean(LogCleaner.scala:376)
at kafka.log.LogCleaner$CleanerThread.cleanOrSleep(LogCleaner.scala:267)
at kafka.log.LogCleaner$CleanerThread.doWork(LogCleaner.scala:247)
at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:68)
{code}
The root cause is that we call {{Log#asyncDeleteSegment}} to asynchronously delete the segment, AND we shutdown scheduler first then cleaner thread. Hence the following race condition can happen (very likely):
1. Kafka server shutdown, {{KafkaScheduler#shutdown}} called first.
2. {{cachedExecutor.shutdown()}} called, which executed all the already-scheduled tasks.
3. {{KafkaScheduler#shutdown}} completed, {{executor}} is set to {{null}}.
4. The {{LogCleaner.doWork()}} is called then before it is shutting down, and then calling {{scheduler.schedule("delete-file", deleteSeg _, delay = config.fileDeleteDelayMs)}}. BOOM.
An easy fix would be re-ordering the shutdown sequence, to make sure callers of the scheduler like the cleaner thread is shutdown first.
> After controlled shutdown: IllegalStateException: Kafka scheduler has not been started
> --------------------------------------------------------------------------------------
>
> Key: KAFKA-2319
> URL: https://issues.apache.org/jira/browse/KAFKA-2319
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 1.0.1
> Reporter: Jason Rosenberg
>
> Running 0.8.2.1, just saw this today at the end of a controlled shutdown. It doesn't happen every time, but I've seen it several times:
> {code}
> 2015-07-07 18:54:28,424 INFO [Thread-4] server.KafkaServer - [Kafka Server 99], Controlled shutdown succeeded
> 2015-07-07 18:54:28,425 INFO [Thread-4] network.SocketServer - [Socket Server on Broker 99], Shutting down
> 2015-07-07 18:54:28,435 INFO [Thread-4] network.SocketServer - [Socket Server on Broker 99], Shutdown completed
> 2015-07-07 18:54:28,435 INFO [Thread-4] server.KafkaRequestHandlerPool - [Kafka Request Handler on Broker 99], shutting down
> 2015-07-07 18:54:28,444 INFO [Thread-4] server.KafkaRequestHandlerPool - [Kafka Request Handler on Broker 99], shut down completely
> 2015-07-07 18:54:28,649 INFO [Thread-4] server.ReplicaManager - [Replica Manager on Broker 99]: Shut down
> 2015-07-07 18:54:28,649 INFO [Thread-4] server.ReplicaFetcherManager - [ReplicaFetcherManager on broker 99] shutting down
> 2015-07-07 18:54:28,650 INFO [Thread-4] server.ReplicaFetcherThread - [ReplicaFetcherThread-0-95], Shutting down
> 2015-07-07 18:54:28,750 INFO [Thread-4] server.ReplicaFetcherThread - [ReplicaFetcherThread-0-95], Shutdown completed
> 2015-07-07 18:54:28,750 INFO [ReplicaFetcherThread-0-95] server.ReplicaFetcherThread - [ReplicaFetcherThread-0-95], Stopped
> 2015-07-07 18:54:28,750 INFO [Thread-4] server.ReplicaFetcherThread - [ReplicaFetcherThread-0-98], Shutting down
> 2015-07-07 18:54:28,791 INFO [Thread-4] server.ReplicaFetcherThread - [ReplicaFetcherThread-0-98], Shutdown completed
> 2015-07-07 18:54:28,791 INFO [ReplicaFetcherThread-0-98] server.ReplicaFetcherThread - [ReplicaFetcherThread-0-98], Stopped
> 2015-07-07 18:54:28,791 INFO [Thread-4] server.ReplicaFetcherManager - [ReplicaFetcherManager on broker 99] shutdown completed
> 2015-07-07 18:54:28,819 INFO [Thread-4] server.ReplicaManager - [Replica Manager on Broker 99]: Shut down completely
> 2015-07-07 18:54:28,826 INFO [Thread-4] log.LogManager - Shutting down.
> 2015-07-07 18:54:30,459 INFO [Thread-4] log.LogManager - Shutdown complete.
> 2015-07-07 18:54:30,463 WARN [Thread-4] utils.Utils$ - Kafka scheduler has not been started
> java.lang.IllegalStateException: Kafka scheduler has not been started
> at kafka.utils.KafkaScheduler.ensureStarted(KafkaScheduler.scala:114)
> at kafka.utils.KafkaScheduler.shutdown(KafkaScheduler.scala:86)
> at kafka.controller.KafkaController.onControllerResignation(KafkaController.scala:350)
> at kafka.controller.KafkaController.shutdown(KafkaController.scala:664)
> at kafka.server.KafkaServer$$anonfun$shutdown$8.apply$mcV$sp(KafkaServer.scala:285)
> at kafka.utils.Utils$.swallow(Utils.scala:172)
> at kafka.utils.Logging$class.swallowWarn(Logging.scala:92)
> at kafka.utils.Utils$.swallowWarn(Utils.scala:45)
> at kafka.utils.Logging$class.swallow(Logging.scala:94)
> at kafka.utils.Utils$.swallow(Utils.scala:45)
> at kafka.server.KafkaServer.shutdown(KafkaServer.scala:285)
> ...
> {code}
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)