Posted to dev@kafka.apache.org by "Jay Kreps (Resolved) (JIRA)" <ji...@apache.org> on 2011/11/12 19:26:51 UTC

[jira] [Resolved] (KAFKA-148) Shutdown During a Log Flush Throws Exception, Triggers Startup in Recovery Mode

     [ https://issues.apache.org/jira/browse/KAFKA-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps resolved KAFKA-148.
-----------------------------

       Resolution: Duplicate
    Fix Version/s: 0.7
    
> Shutdown During a Log Flush Throws Exception, Triggers Startup in Recovery Mode
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-148
>                 URL: https://issues.apache.org/jira/browse/KAFKA-148
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.6
>         Environment: Independent. Observed on Linux data05 2.6.38-10-server #46~lucid1-Ubuntu SMP Wed Jul 6 20:19:32 UTC 2011 x86_64 GNU/Linux.
>            Reporter: C. Scott Andreas
>            Priority: Minor
>              Labels: patch
>             Fix For: 0.7
>
>         Attachments: executor-shutdown-patch.diff
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> If a log is being flushed when Kafka is shut down, KafkaScheduler immediately terminates the execution of the flush tasks, causing exceptions to be thrown and aborting the clean shutdown process. When restarted, the instance will launch in recovery mode, scanning the most recent log segment for each topic before launching.
> This issue is especially visible in environments with a large number of topics and a low flush threshold. In our case, nearly every deploy to our Kafka service in which the app is restarted triggers an unclean shutdown followed by startup in recovery mode. The real pain here is reading through every log file on each restart, and it can be avoided entirely.
> The behavior is caused by the way the KafkaScheduler thread pool (responsible for log flushing) is shut down. Currently, KafkaScheduler.shutdown() calls executor.shutdownNow(), which attempts to halt the pool immediately by interrupting its actively running tasks and discarding queued ones. Each Log whose flush to disk is interrupted then throws a java.nio.channels.ClosedByInterruptException, aborting the clean shutdown. Here's a sample log of the exceptions we see thrown when restarting Kafka: https://gist.github.com/1265940
> The fix for this is pretty simple. Rather than calling executor.shutdownNow() in KafkaScheduler.shutdown(), we can call executor.shutdown() (compare Javadoc: http://bit.ly/mZUSaD). This method "initiates an orderly shutdown in which previously submitted tasks are executed, but no new tasks will be accepted." Currently running flushes are allowed to complete (generally very quickly), future flush tasks are cancelled, no exceptions are thrown, and the shutdown finishes cleanly within about a second.
> I've attached the patch we've applied and deployed. It applies cleanly to 0.6.1.
> - Scott
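
The contrast Scott describes can be sketched with a plain ScheduledThreadPoolExecutor. This is an illustrative standalone sketch, not Kafka's actual KafkaScheduler code: the task, timings, and class name are invented; only the shutdown() vs. shutdownNow() semantics are the point. An in-flight "flush" runs to completion under an orderly shutdown, whereas shutdownNow() would interrupt it mid-write:

```java
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class OrderlyShutdownDemo {
    public static void main(String[] args) throws Exception {
        ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);

        // A periodic task standing in for the log flusher. A slow body
        // simulates a flush that is mid-write when shutdown arrives.
        executor.scheduleAtFixedRate(() -> {
            try {
                Thread.sleep(100); // simulate a flush in progress
                System.out.println("flush completed");
            } catch (InterruptedException e) {
                // shutdownNow() would land here -- analogous to the
                // ClosedByInterruptException thrown from Log's flush.
                System.out.println("flush interrupted");
            }
        }, 0, 50, TimeUnit.MILLISECONDS);

        Thread.sleep(120); // let a flush get underway

        // The patched behavior: an orderly shutdown. The in-flight flush
        // runs to completion, pending periodic runs are dropped, and no
        // interrupt is ever delivered to the task.
        executor.setExecuteExistingDelayedTasksAfterShutdownPolicy(false);
        executor.shutdown();
        boolean done = executor.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("terminated cleanly: " + done);
    }
}
```

Swapping the shutdown() call for shutdownNow() in the sketch produces the "flush interrupted" path instead, which is the failure mode reported above.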

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira