Posted to dev@kafka.apache.org by "Neha Narkhede (JIRA)" <ji...@apache.org> on 2013/02/05 02:09:11 UTC

[jira] [Updated] (KAFKA-749) Bug in socket server shutdown logic makes the broker hang on shutdown until it has to be killed

     [ https://issues.apache.org/jira/browse/KAFKA-749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-749:
--------------------------------

    Attachment: kafka-749-v1.patch

The bug fix includes the following changes:

1. KafkaRequestHandler
- Removed AllDone. Instead, shutdown() sets isRunning to false and waits for the request handler thread to finish processing its current request and then stop.

2. RequestChannel
- Modified receiveRequest to wait on a condition variable when the queue is empty. This provides a clean way to wake up the waiting io threads when it is time to shut down.
- Added a close() API that sets isShuttingDown to true and signals the condition, so that all io threads waiting in receiveRequest() return null.
- Did not clear the queue, since the io threads shut down after the socket server. If we cleared the queue, all io threads would keep trying to get the next request and would get null until they shut down. That window should be very short, but it is still inefficient. Leaving the queue intact is fine for the io thread shutdown logic, since each io thread will process just one more request before it shuts down as well. So it's ok not to clear the queue.

3. SocketServer
- Invoke close() on the request channel after the acceptor and processor threads are shut down.

4. KafkaServer
- Shut down the socket server before the request handler. This ensures we don't accept and enqueue more requests that would time out anyway.
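The fixed shutdown path described in the list above can be sketched in Java. This is a minimal illustration under stated assumptions: a single io thread and String requests. SimpleRequestChannel, RequestHandler and ShutdownSketch are hypothetical names, not Kafka's classes (the broker itself is written in Scala). The condition variable is modeled with java.util.concurrent.locks.Condition.

The sketch shows three behaviors:
- receiveRequest() blocks until a request arrives or close() is called.
- close() wakes every waiter without clearing the queue.
- The channel is closed before the handler is shut down.

```java
import java.util.ArrayDeque;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class ShutdownSketch {
    // Hypothetical stand-in for RequestChannel: a request queue guarded by a lock and a condition.
    static final class SimpleRequestChannel {
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition notEmptyOrClosed = lock.newCondition();
        private final ArrayDeque<String> queue = new ArrayDeque<>();
        private boolean isShuttingDown = false;

        void sendRequest(String request) {
            lock.lock();
            try {
                queue.addLast(request);
                notEmptyOrClosed.signal();
            } finally {
                lock.unlock();
            }
        }

        // Waits while the queue is empty; returns null only once the channel is closed and empty.
        String receiveRequest() throws InterruptedException {
            lock.lock();
            try {
                while (queue.isEmpty() && !isShuttingDown) {
                    notEmptyOrClosed.await();
                }
                return queue.pollFirst();
            } finally {
                lock.unlock();
            }
        }

        // Wakes every waiting io thread; queued requests are deliberately not cleared.
        void close() {
            lock.lock();
            try {
                isShuttingDown = true;
                notEmptyOrClosed.signalAll();
            } finally {
                lock.unlock();
            }
        }
    }

    // Hypothetical io thread with no AllDone marker: it stops when isRunning is cleared
    // or when the closed channel has nothing left to hand out.
    static final class RequestHandler implements Runnable {
        private final SimpleRequestChannel channel;
        private final AtomicInteger handled;
        private final Thread thread;
        private volatile boolean isRunning = true;

        RequestHandler(SimpleRequestChannel channel, AtomicInteger handled) {
            this.channel = channel;
            this.handled = handled;
            this.thread = new Thread(this, "io-thread");
        }
        void start() { thread.start(); }
        @Override
        public void run() {
            try {
                while (isRunning) {
                    String request = channel.receiveRequest();
                    if (request == null) {
                        break; // channel closed and drained
                    }
                    handled.incrementAndGet(); // stand-in for real request processing
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        // Clears isRunning, then waits for the thread to finish its current request and exit.
        void shutdown() throws InterruptedException {
            isRunning = false;
            thread.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SimpleRequestChannel channel = new SimpleRequestChannel();
        AtomicInteger handled = new AtomicInteger();
        RequestHandler handler = new RequestHandler(channel, handled);
        handler.start();
        channel.sendRequest("request-1");
        // Patch order: the socket server closes the channel first, then the handlers shut down.
        channel.close();
        handler.shutdown();
        System.out.println("shutdown completed; handled " + handled.get() + " request(s)");
    }
}
```

The ordering matters in this sketch. If handler.shutdown() ran before channel.close() while the queue was empty, join() would wait on an io thread parked in await() with nothing left to wake it. Whether "request-1" is handled before shutdown depends on thread scheduling, so the printed count can be 0 or 1.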
                
> Bug in socket server shutdown logic makes the broker hang on shutdown until it has to be killed
> -----------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-749
>                 URL: https://issues.apache.org/jira/browse/KAFKA-749
>             Project: Kafka
>          Issue Type: Bug
>          Components: network
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>            Assignee: Neha Narkhede
>            Priority: Blocker
>              Labels: bugs, p1
>         Attachments: kafka-749-v1.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> The current shutdown logic of the server shuts down the io threads first, followed by the acceptor and finally the processor threads. The shutdown API of the io threads enqueues a special AllDone command into the common request queue, and each io thread shuts down when it dequeues this command. The problem is that while the io threads are processing this shutdown command, the network/processor threads can still accept new connections and requests and add those requests to the request queue. That means more requests can be enqueued after the AllDone command. Once the io threads have shut down, no thread is left to dequeue from the request queue, so the processor threads can hang while adding new requests to a full request queue, thereby blocking the server from shutting down.
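The hang in the issue description can be reproduced with a small Java sketch of the old AllDone (poison-pill) scheme. It assumes the bounded request queue can be modeled with java.util.concurrent.ArrayBlockingQueue. AllDoneHangSketch and enqueueAfterIoThreadExit are hypothetical names.

Once the io thread has dequeued AllDone and exited, nothing drains the queue. A processor's put() would therefore block forever as soon as the queue is full. The sketch uses offer() with a timeout so it can report the stall instead of actually hanging.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class AllDoneHangSketch {
    // Hypothetical poison-pill marker playing the role of the old AllDone command.
    static final String ALL_DONE = "AllDone";

    // Returns how many requests a "processor" manages to enqueue after the io thread
    // has consumed AllDone and exited.
    static int enqueueAfterIoThreadExit(int capacity, int requests) throws InterruptedException {
        BlockingQueue<String> requestQueue = new ArrayBlockingQueue<>(capacity);

        // The io thread takes requests until it dequeues AllDone, then exits.
        Thread ioThread = new Thread(() -> {
            try {
                while (!ALL_DONE.equals(requestQueue.take())) {
                    // a real handler would process the request here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        ioThread.start();

        // Old shutdown order: the io thread is stopped first...
        requestQueue.put(ALL_DONE);
        ioThread.join();

        // ...while processor threads keep accepting and enqueueing requests.
        int enqueued = 0;
        for (int i = 0; i < requests; i++) {
            // A real processor calls put(), which blocks forever once the queue is full;
            // offer() with a timeout lets the sketch detect that instead.
            if (!requestQueue.offer("request-" + i, 100, TimeUnit.MILLISECONDS)) {
                break;
            }
            enqueued++;
        }
        return enqueued;
    }

    public static void main(String[] args) throws InterruptedException {
        int enqueued = enqueueAfterIoThreadExit(2, 3);
        System.out.println("enqueued " + enqueued + " of 3; the next put() would block forever");
    }
}
```

With a capacity of 2 and three requests, main prints "enqueued 2 of 3; the next put() would block forever". The third request has nowhere to go because no io thread remains to drain the queue.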
