You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@bookkeeper.apache.org by "Ivan Kelly (Updated) (JIRA)" <ji...@apache.org> on 2011/11/25 12:57:40 UTC

[jira] [Updated] (BOOKKEEPER-56) Race condition of message handler in connection recovery in Hedwig client

     [ https://issues.apache.org/jira/browse/BOOKKEEPER-56?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ivan Kelly updated BOOKKEEPER-56:
---------------------------------

    Description: 
There's a race condition in the connection recovery logic in Hedwig client. The message handler user set might be overwritten incorrectly. 

When handling channelDisconnected event, we try to reconnect to Hedwig server. After the connection is created and subscribed, we'll call StartDelivery() to recover the message handler to the original one of the disconnected connection. But if during this process, user calls StartDelivery() to set a new message handler, it will get overwritten to the original one.

The process can be demonstrated as below:

|| main thread || netty worker thread ||
| StartDelivery(messageHandlerA) | |
| (connection Broken here, and recovered later...) |
|                                             | ResponseHandler::channelDisconnected()   (connection disconnected event received) |
|                                             | new SubscribeReconnectCallback(subHandler.getMessageHandler()) (store messageHandlerA in SubscribeReconnectCallback to recover later) |
|                                             | client.doConnect() (try reconnect)  |
|                                             | doSubUnsub() (resubscribe) |
|                                             | SubscriberResponseHandler::handleSubscribeResponse()  (subscription succeeds) |
| StartDelivery(messageHandlderB)             |                                                                               |
|                                             | SubscribeReconnectCallback::operationFinished()                               |
|                                             | StartDelvery(messageHandlerA)   (messageHandler get overwritten)              |   

I can stably reproduce this by simulating this race condition by put some sleep in ResponseHandler.

I think essentially speaking we should not store messageHandler in ResponseHandler, since the message handler is supposed to be bound to connection. Instead, no matter which connection is in use, we should use the same messageHandler, the one user set last time. So I think we should change to store messageHandler in the HedwigSubscriber, in this way we don't need to recover the handler in connection recovery and thus won't face this race condition.



  was:
There's a race condition in the connection recovery logic in Hedwig client. The message handler user set might be overwritten incorrectly. 

When handling channelDisconnected event, we try to reconnect to Hedwig server. After the connection is created and subscribed, we'll call StartDelivery() to recover the message handler to the original one of the disconnected connection. But if during this process, user calls StartDelivery() to set a new message handler, it will get overwritten to the original one.

The process can be demonstrated as below:

main thread__________________________________netty worker thread
__________________________________________________________________________________________________
StartDelivery(messageHandlerA)

(connection Broken here, and recovered later...)

____________________________________________ResponseHandler::channelDisconnected()   (connection disconnected event received)
____________________________________________new SubscribeReconnectCallback(subHandler.getMessageHandler()) (store messageHandlerA in SubscribeReconnectCallback to recover later)
____________________________________________client.doConnect() (try reconnect)
____________________________________________doSubUnsub() (resubscribe)
____________________________________________SubscriberResponseHandler::handleSubscribeResponse()  (subscription succeeds)
StartDelivery(messageHandlderB)

____________________________________________SubscribeReconnectCallback::operationFinished()
____________________________________________StartDelvery(messageHandlerA)   (messageHandler get overwritten)

I can stably reproduce this by simulating this race condition by put some sleep in ResponseHandler.

I think essentially speaking we should not store messageHandler in ResponseHandler, since the message handler is supposed to be bound to connection. Instead, no matter which connection is in use, we should use the same messageHandler, the one user set last time. So I think we should change to store messageHandler in the HedwigSubscriber, in this way we don't need to recover the handler in connection recovery and thus won't face this race condition.



    
> Race condition of message handler in connection recovery in Hedwig client
> -------------------------------------------------------------------------
>
>                 Key: BOOKKEEPER-56
>                 URL: https://issues.apache.org/jira/browse/BOOKKEEPER-56
>             Project: Bookkeeper
>          Issue Type: Bug
>          Components: hedwig-client
>    Affects Versions: 4.0.0
>            Reporter: Gavin Li
>            Assignee: Gavin Li
>             Fix For: 4.0.0
>
>         Attachments: patch_56
>
>
> There's a race condition in the connection recovery logic in Hedwig client. The message handler user set might be overwritten incorrectly. 
> When handling channelDisconnected event, we try to reconnect to Hedwig server. After the connection is created and subscribed, we'll call StartDelivery() to recover the message handler to the original one of the disconnected connection. But if during this process, user calls StartDelivery() to set a new message handler, it will get overwritten to the original one.
> The process can be demonstrated as below:
> || main thread || netty worker thread ||
> | StartDelivery(messageHandlerA) | |
> | (connection Broken here, and recovered later...) |
> |                                             | ResponseHandler::channelDisconnected()   (connection disconnected event received) |
> |                                             | new SubscribeReconnectCallback(subHandler.getMessageHandler()) (store messageHandlerA in SubscribeReconnectCallback to recover later) |
> |                                             | client.doConnect() (try reconnect)  |
> |                                             | doSubUnsub() (resubscribe) |
> |                                             | SubscriberResponseHandler::handleSubscribeResponse()  (subscription succeeds) |
> | StartDelivery(messageHandlderB)             |                                                                               |
> |                                             | SubscribeReconnectCallback::operationFinished()                               |
> |                                             | StartDelvery(messageHandlerA)   (messageHandler get overwritten)              |   
> I can stably reproduce this by simulating this race condition by put some sleep in ResponseHandler.
> I think essentially speaking we should not store messageHandler in ResponseHandler, since the message handler is supposed to be bound to connection. Instead, no matter which connection is in use, we should use the same messageHandler, the one user set last time. So I think we should change to store messageHandler in the HedwigSubscriber, in this way we don't need to recover the handler in connection recovery and thus won't face this race condition.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira