Posted to issues@geode.apache.org by "nabarun (JIRA)" <ji...@apache.org> on 2017/12/14 00:39:00 UTC

[jira] [Updated] (GEODE-4096) Race Condition between ConcurrentSerialGatewaySenderEventProcessor stopper thread and the _dispatchBatch method for the connection global variable.

     [ https://issues.apache.org/jira/browse/GEODE-4096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

nabarun updated GEODE-4096:
---------------------------
    Description: 
*+Order of execution for this race condition to occur+*.
# _dispatchBatch tries to dispatch a batch of events but is unsuccessful.
# It silently assumes that the remote server may not be ready yet, so it decides to retry.
# At the same time, we decide to stop the SerialGatewaySenderEventProcessor, so the stopper thread is invoked.
# Before the threads are started on all the senders / dispatchers, it sets the isStopped flag of the SerialGatewaySenderEventProcessor to true.
# The _dispatchBatch method, still in retry mode, then calls getConnection to obtain the connection. getConnection checks the SerialGatewaySenderEventProcessor's isStopped flag, sees that it is set, and returns null.
# This null is stored in the dispatcher's global connection variable.
# When _dispatchBatch sees that the connection is null, it should raise an exception and call destroyConnection.
# Meanwhile, an AckReaderThread is still running and the event processor's stopper thread wants to stop it, but the global connection variable has already been set to null by the getConnection call made from _dispatchBatch.
# As a result, shutDownAckReaderThreadConnection is executed on null, and the AckReaderThread keeps running, stuck in socketRead0.
# The problem is that the AckReaderThread holds connectionLifeCycleLock.readLock while it is in readAcknowledgement, whereas the destroyConnection calls from the stopper thread and from _dispatchBatch's exception handling code need connectionLifeCycleLock.writeLock. They cannot acquire it because the readLock is held by the AckReaderThread, causing a deadlock.




  was:
*+Order of execution for this race condition to occur+*.
# _dispatchBatch tries to dispatch a batch of events but is unsuccessful.
# It silently assumes that the remote server may not be ready yet, so it decides to retry.
# At the same time, we decide to stop the SerialGatewaySenderEventProcessor, so the stopper thread is invoked.
# Before the threads are started on all the senders / dispatchers, it sets the isStopped flag of the SerialGatewaySenderEventProcessor to true.
# The _dispatchBatch method, still in retry mode, then calls getConnection to obtain the connection. getConnection checks the SerialGatewaySenderEventProcessor's isStopped flag, sees that it is false, and returns null.
# This null is stored in the dispatcher's global connection variable.
# When _dispatchBatch sees that the connection is null, it should raise an exception and call destroyConnection.
# Meanwhile, an AckReaderThread is still running and the event processor's stopper thread wants to stop it, but the global connection variable has already been set to null by the getConnection call made from _dispatchBatch.
# As a result, shutDownAckReaderThreadConnection is executed on null, and the AckReaderThread keeps running, stuck in socketRead0.
# The problem is that the AckReaderThread holds connectionLifeCycleLock.readLock while it is in readAcknowledgement, whereas the destroyConnection calls from the stopper thread and from _dispatchBatch's exception handling code need connectionLifeCycleLock.writeLock. They cannot acquire it because the readLock is held by the AckReaderThread, causing a deadlock.





> Race Condition between ConcurrentSerialGatewaySenderEventProcessor stopper thread and the _dispatchBatch method for the connection global variable.
> ---------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-4096
>                 URL: https://issues.apache.org/jira/browse/GEODE-4096
>             Project: Geode
>          Issue Type: Bug
>          Components: wan
>            Reporter: nabarun
>            Assignee: nabarun
>
> *+Order of execution for this race condition to occur+*.
> # _dispatchBatch tries to dispatch a batch of events but is unsuccessful.
> # It silently assumes that the remote server may not be ready yet, so it decides to retry.
> # At the same time, we decide to stop the SerialGatewaySenderEventProcessor, so the stopper thread is invoked.
> # Before the threads are started on all the senders / dispatchers, it sets the isStopped flag of the SerialGatewaySenderEventProcessor to true.
> # The _dispatchBatch method, still in retry mode, then calls getConnection to obtain the connection. getConnection checks the SerialGatewaySenderEventProcessor's isStopped flag, sees that it is set, and returns null.
> # This null is stored in the dispatcher's global connection variable.
> # When _dispatchBatch sees that the connection is null, it should raise an exception and call destroyConnection.
> # Meanwhile, an AckReaderThread is still running and the event processor's stopper thread wants to stop it, but the global connection variable has already been set to null by the getConnection call made from _dispatchBatch.
> # As a result, shutDownAckReaderThreadConnection is executed on null, and the AckReaderThread keeps running, stuck in socketRead0.
> # The problem is that the AckReaderThread holds connectionLifeCycleLock.readLock while it is in readAcknowledgement, whereas the destroyConnection calls from the stopper thread and from _dispatchBatch's exception handling code need connectionLifeCycleLock.writeLock. They cannot acquire it because the readLock is held by the AckReaderThread, causing a deadlock.
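
The interleaving described in the steps above can be reproduced in miniature. The following is only a rough sketch, not Geode code: the class LifecycleLockSketch, the FakeConnection type and the simulate method are hypothetical stand-ins, and a CountDownLatch is assumed as a substitute for the blocking socketRead0 call. It uses tryLock with a timeout so that the program terminates; the real stopper and dispatcher code would wait for the write lock indefinitely.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LifecycleLockSketch {
  // Hypothetical stand-in for the remote connection; not a Geode class.
  static final class FakeConnection {
    private final CountDownLatch closed = new CountDownLatch(1);
    // Stands in for the blocking socket read: returns only once close() is called.
    void blockingRead() throws InterruptedException { closed.await(); }
    void close() { closed.countDown(); }
  }

  private final ReentrantReadWriteLock connectionLifeCycleLock = new ReentrantReadWriteLock();
  private volatile FakeConnection connection = new FakeConnection();
  private volatile boolean isStopped = false;

  // Mirrors the described check: once the processor is stopped, null is returned.
  FakeConnection getConnection() { return isStopped ? null : connection; }

  /** Returns true if the write lock could be acquired within the timeout. */
  public static boolean simulate(long timeoutMillis) {
    try {
      LifecycleLockSketch d = new LifecycleLockSketch();
      FakeConnection original = d.connection;
      CountDownLatch readerHoldsLock = new CountDownLatch(1);

      // The ack reader takes the read lock and then blocks in the read.
      Thread ackReader = new Thread(() -> {
        d.connectionLifeCycleLock.readLock().lock();
        try {
          readerHoldsLock.countDown();
          original.blockingRead();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        } finally {
          d.connectionLifeCycleLock.readLock().unlock();
        }
      }, "AckReaderThread");
      ackReader.setDaemon(true);
      ackReader.start();
      readerHoldsLock.await();

      d.isStopped = true;               // stopper sets the flag first
      d.connection = d.getConnection(); // retrying dispatcher stores null in the shared field

      // The stopper tries to shut down the reader's connection, but it only sees null,
      // so nothing is closed and the reader stays blocked.
      FakeConnection seenByStopper = d.connection;
      if (seenByStopper != null) {
        seenByStopper.close();
      }

      // destroyConnection needs the write lock; the blocked reader still holds the read lock.
      boolean acquired =
          d.connectionLifeCycleLock.writeLock().tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
      if (acquired) {
        d.connectionLifeCycleLock.writeLock().unlock();
      }

      original.close();     // cleanup for the sketch only: release the reader
      ackReader.join(1000);
      return acquired;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("write lock acquired: " + simulate(200));
  }
}
```

In this sketch the write lock is never obtained while the reader is blocked, which corresponds to the hang in the last step. If the stopper had closed the original connection instead of the null it read from the shared field, the reader would have returned and released the read lock.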



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)