You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@activemq.apache.org by "Stirling Chow (JIRA)" <ji...@apache.org> on 2010/11/04 21:03:00 UTC

[jira] Created: (AMQ-3016) Race condition in DemandForwardingBridgeSupport can cause remote connection to be leaked.

Race condition in DemandForwardingBridgeSupport can cause remote connection to be leaked.
-----------------------------------------------------------------------------------------

                 Key: AMQ-3016
                 URL: https://issues.apache.org/activemq/browse/AMQ-3016
             Project: ActiveMQ
          Issue Type: Bug
          Components: Connector, Transport
    Affects Versions: 5.4.1
            Reporter: Stirling Chow


Symptom
========
I set up two Brokers and a network bridge from Broker A to Broker B over HTTP.  When the bridge is established, each Broker has a single transport connection (VM on broker A and HTTP on broker B) as recorded in RegionBroker.connections

I noticed that when Broker A was stopped (normally), periodically the HTTP connection would remain in Broker B's RegionBroker.connections until the InactivityMonitor on the connection timed out.  If the InactivityMonitor was disbled, then the connection would remain indefinitely.  

If Broker A was restarted, the bridge would be restarted and a second connection would be recorded in Broker B's RegionBroker.connections.  Repeating restarts of Broker A would cause an accumulation of "dead" connections, which would eventually lead to an OOM.

Cause
=====
When Broker A is stopped, DemandForwardingBridgeSupport.stop() is called and sends a ShutdownInfo command to the local and remote transports.  When the transports receive the ShutdownInfo, they remove the connection from their associated RegionBroker.connections as part of  TransportConnection.processRemoveConnection(ConnectionId, long):

    public synchronized Response processRemoveConnection(ConnectionId id, long lastDeliveredSequenceId)
            throws InterruptedException {
...
            try {
                broker.removeConnection(cs.getContext(), cs.getInfo(), null);
            } catch (Throwable e) {
                SERVICELOG.warn("Failed to remove connection " + cs.getInfo(), e);
            }

In the cases were Broker B would not clean up its connection, I traced the code and discovered that the ShutdownInfo message was not being sent because the remote transport (HttpClientTransport) had already had its "stopped" flag set to true as part of HttpClientTransport.oneway(Object command):

    public void oneway(Object command) throws IOException {

        if (isStopped()) {
            throw new IOException("stopped.");
        }
...

DemandForwardingBridgeSupport's stop() method has the following code:

    public void stop() throws Exception {
...
                    ASYNC_TASKS.execute(new Runnable() {
                        public void run() {
                            try {
                                localBroker.oneway(new ShutdownInfo());
                                sendShutdown.countDown();
                                remoteBroker.oneway(new ShutdownInfo());
                            } catch (Throwable e) {
                                LOG.debug("Caught exception sending shutdown", e);
                            } finally {
                                sendShutdown.countDown();
                            }
                        }
                    });
                    if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
                        LOG.info("Network Could not shutdown in a timely manner");
                    }
                } finally {
                    ServiceStopper ss = new ServiceStopper();
                    ss.stop(remoteBroker);
                    ss.stop(localBroker);
                    // Release the started Latch since another thread could be
                    // stuck waiting for it to start up.
                    startedLatch.countDown();
                    startedLatch.countDown();
                    localStartedLatch.countDown();
                    ss.throwFirstException();
                }
            }

ShutdownInfo is sent asynchronously to the local and remote transports by a slave thread:

                                localBroker.oneway(new ShutdownInfo());
                                sendShutdown.countDown();
                                remoteBroker.oneway(new ShutdownInfo());

The sendShutdown  latch is used by the master thread to prevent running the finally clause until the ShutdownInfo has been sent:

                    if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
                        LOG.info("Network Could not shutdown in a timely manner");
                    }
                } finally {
                    ServiceStopper ss = new ServiceStopper();
                    ss.stop(remoteBroker);
                    ss.stop(localBroker);
...
                }
            }

However, because the latch is countdown *before* remoteTransport.oneway(new ShutdownInfo()) there is a race condition and in most cases the main thread calls ss.stop(remoteBroker) before the slave thread has completed its call to remoteTransport.oneway(new ShutdownInfo()).  As a result, the remoteTransport appears already stopped and the ShutdownInfo is discarded.  This leaves the connection dangling on the remote broker until the InactivityMonitor closes it.

Solution
======
The sendShutdown latch should be countdown *after* remoteTransport.oneway(new ShutdownInfo()).


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (AMQ-3016) Race condition in DemandForwardingBridgeSupport can cause remote connection to be leaked.

Posted by "Eric (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/activemq/browse/AMQ-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=63084#action_63084 ] 

Eric edited comment on AMQ-3016 at 11/5/10 4:26 AM:
----------------------------------------------------

Hi

Since duplex connections are very different from non-duplex one, this patch must be carefully test in all conditions. 
Did you launch all ActiveMQ Maven tests and verify that no border effects are discovered ?

Regards
Eric-AWL

      was (Author: eric-awl):
    Hi

Since duplex connections are very different from non-duplex one, this patch must be carefully tests in all conditions. 
Did you launch all ActiveMQ Maven tests and verify that no border effects are discovered ?

Regards
Eric-AWL
  
> Race condition in DemandForwardingBridgeSupport can cause remote connection to be leaked.
> -----------------------------------------------------------------------------------------
>
>                 Key: AMQ-3016
>                 URL: https://issues.apache.org/activemq/browse/AMQ-3016
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Connector, Transport
>    Affects Versions: 5.4.1
>            Reporter: Stirling Chow
>         Attachments: ConnectionLeakTest.java, patch.txt
>
>
> Symptom
> ========
> I set up two Brokers and a network bridge from Broker A to Broker B over HTTP.  When the bridge is established, each Broker has a single transport connection (VM on broker A and HTTP on broker B) as recorded in RegionBroker.connections
> I noticed that when Broker A was stopped (normally), periodically the HTTP connection would remain in Broker B's RegionBroker.connections until the InactivityMonitor on the connection timed out.  If the InactivityMonitor was disbled, then the connection would remain indefinitely.  
> If Broker A was restarted, the bridge would be restarted and a second connection would be recorded in Broker B's RegionBroker.connections.  Repeating restarts of Broker A would cause an accumulation of "dead" connections, which would eventually lead to an OOM.
> Cause
> =====
> When Broker A is stopped, DemandForwardingBridgeSupport.stop() is called and sends a ShutdownInfo command to the local and remote transports.  When the transports receive the ShutdownInfo, they remove the connection from their associated RegionBroker.connections as part of  TransportConnection.processRemoveConnection(ConnectionId, long):
>     public synchronized Response processRemoveConnection(ConnectionId id, long lastDeliveredSequenceId)
>             throws InterruptedException {
> ...
>             try {
>                 broker.removeConnection(cs.getContext(), cs.getInfo(), null);
>             } catch (Throwable e) {
>                 SERVICELOG.warn("Failed to remove connection " + cs.getInfo(), e);
>             }
> In the cases were Broker B would not clean up its connection, I traced the code and discovered that the ShutdownInfo message was not being sent because the remote transport (HttpClientTransport) had already had its "stopped" flag set to true as part of HttpClientTransport.oneway(Object command):
>     public void oneway(Object command) throws IOException {
>         if (isStopped()) {
>             throw new IOException("stopped.");
>         }
> ...
> DemandForwardingBridgeSupport's stop() method has the following code:
>     public void stop() throws Exception {
> ...
>                     ASYNC_TASKS.execute(new Runnable() {
>                         public void run() {
>                             try {
>                                 localBroker.oneway(new ShutdownInfo());
>                                 sendShutdown.countDown();
>                                 remoteBroker.oneway(new ShutdownInfo());
>                             } catch (Throwable e) {
>                                 LOG.debug("Caught exception sending shutdown", e);
>                             } finally {
>                                 sendShutdown.countDown();
>                             }
>                         }
>                     });
>                     if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
>                         LOG.info("Network Could not shutdown in a timely manner");
>                     }
>                 } finally {
>                     ServiceStopper ss = new ServiceStopper();
>                     ss.stop(remoteBroker);
>                     ss.stop(localBroker);
>                     // Release the started Latch since another thread could be
>                     // stuck waiting for it to start up.
>                     startedLatch.countDown();
>                     startedLatch.countDown();
>                     localStartedLatch.countDown();
>                     ss.throwFirstException();
>                 }
>             }
> ShutdownInfo is sent asynchronously to the local and remote transports by a slave thread:
>                                 localBroker.oneway(new ShutdownInfo());
>                                 sendShutdown.countDown();
>                                 remoteBroker.oneway(new ShutdownInfo());
> The sendShutdown  latch is used by the master thread to prevent running the finally clause until the ShutdownInfo has been sent:
>                     if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
>                         LOG.info("Network Could not shutdown in a timely manner");
>                     }
>                 } finally {
>                     ServiceStopper ss = new ServiceStopper();
>                     ss.stop(remoteBroker);
>                     ss.stop(localBroker);
> ...
>                 }
>             }
> However, because the latch is countdown *before* remoteTransport.oneway(new ShutdownInfo()) there is a race condition and in most cases the main thread calls ss.stop(remoteBroker) before the slave thread has completed its call to remoteTransport.oneway(new ShutdownInfo()).  As a result, the remoteTransport appears already stopped and the ShutdownInfo is discarded.  This leaves the connection dangling on the remote broker until the InactivityMonitor closes it.
> Solution
> ======
> The sendShutdown latch should be countdown *after* remoteTransport.oneway(new ShutdownInfo()).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (AMQ-3016) Race condition in DemandForwardingBridgeSupport can cause remote connection to be leaked.

Posted by "Eric (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/activemq/browse/AMQ-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=63084#action_63084 ] 

Eric commented on AMQ-3016:
---------------------------

Hi

Since duplex connections are very different from non-duplex one, this patch must be carefully tests in all conditions. 
Did you launch all ActiveMQ Maven tests and verify that no border effects are discovered ?

Regards
Eric-AWL

> Race condition in DemandForwardingBridgeSupport can cause remote connection to be leaked.
> -----------------------------------------------------------------------------------------
>
>                 Key: AMQ-3016
>                 URL: https://issues.apache.org/activemq/browse/AMQ-3016
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Connector, Transport
>    Affects Versions: 5.4.1
>            Reporter: Stirling Chow
>         Attachments: ConnectionLeakTest.java, patch.txt
>
>
> Symptom
> ========
> I set up two Brokers and a network bridge from Broker A to Broker B over HTTP.  When the bridge is established, each Broker has a single transport connection (VM on broker A and HTTP on broker B) as recorded in RegionBroker.connections
> I noticed that when Broker A was stopped (normally), periodically the HTTP connection would remain in Broker B's RegionBroker.connections until the InactivityMonitor on the connection timed out.  If the InactivityMonitor was disbled, then the connection would remain indefinitely.  
> If Broker A was restarted, the bridge would be restarted and a second connection would be recorded in Broker B's RegionBroker.connections.  Repeating restarts of Broker A would cause an accumulation of "dead" connections, which would eventually lead to an OOM.
> Cause
> =====
> When Broker A is stopped, DemandForwardingBridgeSupport.stop() is called and sends a ShutdownInfo command to the local and remote transports.  When the transports receive the ShutdownInfo, they remove the connection from their associated RegionBroker.connections as part of  TransportConnection.processRemoveConnection(ConnectionId, long):
>     public synchronized Response processRemoveConnection(ConnectionId id, long lastDeliveredSequenceId)
>             throws InterruptedException {
> ...
>             try {
>                 broker.removeConnection(cs.getContext(), cs.getInfo(), null);
>             } catch (Throwable e) {
>                 SERVICELOG.warn("Failed to remove connection " + cs.getInfo(), e);
>             }
> In the cases were Broker B would not clean up its connection, I traced the code and discovered that the ShutdownInfo message was not being sent because the remote transport (HttpClientTransport) had already had its "stopped" flag set to true as part of HttpClientTransport.oneway(Object command):
>     public void oneway(Object command) throws IOException {
>         if (isStopped()) {
>             throw new IOException("stopped.");
>         }
> ...
> DemandForwardingBridgeSupport's stop() method has the following code:
>     public void stop() throws Exception {
> ...
>                     ASYNC_TASKS.execute(new Runnable() {
>                         public void run() {
>                             try {
>                                 localBroker.oneway(new ShutdownInfo());
>                                 sendShutdown.countDown();
>                                 remoteBroker.oneway(new ShutdownInfo());
>                             } catch (Throwable e) {
>                                 LOG.debug("Caught exception sending shutdown", e);
>                             } finally {
>                                 sendShutdown.countDown();
>                             }
>                         }
>                     });
>                     if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
>                         LOG.info("Network Could not shutdown in a timely manner");
>                     }
>                 } finally {
>                     ServiceStopper ss = new ServiceStopper();
>                     ss.stop(remoteBroker);
>                     ss.stop(localBroker);
>                     // Release the started Latch since another thread could be
>                     // stuck waiting for it to start up.
>                     startedLatch.countDown();
>                     startedLatch.countDown();
>                     localStartedLatch.countDown();
>                     ss.throwFirstException();
>                 }
>             }
> ShutdownInfo is sent asynchronously to the local and remote transports by a slave thread:
>                                 localBroker.oneway(new ShutdownInfo());
>                                 sendShutdown.countDown();
>                                 remoteBroker.oneway(new ShutdownInfo());
> The sendShutdown  latch is used by the master thread to prevent running the finally clause until the ShutdownInfo has been sent:
>                     if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
>                         LOG.info("Network Could not shutdown in a timely manner");
>                     }
>                 } finally {
>                     ServiceStopper ss = new ServiceStopper();
>                     ss.stop(remoteBroker);
>                     ss.stop(localBroker);
> ...
>                 }
>             }
> However, because the latch is countdown *before* remoteTransport.oneway(new ShutdownInfo()) there is a race condition and in most cases the main thread calls ss.stop(remoteBroker) before the slave thread has completed its call to remoteTransport.oneway(new ShutdownInfo()).  As a result, the remoteTransport appears already stopped and the ShutdownInfo is discarded.  This leaves the connection dangling on the remote broker until the InactivityMonitor closes it.
> Solution
> ======
> The sendShutdown latch should be countdown *after* remoteTransport.oneway(new ShutdownInfo()).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (AMQ-3016) Race condition in DemandForwardingBridgeSupport can cause remote connection to be leaked.

Posted by "Stirling Chow (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/activemq/browse/AMQ-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stirling Chow updated AMQ-3016:
-------------------------------

    Attachment: ConnectionLeakTest.java

This unit test exhibits the problem being reported.  In this test, a "validation" run is made using TCP as the bridge transport --- the race condition does not cause a leak because the remote end of the TCP connection detects the closing of the socket and cleans up the connection.  The "failed" run is made using HTTP as the bridge transport.  On my system, the race condition always occurs and causes the unit test to fail.  With the patch applied, both tests always succeed.

> Race condition in DemandForwardingBridgeSupport can cause remote connection to be leaked.
> -----------------------------------------------------------------------------------------
>
>                 Key: AMQ-3016
>                 URL: https://issues.apache.org/activemq/browse/AMQ-3016
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Connector, Transport
>    Affects Versions: 5.4.1
>            Reporter: Stirling Chow
>         Attachments: ConnectionLeakTest.java
>
>
> Symptom
> ========
> I set up two Brokers and a network bridge from Broker A to Broker B over HTTP.  When the bridge is established, each Broker has a single transport connection (VM on broker A and HTTP on broker B) as recorded in RegionBroker.connections
> I noticed that when Broker A was stopped (normally), periodically the HTTP connection would remain in Broker B's RegionBroker.connections until the InactivityMonitor on the connection timed out.  If the InactivityMonitor was disbled, then the connection would remain indefinitely.  
> If Broker A was restarted, the bridge would be restarted and a second connection would be recorded in Broker B's RegionBroker.connections.  Repeating restarts of Broker A would cause an accumulation of "dead" connections, which would eventually lead to an OOM.
> Cause
> =====
> When Broker A is stopped, DemandForwardingBridgeSupport.stop() is called and sends a ShutdownInfo command to the local and remote transports.  When the transports receive the ShutdownInfo, they remove the connection from their associated RegionBroker.connections as part of  TransportConnection.processRemoveConnection(ConnectionId, long):
>     public synchronized Response processRemoveConnection(ConnectionId id, long lastDeliveredSequenceId)
>             throws InterruptedException {
> ...
>             try {
>                 broker.removeConnection(cs.getContext(), cs.getInfo(), null);
>             } catch (Throwable e) {
>                 SERVICELOG.warn("Failed to remove connection " + cs.getInfo(), e);
>             }
> In the cases were Broker B would not clean up its connection, I traced the code and discovered that the ShutdownInfo message was not being sent because the remote transport (HttpClientTransport) had already had its "stopped" flag set to true as part of HttpClientTransport.oneway(Object command):
>     public void oneway(Object command) throws IOException {
>         if (isStopped()) {
>             throw new IOException("stopped.");
>         }
> ...
> DemandForwardingBridgeSupport's stop() method has the following code:
>     public void stop() throws Exception {
> ...
>                     ASYNC_TASKS.execute(new Runnable() {
>                         public void run() {
>                             try {
>                                 localBroker.oneway(new ShutdownInfo());
>                                 sendShutdown.countDown();
>                                 remoteBroker.oneway(new ShutdownInfo());
>                             } catch (Throwable e) {
>                                 LOG.debug("Caught exception sending shutdown", e);
>                             } finally {
>                                 sendShutdown.countDown();
>                             }
>                         }
>                     });
>                     if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
>                         LOG.info("Network Could not shutdown in a timely manner");
>                     }
>                 } finally {
>                     ServiceStopper ss = new ServiceStopper();
>                     ss.stop(remoteBroker);
>                     ss.stop(localBroker);
>                     // Release the started Latch since another thread could be
>                     // stuck waiting for it to start up.
>                     startedLatch.countDown();
>                     startedLatch.countDown();
>                     localStartedLatch.countDown();
>                     ss.throwFirstException();
>                 }
>             }
> ShutdownInfo is sent asynchronously to the local and remote transports by a slave thread:
>                                 localBroker.oneway(new ShutdownInfo());
>                                 sendShutdown.countDown();
>                                 remoteBroker.oneway(new ShutdownInfo());
> The sendShutdown  latch is used by the master thread to prevent running the finally clause until the ShutdownInfo has been sent:
>                     if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
>                         LOG.info("Network Could not shutdown in a timely manner");
>                     }
>                 } finally {
>                     ServiceStopper ss = new ServiceStopper();
>                     ss.stop(remoteBroker);
>                     ss.stop(localBroker);
> ...
>                 }
>             }
> However, because the latch is countdown *before* remoteTransport.oneway(new ShutdownInfo()) there is a race condition and in most cases the main thread calls ss.stop(remoteBroker) before the slave thread has completed its call to remoteTransport.oneway(new ShutdownInfo()).  As a result, the remoteTransport appears already stopped and the ShutdownInfo is discarded.  This leaves the connection dangling on the remote broker until the InactivityMonitor closes it.
> Solution
> ======
> The sendShutdown latch should be countdown *after* remoteTransport.oneway(new ShutdownInfo()).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (AMQ-3016) Race condition in DemandForwardingBridgeSupport can cause remote connection to be leaked.

Posted by "Stirling Chow (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/activemq/browse/AMQ-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stirling Chow updated AMQ-3016:
-------------------------------

    Attachment: patch.txt

Patch to DemandForwardingBridgeSupport which removes the premature call to sendShutdown.countDown.  countDown is already called by the finally clause.

> Race condition in DemandForwardingBridgeSupport can cause remote connection to be leaked.
> -----------------------------------------------------------------------------------------
>
>                 Key: AMQ-3016
>                 URL: https://issues.apache.org/activemq/browse/AMQ-3016
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: Connector, Transport
>    Affects Versions: 5.4.1
>            Reporter: Stirling Chow
>         Attachments: ConnectionLeakTest.java, patch.txt
>
>
> Symptom
> ========
> I set up two Brokers and a network bridge from Broker A to Broker B over HTTP.  When the bridge is established, each Broker has a single transport connection (VM on broker A and HTTP on broker B) as recorded in RegionBroker.connections
> I noticed that when Broker A was stopped (normally), periodically the HTTP connection would remain in Broker B's RegionBroker.connections until the InactivityMonitor on the connection timed out.  If the InactivityMonitor was disbled, then the connection would remain indefinitely.  
> If Broker A was restarted, the bridge would be restarted and a second connection would be recorded in Broker B's RegionBroker.connections.  Repeating restarts of Broker A would cause an accumulation of "dead" connections, which would eventually lead to an OOM.
> Cause
> =====
> When Broker A is stopped, DemandForwardingBridgeSupport.stop() is called and sends a ShutdownInfo command to the local and remote transports.  When the transports receive the ShutdownInfo, they remove the connection from their associated RegionBroker.connections as part of  TransportConnection.processRemoveConnection(ConnectionId, long):
>     public synchronized Response processRemoveConnection(ConnectionId id, long lastDeliveredSequenceId)
>             throws InterruptedException {
> ...
>             try {
>                 broker.removeConnection(cs.getContext(), cs.getInfo(), null);
>             } catch (Throwable e) {
>                 SERVICELOG.warn("Failed to remove connection " + cs.getInfo(), e);
>             }
> In the cases were Broker B would not clean up its connection, I traced the code and discovered that the ShutdownInfo message was not being sent because the remote transport (HttpClientTransport) had already had its "stopped" flag set to true as part of HttpClientTransport.oneway(Object command):
>     public void oneway(Object command) throws IOException {
>         if (isStopped()) {
>             throw new IOException("stopped.");
>         }
> ...
> DemandForwardingBridgeSupport's stop() method has the following code:
>     public void stop() throws Exception {
> ...
>                     ASYNC_TASKS.execute(new Runnable() {
>                         public void run() {
>                             try {
>                                 localBroker.oneway(new ShutdownInfo());
>                                 sendShutdown.countDown();
>                                 remoteBroker.oneway(new ShutdownInfo());
>                             } catch (Throwable e) {
>                                 LOG.debug("Caught exception sending shutdown", e);
>                             } finally {
>                                 sendShutdown.countDown();
>                             }
>                         }
>                     });
>                     if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
>                         LOG.info("Network Could not shutdown in a timely manner");
>                     }
>                 } finally {
>                     ServiceStopper ss = new ServiceStopper();
>                     ss.stop(remoteBroker);
>                     ss.stop(localBroker);
>                     // Release the started Latch since another thread could be
>                     // stuck waiting for it to start up.
>                     startedLatch.countDown();
>                     startedLatch.countDown();
>                     localStartedLatch.countDown();
>                     ss.throwFirstException();
>                 }
>             }
> ShutdownInfo is sent asynchronously to the local and remote transports by a slave thread:
>                                 localBroker.oneway(new ShutdownInfo());
>                                 sendShutdown.countDown();
>                                 remoteBroker.oneway(new ShutdownInfo());
> The sendShutdown  latch is used by the master thread to prevent running the finally clause until the ShutdownInfo has been sent:
>                     if (!sendShutdown.await(10, TimeUnit.SECONDS)) {
>                         LOG.info("Network Could not shutdown in a timely manner");
>                     }
>                 } finally {
>                     ServiceStopper ss = new ServiceStopper();
>                     ss.stop(remoteBroker);
>                     ss.stop(localBroker);
> ...
>                 }
>             }
> However, because the latch is countdown *before* remoteTransport.oneway(new ShutdownInfo()) there is a race condition and in most cases the main thread calls ss.stop(remoteBroker) before the slave thread has completed its call to remoteTransport.oneway(new ShutdownInfo()).  As a result, the remoteTransport appears already stopped and the ShutdownInfo is discarded.  This leaves the connection dangling on the remote broker until the InactivityMonitor closes it.
> Solution
> ======
> The sendShutdown latch should be countdown *after* remoteTransport.oneway(new ShutdownInfo()).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.