Posted to dev@activemq.apache.org by "Eric (JIRA)" <ji...@apache.org> on 2010/07/11 10:25:53 UTC

[jira] Issue Comment Edited: (AMQ-2774) Network of brokers : Multicast discovery stopped to work

    [ https://issues.apache.org/activemq/browse/AMQ-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=60582#action_60582 ] 

Eric edited comment on AMQ-2774 at 7/11/10 4:25 AM:
----------------------------------------------------

To summarize my view of the problem:

When the problem occurs
- The network links between 2 brokers go up and down rapidly and repeatedly. In French, we say that the network is "bagotting" (flapping) :-)
- The DemandForwardingBridgeSupport.stop() method is called before the start() method (including the start methods of its child threads) has fully executed

Consequences
- With a NO-DUPLEX connection, the network of brokers is re-established, but some threads are created and remain blocked on the RemoteBrokerNameKnownLatch latch. In this case, there is no RemoteBrokerNameKnownLatch.await() call in the start() method, so the latch is awaited in the startLocalBridge() method, which is called by a dedicated thread. Each blocked thread consumes resources, which becomes a major problem when this kind of network fault is frequent and the number of network connections is large.
- With a DUPLEX connection, the latch is awaited in the start() method itself, so the main network connector thread is the one affected and the network connector is completely blocked.
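
The blocking behaviour described above can be reproduced outside ActiveMQ with a plain java.util.concurrent.CountDownLatch. The following is only a minimal sketch: the class LatchHangDemo and the method awaiterStillBlocked are hypothetical names made up for illustration, and the latch merely stands in for the bridge's RemoteBrokerNameKnownLatch. It shows that a thread calling await() stays parked for as long as nobody calls countDown().

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class, not part of ActiveMQ.
class LatchHangDemo {

    /**
     * Starts a thread that awaits the given latch and reports whether that
     * thread is still alive (i.e. still blocked) after waitMillis.
     */
    static boolean awaiterStillBlocked(CountDownLatch latch, long waitMillis)
            throws InterruptedException {
        Thread starter = new Thread(() -> {
            try {
                latch.await(); // parks until the count reaches zero
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "StartLocalBridge-demo");
        starter.setDaemon(true); // do not keep the JVM alive for the demo
        starter.start();
        starter.join(waitMillis); // wait a bounded time for the thread to finish
        return starter.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        // Case 1: stop() ran early and nothing ever counts the latch down.
        CountDownLatch neverReleased = new CountDownLatch(1);
        System.out.println("blocked without countDown: "
                + awaiterStillBlocked(neverReleased, 200));

        // Case 2: the latch is released, so the awaiting thread can finish.
        CountDownLatch released = new CountDownLatch(1);
        released.countDown();
        System.out.println("blocked with countDown: "
                + awaiterStillBlocked(released, 200));
    }
}
```

In the NO-DUPLEX case each such parked thread is a leaked dedicated thread; in the DUPLEX case the parked thread would be the network connector thread itself.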

I am currently trying to
- add a RemoteBrokerNameKnownLatch.countDown() call at the end of the stop() method
- test the disposed AtomicBoolean value to correctly abort the startup process in start() for DUPLEX, and in startLocalBridge() for NO-DUPLEX.

I think it will be better for DUPLEX since the network connector thread will be freed, but I don't know whether the child thread will be correctly destroyed.
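
As a rough illustration of the two changes listed above, here is a self-contained sketch. It is not the actual DemandForwardingBridgeSupport code: the class BridgeSketch, its methods isLocalBridgeStarted and stopBeforeStartCompletes, and the localBridgeStarted flag are assumptions made up for this example. Only the idea is taken from the comment: stop() sets the disposed flag and counts the latch down, and the starting code checks disposed after await() returns.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the bridge; only the latch/flag handling is modelled.
class BridgeSketch {
    private final CountDownLatch remoteBrokerNameKnownLatch = new CountDownLatch(1);
    private final AtomicBoolean disposed = new AtomicBoolean(false);
    private final AtomicBoolean localBridgeStarted = new AtomicBoolean(false);

    /** Would run on the dedicated thread (NO-DUPLEX) or inside start() (DUPLEX). */
    void startLocalBridge() throws InterruptedException {
        remoteBrokerNameKnownLatch.await();
        if (disposed.get()) {
            return; // stop() won the race: abandon the startup sequence
        }
        localBridgeStarted.set(true); // placeholder for the real startup work
    }

    void stop() {
        disposed.set(true);                     // set the flag first ...
        remoteBrokerNameKnownLatch.countDown(); // ... then release any waiter
    }

    boolean isLocalBridgeStarted() {
        return localBridgeStarted.get();
    }

    /** Simulates stop() arriving while the starter thread is still waiting. */
    static boolean stopBeforeStartCompletes() throws InterruptedException {
        BridgeSketch bridge = new BridgeSketch();
        Thread starter = new Thread(() -> {
            try {
                bridge.startLocalBridge();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "StartLocalBridge-sketch");
        starter.start();
        bridge.stop();
        starter.join(2000);
        // Success means: the thread was released and did not start the bridge.
        return !starter.isAlive() && !bridge.isLocalBridgeStarted();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("starter released cleanly: " + stopBeforeStartCompletes());
    }
}
```

Setting disposed before calling countDown() matters here: a thread that returns from await() is then guaranteed to see disposed as true. Whether this is sufficient for the real bridge, in particular for cleaning up the child thread in the DUPLEX case, is exactly the open question raised above.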

Eric-AWL 

> Network of brokers : Multicast discovery stopped to work
> --------------------------------------------------------
>
>                 Key: AMQ-2774
>                 URL: https://issues.apache.org/activemq/browse/AMQ-2774
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.2.0
>         Environment: Linux
>            Reporter: Eric
>             Fix For: 5.4.1
>
>         Attachments: JMAC-BEA-lastlog.log-20100315
>
>
> Hi everybody
> I am experiencing a major problem with the multicast discovery algorithm in a network-of-brokers topology.
> Under some conditions, a broker cannot re-establish a remote connection even if the remote broker is restarted.
> I have log traces that should help to identify the origin of the problem.
> When there is no discovery/connection error, I can see these 2 lines in the activemq log file:
> #08 Jun 2010 14:31:30,639  INFO  [Multicast Discovery Agent Notifier] org.apache.activemq.network.DiscoveryNetworkConnector
> Establishing network connection between from vm://ACCLU-tpnocp04v to tcp://tpnocp09v-bus:13100?useLocalHost=false
> #08 Jun 2010 14:31:30,692  INFO  [StartLocalBridge: localBroker=vm://ACCLU-tpnocp04v#26] org.apache.activemq.network.DemandForwardingBridge
> Network connection between vm://ACCLU-tpnocp04v#26 and tcp://tpnocp09v-bus/10.18.126.28:13100(MOM-tpnocp09v) has been established.
> When the connection is broken, I can see this line in the log.
> #11 Jun 2010 12:37:32,585  INFO  [Multicast Discovery Agent Notifier] org.apache.activemq.network.DemandForwardingBridge
> ACCLU-tpnocp04v bridge to MOM-tpnocp09v stopped
> Then the current ACCLU-tpnocp04v broker tries to re-establish the connection:
> #11 Jun 2010 12:37:34,475  INFO  [Multicast Discovery Agent Notifier] org.apache.activemq.network.DiscoveryNetworkConnector
> Establishing network connection between from vm://ACCLU-tpnocp04v to tcp://tpnocp09v-bus:13100?useLocalHost=false
> But here, the second log line ("has been established") does not appear in the log file! I don't know whether the connection is actually up or not.
> Then the connection is broken again (note "Unknown" instead of "MOM-tpnocp09v").
> #11 Jun 2010 13:33:58,655  WARN  [ActiveMQ Transport: tcp://tpnocp09v-bus/10.18.126.28:13100] org.apache.activemq.network.DemandForwardingBridge
> Network connection between vm://ACCLU-tpnocp04v#58 and tcp://tpnocp09v-bus/10.18.126.28:13100 shutdown due to a remote error: java.net.SocketException: Connection reset
> #11 Jun 2010 13:33:58,657  INFO  [NetworkBridge] org.apache.activemq.network.DemandForwardingBridge
> ACCLU-tpnocp04v bridge to Unknown stopped
> And now, even if I restart the remote broker (MOM-tpnocp09v), no line (Establishing/has been established) appears, and no network connection is re-established between ACCLU-tpnocp04v and MOM-tpnocp09v. It seems that the ACCLU-tpnocp04v broker can no longer establish a connection with the MOM-tpnocp09v broker!
> The production teams tell me that this problem does not seem to be resolved in the fuse-5.3.0.6 version.
> Eric-AWL
