Posted to users@activemq.apache.org by agrabil <gr...@ins.com> on 2007/04/27 21:24:12 UTC

Network connector failover problems

Hello,
I'm using ActiveMQ 4.1.1 and testing network-connector failover. I have
three brokers: A, B, and C. On brokers A and B, the ConsumerTool is running,
waiting for messages on MyQueue. On broker C, I have the following network
connector defined:

<networkConnectors>
      <networkConnector name="failover-test"
          uri="static://(failover://(tcp://brokerA:61616,tcp://brokerB:61616)?randomize=false)"/>
</networkConnectors>
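As a rough illustration of what `randomize=false` implies in this setup: the failover transport tries its URIs in the order listed, so C should prefer A and only fall back to B when A is unreachable. The Java sketch is a hypothetical model of that ordering only. The class and method names are made up, and it is not ActiveMQ's FailoverTransport, which also deals with reconnect delays and retries.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of ordered (randomize=false) failover selection.
// This is NOT ActiveMQ's FailoverTransport; it ignores reconnect delays,
// retries and connection state, and only shows the order of preference.
public class OrderedFailoverSketch {

    /** Returns the first URI, in list order, that is reachable, or null if none is. */
    static String selectBroker(List<String> uris, Set<String> reachable) {
        for (String uri : uris) {
            if (reachable.contains(uri)) {
                return uri;
            }
        }
        return null; // a real failover transport would keep retrying instead
    }

    public static void main(String[] args) {
        List<String> uris = List.of("tcp://brokerA:61616", "tcp://brokerB:61616");

        // All brokers up: the first listed URI (broker A) is chosen.
        System.out.println(selectBroker(uris, Set.of("tcp://brokerA:61616", "tcp://brokerB:61616")));

        // Broker A shut down: selection falls through to broker B.
        System.out.println(selectBroker(uris, Set.of("tcp://brokerB:61616")));
    }
}
```

This needs Java 9 or later for `List.of` and `Set.of`.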

When I start up the three brokers, C connects successfully to A. Then I
run the ProducerTool on broker C, posting messages to MyQueue, and they are
happily consumed by the ConsumerTool on broker A.

Now I shut down broker A (or pull its Ethernet cable). Broker C is
notified and appears to fail over to broker B successfully: I see a TCP
connection from broker C established to port 61616 on broker B. However,
if I now run the ProducerTool on broker C again, posting to MyQueue, the
messages are not forwarded to broker B, so the ConsumerTool waiting on B
never receives them. If I restart broker C at this point (with broker A
still down), it connects to broker B and immediately forwards the
persistent messages for MyQueue, and the ConsumerTool receives them.

Bug AMQ-734 looks related, but since I am running 4.1.1, the release in
which that bug is fixed, I suspect this is a different issue.

Any help would be greatly appreciated.

Greg


-- 
View this message in context: http://www.nabble.com/Network-connector-failover-problems-tf3659372s2354.html#a10224971
Sent from the ActiveMQ - User mailing list archive at Nabble.com.


Re: Network connector failover problems

Posted by agrabil <gr...@ins.com>.
Hello again,
I *think* I have found the source of this issue in the code, and I have made
a patch that seems to work. However, I don't know all of its implications,
so I would greatly appreciate it if someone on the development team who
knows this code could tell me what they think. The patch is confined to the
transportResumed() method of the inline (anonymous) TransportListener that
the start() method of the DemandForwardingBridgeSupport class sets up for
the remoteBroker instance.

When DemandForwardingBridgeSupport.start() is first called, it ends with
the following:

        localBroker.start();
        remoteBroker.start();

        try {
            triggerRemoteStartBridge();
        } catch (IOException e) {
            log.warn("Caught exception from remote start", e);
        }

When the connection from broker C to broker A is broken, the remoteBroker's
TransportListener.transportInterrupted() method is invoked. It ends by
marking both the local and remote bridges as not started and replacing the
start-up latch:

        localBridgeStarted.set(false);
        remoteBridgeStarted.set(false);
        startedLatch = new CountDownLatch(2);

When broker C's connection fails over to broker B, the listener's
transportResumed() method is invoked, but it does not trigger a restart of
the local and remote bridges. I believe a restart is necessary: otherwise,
when a MessageDispatch command arrives later, it blocks in waitStarted(),
presumably because nothing counts down the freshly created startedLatch.
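To make the suspected hang concrete, here is a hypothetical, much-simplified Java model of that start-up state. Only the field names `localBridgeStarted`, `remoteBridgeStarted` and `startedLatch` come from the code quoted in this post; the class, the `start*Bridge()` helpers, and a `waitStarted()` that takes a timeout (so the demo terminates) are assumptions for illustration, not the real DemandForwardingBridgeSupport.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, much-simplified model of the bridge start-up state.
// Only the three field names come from the post; this is NOT the real
// DemandForwardingBridgeSupport code.
public class BridgeLatchModel {
    final AtomicBoolean localBridgeStarted = new AtomicBoolean(false);
    final AtomicBoolean remoteBridgeStarted = new AtomicBoolean(false);
    volatile CountDownLatch startedLatch = new CountDownLatch(2);

    // Assumption: each bridge half counts the latch down once when it starts.
    void startLocalBridge() {
        if (localBridgeStarted.compareAndSet(false, true)) {
            startedLatch.countDown();
        }
    }

    void startRemoteBridge() {
        if (remoteBridgeStarted.compareAndSet(false, true)) {
            startedLatch.countDown();
        }
    }

    // Mirrors the end of transportInterrupted() as quoted in the post.
    void transportInterrupted() {
        localBridgeStarted.set(false);
        remoteBridgeStarted.set(false);
        startedLatch = new CountDownLatch(2);
    }

    // Unpatched transportResumed(): nothing restarts either bridge half.
    void transportResumed() {
    }

    // Stand-in for waitStarted(); the timeout lets the demo finish instead of
    // blocking indefinitely. Returns true only if both halves have started.
    boolean waitStarted(long millis) {
        try {
            return startedLatch.await(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        BridgeLatchModel bridge = new BridgeLatchModel();
        bridge.startLocalBridge();
        bridge.startRemoteBridge();
        System.out.println("before failover, started = " + bridge.waitStarted(100));

        bridge.transportInterrupted(); // connection to broker A lost
        bridge.transportResumed();     // failover transport reconnects to broker B
        System.out.println("after failover, started = " + bridge.waitStarted(100));
    }
}
```

In this model, once transportInterrupted() swaps in a new CountDownLatch(2), the unpatched transportResumed() leaves nothing to count it down, so the second waitStarted() call times out.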

So the patch adds the following to the end of the transportResumed()
method of the remoteBroker's inline TransportListener:

public void transportResumed() {
                    ...
                    <code left out for brevity>
                    ...
                    lastConnectSucceeded.set(false);

                    log.debug("Outbound transport to " + remoteBrokerName + " resumed");

                    // Added by this patch: restart the local and remote bridges
                    // after failover so that waitStarted() no longer blocks.
                    try {
                        localBroker.start();
                        remoteBroker.start();
                        triggerRemoteStartBridge();
                    } catch (Exception e) {
                        log.warn("Caught exception from remote restart", e);
                    }
                }
            }
}
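The intended effect of the patch can be illustrated with a small, hypothetical Java model (not ActiveMQ code; `start()`, `dispatch()` and `runFailoverScenario()` are invented names). After the interruption, a dispatcher thread waits on the latch, and the patched transportResumed() restarts both bridge halves, which counts the fresh latch down to zero and lets the dispatch proceed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the patched behaviour; NOT ActiveMQ code.
// start(), dispatch() and runFailoverScenario() are invented for illustration.
public class PatchedResumeModel {
    private final AtomicBoolean localStarted = new AtomicBoolean(false);
    private final AtomicBoolean remoteStarted = new AtomicBoolean(false);
    private volatile CountDownLatch startedLatch = new CountDownLatch(2);

    // Starts each bridge half at most once, counting the latch down per half.
    private void startBothHalves() {
        if (localStarted.compareAndSet(false, true)) {
            startedLatch.countDown();
        }
        if (remoteStarted.compareAndSet(false, true)) {
            startedLatch.countDown();
        }
    }

    void start() {
        startBothHalves();
    }

    void transportInterrupted() {
        localStarted.set(false);
        remoteStarted.set(false);
        startedLatch = new CountDownLatch(2);
    }

    // Patched: restart both halves, standing in for localBroker.start(),
    // remoteBroker.start() and triggerRemoteStartBridge().
    void transportResumed() {
        startBothHalves();
    }

    // A MessageDispatch may only be forwarded once the bridge has started;
    // returns true if that happened within the timeout.
    boolean dispatch(long timeoutMillis) {
        try {
            return startedLatch.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Start, lose broker A, begin a dispatch, then fail over to broker B.
    static boolean runFailoverScenario() {
        PatchedResumeModel bridge = new PatchedResumeModel();
        bridge.start();
        bridge.transportInterrupted();

        AtomicBoolean forwarded = new AtomicBoolean(false);
        Thread dispatcher = new Thread(() -> forwarded.set(bridge.dispatch(2000)));
        dispatcher.start();

        // Releases the dispatch whether or not it is already waiting.
        bridge.transportResumed();
        try {
            dispatcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return forwarded.get();
    }

    public static void main(String[] args) {
        System.out.println("dispatch forwarded after failover: " + runFailoverScenario());
    }
}
```

The model says nothing about whether calling start() again on the real localBroker and remoteBroker transports is safe, which is the open question in this post.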


In my tests (the scenario from my first message), this resolves the issue:
messages produced on C after it has failed over to B are properly
forwarded to B and consumed by the consumer there. Again, I am not sure of
all the implications of this patch, so I would like some feedback from
someone who knows better than me ;-)

Thanks,
Greg Rabil
