Posted to users@activemq.apache.org by Julian Scheid <ju...@googlemail.com> on 2007/09/05 14:15:33 UTC

Network-of-brokers re-synchronization after network disconnect?

Hi,

I have a durable topic distributed over two broker nodes and it's 
working just fine, however messages get lost when I artificially 
disconnect and later reconnect one of the brokers. To elaborate:

I've set up two broker nodes on two different hosts, broker B1 on host 
H1 and broker B2 on host H2. (In the production environment, these two 
hosts will be in two separate LANs that are connected through a WAN for 
which there's no guaranteed 100% availability, hence my 
disconnect/reconnect tests.)

Each broker is configured with a static network connection to forward to 
the broker on the other host, so B1 is configured to forward to B2 and 
B2 is configured to forward to B1. See below for the corresponding 
snippets from the configuration files.

On each host, I'm running a subscriber connected to the broker running 
on the same host, so I have subscriber S1 on H1 connected to B1, and S2 
on H2 connected to B2. Both subscribers are subscribed to the same topic 
T. (I've tried both durable and non-durable subscriptions.)

I'm also running a publisher on each host connected to the broker on 
localhost, so publisher P1 is running on H1 connected to B1, and P2 on 
H2 connected to B2, both publishing to the same topic T the subscribers 
are listening to. Both publishers are configured to send persistent 
messages. (I've tried both infinite expiry and finite expiry values, 
say 100 seconds.)

To summarize, my setup looks like this:

    S1         S2       (subscribers)
     |         |
    B1 <-----> B2       (brokers)
     |         |
    P1         P2       (publishers)

  (Host H1)  (Host H2)

Now, in the normal case everything works as expected. If P1 sends a test 
message to topic T, both S1 and S2 get the message. Same if P2 sends a 
message. So the forwarding of messages between the two brokers 
apparently works fine.

My tests with durable subscriptions work fine too - if I temporarily 
unsubscribe, say S2, and then resubscribe it later, it gets any messages 
sent while it was unsubscribed - no matter whether those messages were 
sent from P1 or P2.

However, if I artificially disconnect host H2 from the network (by 
pulling the network cable) and then send a message from P1 to B1, that 
message will not be received by S2 after I reconnect H2 to the network. 
(It will obviously be received by S1 running on the same host as the 
publisher. It also WILL be received by S2 if I remove H2 from the 
network for only a short amount of time, maybe 5-10 seconds - but any 
longer, the message will get lost.)

I've tried re-subscribing S2 after reconnecting H2, but that didn't seem 
to help even in the case of a durable subscription, and it probably 
wouldn't be an acceptable solution anyway because then the subscribers 
would need to pay extra attention to network connectivity.

I've cranked up the log level to DEBUG and looked for hints in the 
broker logs, such as a note about a dropped message, but couldn't find 
anything suspicious.

I've tried all of the above with both ActiveMQ 4.1.1 and the 5.0 
snapshot as of yesterday, September 4th.

I've also tried sending messages directly from the web console just to 
make sure that there's nothing wrong with my publishers, double-checking 
that messages are sent with persistent delivery.

Am I wrong to expect that B1 and B2 should re-synchronize after the 
connection between them has been rebuilt, or is maybe my forwarding 
configuration wrong? How could I go about debugging what's happening to 
the message that's sent while H2 is down, whether it ever gets 
replicated from B1 to B2 and if not, why not?

Please let me know if you need the full configuration files or log files.

Thanks in advance for any advice,

Julian


Configuration for broker B1 running on host H1:

     <networkConnectors>
       <networkConnector uri="static:(tcp://H2:61616)"/>
     </networkConnectors>


Configuration for B2 running on H2:

     <networkConnectors>
       <networkConnector uri="static:(tcp://H1:61616)"/>
     </networkConnectors>
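
For orientation, here is a minimal sketch of where the B1 fragment might sit inside a broker definition. This is an illustration only, not the actual configuration file: the brokerName value, the transportConnector entry, and its bind address are assumptions. The only detail taken from the snippets above is that B2's static URI targets tcp://H1:61616, so something on H1 presumably has to listen on that port. B2 would mirror this with H1 and H2 swapped.

```xml
<!-- Sketch only: the surrounding Spring beans wrapper and namespace
     declarations are omitted. -->
<broker brokerName="B1">  <!-- brokerName is an assumed value -->

  <!-- Assumed listener: B2's static URI (tcp://H1:61616) needs a
       transport connector accepting connections on this port on H1. -->
  <transportConnectors>
    <transportConnector uri="tcp://0.0.0.0:61616"/>
  </transportConnectors>

  <!-- Taken from the configuration above: forward to B2 on H2. -->
  <networkConnectors>
    <networkConnector uri="static:(tcp://H2:61616)"/>
  </networkConnectors>

</broker>
```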



Re: Network-of-brokers re-synchronization after network disconnect?

Posted by Julian Scheid <ju...@googlemail.com>.
Julian Scheid wrote:
> I have a durable topic distributed over two broker nodes and it's 
> working just fine, however messages get lost when I artificially 
> disconnect and later reconnect one of the brokers.

This could be related to http://issues.apache.org/activemq/browse/AMQ-1076