Posted to users@activemq.apache.org by Josh Carlson <jc...@e-dialog.com> on 2010/11/11 20:11:13 UTC

ActiveMQ Broker Failover

We are using version 5.3.0 with a shared file system master/slave configuration, and using persistent messaging with client acknowledgements. An NFSv4 mount point is used for both the lock file and the persistent storage. KahaDB is being used as the persistence adapter.

We have encountered issues where the broker does not fail over gracefully whenever there is a problem with the NFS server. The most reliable test case I have come up with is stopping and starting the NFS server. When the NFS server is restarted, one of the slaves acquires the lock and becomes the master, but the original master stays active and keeps listening for connections. Clients can successfully connect to it and subscribe to queues, but no messages get dispatched, and enqueues hang until the socket times out. Connections that go to the new master work. Hence the questions:

	Why was the lock released? Shouldn't it have been retained?

	Why is the original master not dispatching messages, and why is it blocking sends?
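
To make the first question concrete, here is a minimal sketch of how an exclusive lock on a shared lock file can be taken with plain java.nio. It is an illustration of the general mechanism only, not ActiveMQ's actual locking code; the class name LockSketch and both method names are hypothetical. The point it shows is that FileLock.isValid() reflects only the JVM's local state (the lock stays "valid" until it is released or its channel is closed), so a process holding such a lock would not by itself notice if the NFS server dropped the lock on its side.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; this is not ActiveMQ's locking code.
class LockSketch {

    // Try to take an exclusive lock on the shared lock file.
    // Returns the lock if this process may act as master, or null if
    // another process already holds it (this process stays a slave).
    static FileLock tryAcquire(Path lockFile) throws IOException {
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = channel.tryLock();
        if (lock == null) {
            channel.close(); // lock is held elsewhere; do not leak the channel
        }
        return lock;
    }

    // Reports the JVM's local view only: true until the lock is released
    // or its channel is closed. It does not re-check with the file server.
    static boolean believesItIsMaster(FileLock lock) {
        return lock != null && lock.isValid();
    }

    public static void main(String[] args) throws IOException {
        // A temp file stands in for the lock file on the NFS mount.
        Path lockFile = Files.createTempFile("broker", ".lock");
        FileLock lock = tryAcquire(lockFile);
        System.out.println("master? " + believesItIsMaster(lock));
        if (lock != null) {
            lock.channel().close(); // closing the channel releases the lock
        }
        System.out.println("master after release? " + believesItIsMaster(lock));
    }
}
```

If the broker's master election works along these lines, a master would need some additional periodic check against the lock file itself to detect that it has lost the lock; whether 5.3.0 does this is part of what I am asking.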

I have seen other issues but have not been able to reproduce them reliably:

	* NFS timeout due to a DNS issue
	* Possible Linux kernel bug. The problem arises when the following appears in /var/log/messages: kernel: decode_op_hdr: reply buffer overflowed in line 2121.<6>      blocks= 585871964 block_size= 512

Any help would be appreciated.

Thanks

Josh