Posted to users@activemq.apache.org by gbrown <gb...@mediaocean.com> on 2018/04/04 15:18:46 UTC

Both instances of ActiveMQ connected to kahadb after network outage

We had a short network outage, and once the network came back both
instances in our master/slave setup were up and accepting connections. We
discovered this when messages on queues were neither browsable nor
consumable. We restarted the instances after renaming the db.data file,
since other ways of getting them to start (persistenceAdapter options) did
not work.
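
For reference, by persistenceAdapter options I mean KahaDB recovery
attributes along these lines (a sketch, not our exact config; the
directory path is the stock default and the attribute values are
illustrative):

  <persistenceAdapter>
    <kahaDB directory="${activemq.data}/kahadb"
            ignoreMissingJournalfiles="true"
            checkForCorruptJournalFiles="true"
            checksumJournalFiles="true"/>
  </persistenceAdapter>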

Once restarted, the messages on the queues were gone, so they are probably
lost.

We use an nfs4 mount point.

ActiveMQ version is 5.11.1.

So, can anyone help with:

1. How is it possible that both master and slave connected to the kahadb?
2. Is there any way I could have recovered that would have kept the
messages on the queues?




Re: Both instances of ActiveMQ connected to kahadb after network outage

Posted by gbrown <gb...@mediaocean.com>.
I tried but was unable to re-create the error, so for now I'm no closer to
finding the cause or a solution.




Re: Both instances of ActiveMQ connected to kahadb after network outage

Posted by gbrown <gb...@mediaocean.com>.
Thanks for the reply. Only db.data was deleted (renamed, in fact).

The locks work in testing: when stopping/killing/switching off the master
instance of ActiveMQ, the slave takes over, and the same happens when
failing back.

I'll need to see if I can make the NFS share fail on both servers at the
same time, and come back on both at the same time, to see what happens.
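
One way I might try to simulate that (a hypothetical approach, assuming the
NFS server listens on the default port 2049 and that I can run this as root
on both broker hosts):

  # On both broker hosts at the same time: drop NFS traffic to
  # simulate the share disappearing
  iptables -A OUTPUT -p tcp --dport 2049 -j DROP

  # ... wait long enough for locks/leases to time out ...

  # Then remove the rule on both hosts at the same time to
  # simulate the share coming back
  iptables -D OUTPUT -p tcp --dport 2049 -j DROP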




Re: Both instances of ActiveMQ connected to kahadb after network outage

Posted by Tim Bain <tb...@alumni.duke.edu>.
On Wed, Apr 4, 2018 at 9:18 AM, gbrown <gb...@mediaocean.com> wrote:

> We had a short network outage, and once the network came back both
> instances in our master/slave setup were up and accepting connections.
> We discovered this when messages on queues were neither browsable nor
> consumable. We restarted the instances after renaming the db.data file,
> since other ways of getting them to start (persistenceAdapter options)
> did not work.
>
> Once restarted, the messages on the queues were gone, so they are
> probably lost.
>
> We use an nfs4 mount point.
>
> ActiveMQ version is 5.11.1.
>
> So, can anyone help with:
>
> 1. How is it possible that both master and slave connected to the kahadb?
>


It sure sounds like your NFS setup isn't successfully providing exclusive
locks on the shared filesystem, even though it's an NFSv4 mount.
http://activemq.2283324.n4.nabble.com/Unreliable-NFS-exclusive-locks-on-unreliable-networks-td4737992.html
has some discussion of the NFS mount options that some other users are
using, but I can't say that anyone has built a consensus around "these
settings work and these other ones don't," so all you have to go on at the
moment are those reports from other users. If you're able to tell us what
settings you end up using that fix the problem (and you should plan on
doing thorough testing, given that you've just demonstrated that your
current settings appeared to work but didn't actually hold), maybe we can
establish enough of a consensus in the community to consider documenting
recommended values on the wiki.
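
As an illustration only (not a validated recommendation; the server name
and export path are hypothetical, and the option values shown are just the
kinds of standard nfs(5) client options discussed in that thread), an
/etc/fstab entry might look like:

  # hard = retry indefinitely instead of erroring out on outages
  # noac = disable attribute caching (trade performance for coherence)
  # timeo/retrans = retry timing; values here are illustrative
  nfsserver:/export/activemq  /mnt/activemq  nfs4  hard,noac,timeo=600,retrans=2  0 0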



> 2. Is there any way I could have recovered that would have kept the
> messages on the queues?
>


db.data is the index, and is just cached information derived from the
actual journal files. It can be safely deleted without data loss, because
it will simply be rebuilt from the journal files. If all you deleted was
that one file (which is what it sounds like) and you ended up without
messages upon restart, it means they had already been deleted from the
journal files, and there wasn't anything you could have done to avoid
losing them. If, on the other hand, you deleted *.log files in addition to
db.data, then you could have avoided the loss by leaving those journal
files (*.log) in place. From what you wrote, I think the message loss was
unavoidable, unless your description of which files you deleted was
incomplete.
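
For anyone finding this later, a sketch of the distinction (the file names
are the KahaDB defaults; the data directory path is illustrative):

  $ ls /path/to/data/kahadb
  db-1.log  db-2.log  db.data  db.redo  lock
  # db-*.log - the journal: the actual message data
  # db.data  - the index: rebuilt from the journal if missing
  # lock     - the file the master holds an exclusive lock on

  # Safe: forces an index rebuild from the journal on restart
  $ mv db.data db.data.bak

  # Deleting db-*.log would discard the messages themselves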

Tim