Posted to users@activemq.apache.org by akpuvvada <ak...@gmail.com> on 2017/07/14 10:53:07 UTC

Active MQ - Master/Slave Config Not working

I observed an issue with the Fault Tolerance configuration:
When the primary/master is down, clients are not able to reconnect to the
secondary on retry; the client logs a warning and keeps trying to reconnect.
Also, it treats the first URL as the Master and tries to connect to it first,
and fallback does not work even if we stop the Master and restart the client.
The failover URL has to be changed so that the Machine 2 (current master)
tcp URL comes first before the connection works.

I configured the Fault Tolerance as per
http://activemq.apache.org/shared-file-system-master-slave.html
Shared File System - 
<persistenceAdapter>
  <kahaDB directory="/sharedFileSystem/sharedBrokerData"/>
</persistenceAdapter>

Please help me identify where I am going wrong.
I am using URL : failover:(tcp://host1:61616,tcp:/host2:61616)

Appreciate any help.
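
One thing worth ruling out is the failover URI string itself. As written
above, the second child URI has a single slash after "tcp:", which may only be
an artifact of how the message was archived. The following is a minimal
sketch, using only java.net.URI from the JDK, that lists the child URIs which
parse without a host. The names FailoverUriCheck and childrenWithoutHost are
hypothetical and not part of ActiveMQ, and ActiveMQ's own parsing of the
composite URI may differ.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of ActiveMQ: a sanity check on the URI text only.
class FailoverUriCheck {

    /** Returns the child URIs inside failover:(a,b,...) that java.net.URI parses without a host. */
    static List<String> childrenWithoutHost(String failoverUri) {
        int open = failoverUri.indexOf('(');
        int close = failoverUri.indexOf(')', open + 1);
        if (open < 0 || close < 0) {
            throw new IllegalArgumentException("expected failover:(uri1,uri2,...)");
        }
        List<String> suspicious = new ArrayList<>();
        for (String child : failoverUri.substring(open + 1, close).split(",")) {
            String trimmed = child.trim();
            // "tcp://host:61616" has an authority part, so getHost() is non-null;
            // "tcp:/host:61616" is parsed as a path only, so getHost() is null.
            if (URI.create(trimmed).getHost() == null) {
                suspicious.add(trimmed);
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        System.out.println(childrenWithoutHost("failover:(tcp://host1:61616,tcp:/host2:61616)"));
    }
}
```

An empty result only says that every child URI has a host; it says nothing
about whether a broker is actually listening there.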



--
View this message in context: http://activemq.2283324.n4.nabble.com/Active-MQ-Master-Slave-Config-Not-working-tp4728550.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Active MQ - Master/Slave Config Not working

Posted by Tim Bain <tb...@alumni.duke.edu>.
Heap dumps are useful in pretty limited circumstances (of which this is not
one). What I was asking for is a thread dump, which will give the current
stack trace for every thread in the JVM, so we can see where the failover
transport is getting stuck.
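
The usual way to get a thread dump is to run the JDK's jstack tool against the
client's process id. If that is not convenient, roughly the same information
can be captured from inside the client. A minimal sketch, assuming a few lines
can be added to the client code (InProcessThreadDump is a hypothetical name,
and the output format is not identical to jstack's):

```java
import java.util.Map;

// Hypothetical helper: collects a stack trace for every live thread in this JVM.
class InProcessThreadDump {

    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            sb.append('"').append(thread.getName()).append('"')
              .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

Calling dump() from a separate timer or watchdog thread while the client is
hung would show which frames the stuck thread is sitting in.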

Tim

On Jul 26, 2017 12:25 AM, "akpuvvada" <ak...@gmail.com> wrote:

> heapdump-1501050187914.zip
> <http://activemq.2283324.n4.nabble.com/file/n4728904/
> heapdump-1501050187914.zip>
>
> Hi Tim,
> Please find attached the Heap Dump.
>
> I am not sure how to read this, so, no idea what it is saying.
>
> Let us know if you find anything.

Re: Active MQ - Master/Slave Config Not working

Posted by akpuvvada <ak...@gmail.com>.
heapdump-1501050187914.zip
<http://activemq.2283324.n4.nabble.com/file/n4728904/heapdump-1501050187914.zip>  

Hi Tim,
Please find attached the Heap Dump.

I am not sure how to read this, so, no idea what it is saying.

Let us know if you find anything.




Re: Active MQ - Master/Slave Config Not working

Posted by Tim Bain <tb...@alumni.duke.edu>.
OK, based on everything you just tested, it sounds like the problem is in
the ActiveMQ client code (either in the failover transport itself or maybe
somewhere else that's not hit when using the raw TCP transport). My best
guess at the moment is that we're somehow getting stuck (waiting for a
lock, in a loop, etc.) in a way that prevents us from making the
tcp://host2:61616 connection, but that's just a guess.

Can you please re-run test 2 and take (and post here) a thread dump of the
client process so we can see where the thread is, to see if that supports
the theory of getting stuck?

Tim

On Jul 25, 2017 12:33 AM, "akpuvvada" <ak...@gmail.com> wrote:

> Hi Tim,
> Please find below the observations.
> [...]

Re: Active MQ - Master/Slave Config Not working

Posted by akpuvvada <ak...@gmail.com>.
Hi Tim,
Please find below the observations.



1. Kill host1 so host2 becomes the master and there is no slave. Connect a
consumer with a URI of tcp://host1:61616. Does it connect? What's in the
consumer's logs?

ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory("tcp://potaplq00001la:61616");

Jul 25, 2017 11:45:34 AM activemqjmsclients.FTTest.ActiveMQJMSClients_Producer run
SEVERE: null
javax.jms.JMSException: Could not connect to broker URL: tcp://potaplq00001la:61616. Reason: java.net.ConnectException: Connection refused: connect
	at org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:36)
	at org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:374)
	at org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:304)
	at org.apache.activemq.ActiveMQConnectionFactory.createConnection(ActiveMQConnectionFactory.java:244)
	at activemqjmsclients.FTTest.ActiveMQJMSClients_Producer.run(ActiveMQJMSClients_Producer.java:53)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: connect
	at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
	at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
	at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
	at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at org.apache.activemq.transport.tcp.TcpTransport.connect(TcpTransport.java:525)
	at org.apache.activemq.transport.tcp.TcpTransport.doStart(TcpTransport.java:488)
	at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
	at org.apache.activemq.transport.AbstractInactivityMonitor.start(AbstractInactivityMonitor.java:169)
	at org.apache.activemq.transport.InactivityMonitor.start(InactivityMonitor.java:52)
	at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:64)
	at org.apache.activemq.transport.WireFormatNegotiator.start(WireFormatNegotiator.java:72)
	at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:64)
	at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:64)
	at org.apache.activemq.ActiveMQConnectionFactory.createActiveMQConnection(ActiveMQConnectionFactory.java:354)
	... 4 more

Exception in thread "Thread-1" java.lang.NullPointerException
	at activemqjmsclients.FTTest.ActiveMQJMSClients_Producer.run(ActiveMQJMSClients_Producer.java:91)
	at java.lang.Thread.run(Thread.java:745)

-------------------------------------------------------------------------------------------------------------------------------------------
2. Kill host1 so host2 becomes the master and there is no slave. Connect a
consumer with your normal failover URI. Does it connect? What's in the
consumer's logs?

ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory("failover:(tcp://potaplq00001la:61616,tcp:/potaplq00001lb:61616)");

It just hung for more than 5 minutes, and then I killed the client manually.
No entries in the logs.

-------------------------------------------------------------------------------------------------------------------------------------------
3. Kill host1 so host2 becomes the master and there is no slave. Start
host1 so it becomes the slave. Kill host2 so host1 becomes the master and
there is no slave. Connect a consumer with your normal failover URI. Does
it connect? What's in the consumer's logs?

ActiveMQConnectionFactory connectionFactory = new ActiveMQConnectionFactory("failover:(tcp://potaplq00001la:61616,tcp:/potaplq00001lb:61616)");


It is connecting without any issues.
-------------------------------------------------------------------------------------------------------------------------------------------
4. Remove the timeout attribute from your failover URI and try your
"regular" test again. You never mentioned that you were providing
additional options (randomize and timeout) to the failover URI, neither
here nor in the JIRA bug you submitted. 

I only used them yesterday. I tested the above scenarios without the
options.





Re: Active MQ - Master/Slave Config Not working

Posted by akpuvvada <ak...@gmail.com>.
Hi Tim,
I was not originally using the options like timeout, etc.
I only used them yesterday.

I will test and let you know.




Re: Active MQ - Master/Slave Config Not working

Posted by Tim Bain <tb...@alumni.duke.edu>.
I understand that this is only a problem when using the failover transport,
and clearly TCP connections to the master are working (otherwise you'd get
no connectivity at all no matter which host was the master).

What I'm trying to do is determine whether the problem (failure to detect
that the master goes down and is unavailable to make connections) is caused
by something in the failover transport (i.e. on the client side of the
connection) or in the broker.

To that end, I'd like you to please run the following tests, each from the
same starting point (host1 as master, host2 as slave, both having started
after you killed all brokers and all clients from the previous test, and no
consumer processes running):

1. Kill host1 so host2 becomes the master and there is no slave. Connect a
consumer with a URI of tcp://host1:61616. Does it connect? What's in the
consumer's logs?
2. Kill host1 so host2 becomes the master and there is no slave. Connect a
consumer with your normal failover URI. Does it connect? What's in the
consumer's logs?
3. Kill host1 so host2 becomes the master and there is no slave. Start
host1 so it becomes the slave. Kill host2 so host1 becomes the master and
there is no slave. Connect a consumer with your normal failover URI. Does
it connect? What's in the consumer's logs?
4. Remove the timeout attribute from your failover URI and try your
"regular" test again. You never mentioned that you were providing
additional options (randomize and timeout) to the failover URI, neither
here nor in the JIRA bug you submitted.

Also, I'm curious what's in the logs after the client timeout exceptions
you quoted. Can you post a longer snippet to show that?

Tim

On Jul 24, 2017 9:23 AM, "akpuvvada" <ak...@gmail.com> wrote:

> Hi Tim,
> If I use the regular TCP URL, I am able to connect to the Active server
> anytime it is acting as Master.
> The issue is happening only if I use the failover URL.
>
> If I use the logic like try {connect to master} catch {connect to slave},
> it is working fine.

Re: Active MQ - Master/Slave Config Not working

Posted by akpuvvada <ak...@gmail.com>.
Hi Tim,
If I use the regular TCP URL, I am able to connect to the Active server
anytime it is acting as Master.
The issue is happening only if I use the failover URL.

If I use the logic like try {connect to master} catch {connect to slave}, it
is working fine.
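
The try/catch fallback described here can be sketched with plain TCP sockets.
This only checks which host accepts a TCP connection on the broker port; it is
not a JMS connection and not how the failover transport is implemented.
FirstReachable and the timeout value are assumptions for illustration, and
host1/host2 stand in for the real broker hosts.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: try each candidate in order, fall through to the next on failure.
class FirstReachable {

    /** Returns the first address that accepts a TCP connection, or throws if none does. */
    static InetSocketAddress firstReachable(List<InetSocketAddress> candidates, int timeoutMillis)
            throws IOException {
        IOException last = null;
        for (InetSocketAddress address : candidates) {
            try (Socket socket = new Socket()) {
                socket.connect(address, timeoutMillis);
                return address; // connected; the probe socket is closed by try-with-resources
            } catch (IOException e) {
                last = e; // e.g. "Connection refused"; remember it and try the next candidate
            }
        }
        throw new IOException("no candidate accepted a TCP connection", last);
    }

    public static void main(String[] args) throws IOException {
        List<InetSocketAddress> brokers = Arrays.asList(
                new InetSocketAddress("host1", 61616),
                new InetSocketAddress("host2", 61616));
        System.out.println("reachable: " + firstReachable(brokers, 3000));
    }
}
```

Running this probe while the failover client is hung would show whether the
second host is reachable at the TCP level from the client machine.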




Re: Active MQ - Master/Slave Config Not working

Posted by Tim Bain <tb...@alumni.duke.edu>.
It sounds like the underlying problem is that the slave broker isn't
rejecting incoming connections (which would be a bug, so thanks for
submitting the JIRA bug), but otherwise it sounds like both failover itself
and shared filesystem KahaDB are working.

Can you please confirm that by making sure that clients fail over properly
when the master is stopped (and not restarted) and then fail over again
when the original master is started again and the new master is stopped
(and not restarted)? That is, we want to make sure that failover works
properly when only one broker at a time is up.

Tim

On Jul 24, 2017 6:35 AM, "akpuvvada" <ak...@gmail.com> wrote:

>
> https://issues.apache.org/jira/browse/AMQ-6777

Re: Active MQ - Master/Slave Config Not working

Posted by akpuvvada <ak...@gmail.com>.
https://issues.apache.org/jira/browse/AMQ-6777





Re: Active MQ - Master/Slave Config Not working

Posted by akpuvvada <ak...@gmail.com>.
Please find log below :

Jul 24, 2017 4:45:24 PM activemqjmsclients.FTTest.ActiveMQJMSClients_Producer run
INFO: Connected - Producer - 3
Jul 24, 2017 4:45:25 PM activemqjmsclients.FTTest.ActiveMQJMSClients_Producer run
INFO: Connected to : failover:(tcp://potaplq00001lb:61616,tcp:/potaplq00001la:61616)?randomize=false&timeout=60000
Jul 24, 2017 4:45:27 PM activemqjmsclients.FTTest.ActiveMQJMSClients_Producer run
INFO: Connected to : failover:(tcp://potaplq00001lb:61616,tcp:/potaplq00001la:61616)?randomize=false&timeout=60000
Jul 24, 2017 4:45:28 PM activemqjmsclients.FTTest.ActiveMQJMSClients_Producer run
INFO: Connected to : failover:(tcp://potaplq00001lb:61616,tcp:/potaplq00001la:61616)?randomize=false&timeout=60000
Jul 24, 2017 4:45:29 PM activemqjmsclients.FTTest.ActiveMQJMSClients_Producer run
INFO: Connected to : failover:(tcp://potaplq00001lb:61616,tcp:/potaplq00001la:61616)?randomize=false&timeout=60000
Jul 24, 2017 4:46:33 PM activemqjmsclients.FTTest.ActiveMQJMSClients_Producer run
SEVERE: null
javax.jms.JMSException: Failover timeout of 60000 ms reached.
	at org.apache.activemq.util.JMSExceptionSupport.create(JMSExceptionSupport.java:72)
	at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1413)
	at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1428)
	at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1323)
	at org.apache.activemq.ActiveMQSession.send(ActiveMQSession.java:1967)
	at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:288)
	at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:223)
	at org.apache.activemq.ActiveMQMessageProducerSupport.send(ActiveMQMessageProducerSupport.java:241)
	at activemqjmsclients.FTTest.ActiveMQJMSClients_Producer.run(ActiveMQJMSClients_Producer.java:75)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failover timeout of 60000 ms reached.
	at org.apache.activemq.transport.failover.FailoverTransport.oneway(FailoverTransport.java:639)
	at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
	at org.apache.activemq.transport.ResponseCorrelator.asyncRequest(ResponseCorrelator.java:81)
	at org.apache.activemq.transport.ResponseCorrelator.request(ResponseCorrelator.java:86)
	at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1388)
	... 8 more




Re: Active MQ - Master/Slave Config Not working

Posted by akpuvvada <ak...@gmail.com>.
Hi Tim,
Yes, I am able to see those log lines as expected.

Also, in the admin UI, I am able to see the queues and details only on the
primary server. The other one says "Error! Exception occurred while
processing this request, check the log for more information!".

Even when both are up (master and slave), if I give the URL as
failover:(tcp://slave:port,tcp://master:port)
the connection is not working.

I am stopping the primary using the command "./activemq stop", checking
the status with "./activemq status", and also checking whether the admin UI
is accessible. I also verified using "ps -ef | grep activemq". I can confirm
that the primary is stopped.




Re: Active MQ - Master/Slave Config Not working

Posted by Tim Bain <tb...@alumni.duke.edu>.
Thanks for confirming that (at least at broker startup) the
shared-filesystem locking is working. When you stop the master broker, do
you see lines in the slave log at that exact time saying that it has
acquired the lock and become the master? And after you shut down the
original master, when you start it again, do you see this same line saying
it couldn't acquire the lock and was therefore the slave? If any of those
things isn't working, something is wrong and we need to address that first,
before we spend any effort on the client code.

One question about the client, though: if you start both brokers and
connect with your M1,M2 URI, can you also connect with your M2,M1 URI? Does
the order of the failover URI matter when both brokers are up, or only
after M1 is taken down?

One other question: when you kill M1, how do you do that, and how do you
know that the process has actually stopped? Is there any possibility that
the M1 broker might still be running after you try to stop it?

Tim


On Jul 18, 2017 4:09 AM, "akpuvvada" <ak...@gmail.com> wrote:

> Hi Tim,
> I can see logs indicating that lock is identified, please see below.
>
> 2017-07-14 03:18:14,428 | INFO  | Database
> /tibco_installables/ActiveMQ/DataStore/kahadb/lock is locked by another
> server. This broker is now in slave mode waiting a lock to be acquired |
> org.apache.activemq.store.SharedFileLocker | main
>
>
> So, this is not a file system issue.
>
> My issue is:
>  - When I first bring up the servers, M1 first and M2 second, M1 takes over
> as Primary and M2 as Secondary.
>  - I am using the M1:port,M2:port failover URL.
>  - It is working fine.
> However, when the Primary is stopped, the same failover URL stops working.
> If I change the URL to M2:port,M1:port, it works.

Re: Active MQ - Master/Slave Config Not working

Posted by akpuvvada <ak...@gmail.com>.
Hi Tim,
I can see logs indicating that lock is identified, please see below.

2017-07-14 03:18:14,428 | INFO  | Database
/tibco_installables/ActiveMQ/DataStore/kahadb/lock is locked by another
server. This broker is now in slave mode waiting a lock to be acquired |
org.apache.activemq.store.SharedFileLocker | main


So, this is not a file system issue.

My issue is:
 - When I first bring up the servers, M1 first and M2 second, M1 takes over
as Primary and M2 as Secondary.
 - I am using the M1:port,M2:port failover URL.
 - It is working fine.
However, when the Primary is stopped, the same failover URL stops working.
If I change the URL to M2:port,M1:port, it works.





Re: Active MQ - Master/Slave Config Not working

Posted by Tim Bain <tb...@alumni.duke.edu>.
Is the shared drive using a filesystem where distributed exclusive locks
work correctly and reliably? NFSv4 is safe, NFSv3 isn't, and some others
work while others don't.
http://activemq.apache.org/shared-file-system-master-slave.html has some
details, though it's hardly an exhaustive list of all the shared filesystems
that exist.

If your exclusive locks are working, the slave broker will fail to start
(well, it'll start, but you'll see logs saying it's waiting to acquire the
lock, and it won't accept any connections). If both brokers come up and
accept connections at the same time, you don't have a master/slave pair,
and you need to figure out why not before you can go any further with
master/slave.
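
A rough way to see exclusive locking in action on the shared drive is to try
to lock a scratch file there from both machines in turn. The following is a
minimal sketch using the JDK's java.nio file locks; it is not ActiveMQ's own
locker code, LockProbe is a hypothetical name, and the file to probe should be
a throwaway file on the shared mount, not the broker's real lock file.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: a single attempt to take an exclusive lock on one file.
class LockProbe {

    /** Returns true if the lock was acquired (it is released again), false if another process holds it. */
    static boolean tryExclusiveLock(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock lock = channel.tryLock(); // null means another process holds the lock
            if (lock == null) {
                return false;
            }
            lock.release();
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java LockProbe <scratch-file-on-shared-drive>");
            return;
        }
        System.out.println("lock acquired: " + tryExclusiveLock(Paths.get(args[0])));
    }
}
```

For a meaningful two-host check, one side would need to hold the lock (for
example by pausing before the release) while the other side probes. A single
run only shows that locking is supported at all on that path.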

Tim

On Fri, Jul 14, 2017 at 7:10 PM, akpuvvada <ak...@gmail.com> wrote:

> Hi Tim,
>
> The paths configured for Kaha DB and data in broker attributes are the same
> for both instances - pointing to the same folder on the shared drive.
> Does anything else need to be changed/added?
>
> Thanks
> Anil

Re: Active MQ - Master/Slave Config Not working

Posted by akpuvvada <ak...@gmail.com>.
Hi Tim,

The paths configured for Kaha DB and data in broker attributes are the same
for both instances - pointing to the same folder on the shared drive.
Does anything else need to be changed/added?

Thanks 
Anil




Re: Active MQ - Master/Slave Config Not working

Posted by Tim Bain <tb...@alumni.duke.edu>.
By default, the failover transport attempts to connect to the child URIs in
order. That doesn't mean it considers the first one to be the master, just
that it tries the first one first.
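
To avoid relying on defaults for the connection order, the option can be
spelled out in the URI. A minimal sketch of composing such a URI string,
assuming the randomize and timeout option names that appear in the client log
elsewhere in this thread; FailoverUriBuilder is a hypothetical helper, not an
ActiveMQ class, and it only builds the string.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: composes a failover URI string with explicit options.
class FailoverUriBuilder {

    static String build(List<String> childUris, boolean randomize, long timeoutMillis) {
        return "failover:(" + String.join(",", childUris) + ")"
                + "?randomize=" + randomize
                + "&timeout=" + timeoutMillis;
    }

    public static void main(String[] args) {
        // host1 and host2 are placeholders for the two broker hosts.
        System.out.println(build(
                Arrays.asList("tcp://host1:61616", "tcp://host2:61616"), false, 60000));
    }
}
```

The resulting string would be passed to the connection factory in place of a
hand-typed URI, which also keeps the child URIs in one list where they are easy
to review.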

When the client doesn't attempt to connect to the secondary broker, that
means it's successfully connecting to the first broker. That means that
either the first broker failed to shut down (the logs may tell you why), or
it shut down fine but you brought it back up and it failed to detect that
the second broker was the master (which would mean you haven't set up the
shared filesystem KahaDB correctly).

Tim
