Posted to users@activemq.apache.org by Aaron Hu <li...@qq.com> on 2016/02/01 07:38:04 UTC

Connection lost when ActiveMQ shutdown and restart twice

Scenario:
I have five processes: guard, watchdog, mdp, ems and security. Each
process works as follows:
Guard: starts an ActiveMQ server (embedded in the guard process); monitors
the watchdog, mdp, ems and security processes and restarts any of them that
shuts down abnormally.
Watchdog: monitors the guard process and restarts it if it shuts down
abnormally.
Mdp, Ems and Security: each sends an alive message to guard every 5 seconds.
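The guard/watchdog arrangement above is a restart-on-abnormal-exit supervisor. The following is a minimal Java sketch of that policy, not the poster's code. It assumes that "abnormal" means a non-zero exit code, and the names `ChildRunner`, `Supervisor`, `superviseUntilNormalExit`, `scripted` and `mdp.jar` are all hypothetical. In a real guard the runner would launch the child's JAR with `ProcessBuilder` and return `Process.waitFor()`. Putting the launcher behind an interface lets the restart policy be exercised without spawning processes. The sketch needs Java 7 or later (`ProcessBuilder.inheritIO`), not the JDK 1.6 mentioned in the thread.

```java
import java.io.IOException;
import java.util.Arrays;

/** Runs one child process to completion and returns its exit code (hypothetical interface). */
interface ChildRunner {
    int runAndWait();
}

/**
 * Sketch of a guard-style supervisor: restart the child whenever it exits
 * abnormally (assumed here to mean a non-zero exit code), up to a restart limit.
 */
class Supervisor {

    /** Returns how many restarts were made before a normal exit or hitting maxRestarts. */
    static int superviseUntilNormalExit(ChildRunner child, int maxRestarts) {
        int restarts = 0;
        while (true) {
            int exitCode = child.runAndWait();
            if (exitCode == 0 || restarts >= maxRestarts) {
                return restarts;
            }
            restarts++; // abnormal exit: start the child again
        }
    }

    /** Launches a command, e.g. "java", "-jar", "mdp.jar" (hypothetical JAR name), and waits. */
    static ChildRunner forCommand(final String... command) {
        return new ChildRunner() {
            public int runAndWait() {
                try {
                    Process p = new ProcessBuilder(command).inheritIO().start();
                    return p.waitFor();
                } catch (IOException e) {
                    throw new IllegalStateException("cannot start " + Arrays.toString(command), e);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
        };
    }

    /** Test double that replays a fixed sequence of exit codes instead of running processes. */
    static ChildRunner scripted(final int... exitCodes) {
        return new ChildRunner() {
            private int next = 0;
            public int runAndWait() {
                return exitCodes[next++];
            }
        };
    }

    public static void main(String[] args) {
        // Two abnormal exits (e.g. killed from a process manager), then a clean shutdown.
        System.out.println(superviseUntilNormalExit(scripted(1, 1, 0), 10));
    }
}
```

A production watchdog would normally also pause briefly between restarts, so that a child that crashes during startup does not spin in a tight loop.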

Question:
Step 1: Start all the processes. (Guard now receives the alive messages
sent by the other processes.)

Step 2: Kill guard. (The ems, mdp and security processes lose their
connections to ActiveMQ. After a short time, watchdog restarts guard, and
mdp, ems and security reconnect to ActiveMQ thanks to failover.)

Step 3: Kill mdp, ems and security. (After a short time, guard restarts
them, and guard receives alive messages from the newly started processes.)

Step 4: Kill guard a SECOND time. (This time, when watchdog restarts guard,
NOT ALL of mdp, ems and security detect the restart. For example, ems
reconnects to ActiveMQ and guard receives its alive messages, but the other
two processes CAN'T detect that guard has restarted and keep sending alive
messages over the old connection.)

HOW COULD THIS HAPPEN?
(ActiveMQ 5.10, JDK 1.6)




--
View this message in context: http://activemq.2283324.n4.nabble.com/Connection-lost-when-ActiveMQ-shutdown-and-restart-twice-tp4706711.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: Connection lost when ActiveMQ shutdown and restart twice

Posted by Aaron Hu <li...@qq.com>.
It's very hard to DEBUG all these processes, because GUARD restarts any
process that exits abnormally by launching it from its JAR file.

Connection URI we use:
failover:(tcp://127.0.0.1:61616?wireFormat.maxInactivityDuration=0)
(I've heard that the target IP must be 127.0.0.1 if the processes and the
JMS server are running on the same host. IS THIS RIGHT?)

The first time GUARD is killed (Step 2), the logs of all four processes
show:
2016-02-02 08:55:19,398 WARN [org.apache.activemq.transport.failover.FailoverTransport] - Transport (tcp://127.0.0.1:61616) failed, reason:  java.net.SocketException: Connection reset, attempting to automatically reconnect
2016-02-02 08:55:23,692 INFO [org.apache.activemq.transport.failover.FailoverTransport] - Successfully reconnected to tcp://QH-20151209WEVY:61616

The second time GUARD is killed (Step 4), the abnormal processes just keep
logging as usual:
2016-02-02 08:55:17,690 INFO [com.nm.server.comm.pm.ProcessManager] - send alive msg to guard.
These abnormal processes haven't detected that ActiveMQ has been restarted.

Thanks for replying to me. 




Re: Connection lost when ActiveMQ shutdown and restart twice

Posted by Aaron Hu <li...@qq.com>.
There is another weird thing:
I tried starting ActiveMQ in a separate process named AMQ, and let GUARD
monitor it and restart it whenever AMQ exits abnormally.

In this scenario, GUARD does just one thing: it monitors all the other
processes and restarts them if needed.

Then something amazing happens: no matter how many times AMQ, GUARD, MDP,
SECURITY and EMS are restarted, or in what order, FAILOVER works perfectly.
Every time AMQ is restarted, the other four processes CAN detect it and
reconnect to the new ActiveMQ server. The log shows:

2016-02-02 08:55:19,398 WARN [org.apache.activemq.transport.failover.FailoverTransport] - Transport (tcp://127.0.0.1:61616) failed, reason:  java.net.SocketException: Connection reset, attempting to automatically reconnect
2016-02-02 08:55:23,692 INFO [org.apache.activemq.transport.failover.FailoverTransport] - Successfully reconnected to tcp://QH-20151209WEVY:61616

I CAN'T TELL WHY THIS HAPPENS. THIS REALLY CONFUSES ME.




Re: Connection lost when ActiveMQ shutdown and restart twice

Posted by Aaron Hu <li...@qq.com>.
About the three points you mentioned:
1. My OS is Windows 7; I kill the processes with System Explorer every time.
2. We have done some experiments, but found nothing useful.
3. If GUARD is killed normally, every process works perfectly.

Finally, we chose to run ActiveMQ in its own process, because we don't have
much time to do more tests.

Thanks again!





Re: Connection lost when ActiveMQ shutdown and restart twice

Posted by Tim Bain <tb...@alumni.duke.edu>.
DEBUG is a logging level in Log4J, which can be applied to loggers via
log4j.properties.  Setting your logging level to DEBUG prints more detailed
information to the logs, which is sometimes useful for figuring out what's
going on and sometimes not.

127.0.0.1 should work, but isn't your only choice.  localhost should
resolve to 127.0.0.1 and will behave equivalently.  Your machine also has a
hostname and one or more IP addresses, which clients on the same host
should be able to resolve; using one of those would allow the client's
configuration to not change if the client needs to be moved to another
host.  It might also have other FQDNs that can be resolved via DNS or the
hosts file and therefore allow a client to connect.  Ultimately the only
requirement is that the IP layer be able to deliver packets to the right
host and process.  I'd use the hostname because of the ability to move
clients around without reconfiguring them, but things should work any way
you do it.
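Tim's point that the loopback address, `localhost` and the machine's hostname are all usable broker addresses for a client on the same host can be sketched in plain Java. The helper names below (`BrokerUri`, `failoverUri`, `resolvesToLoopback`) are hypothetical. The only real APIs used are the `java.net.InetAddress` lookups, and the URI strings simply follow the `failover:(tcp://host:port?options)` form already quoted in this thread.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

/** Hypothetical helpers for building and sanity-checking a local broker address. */
class BrokerUri {

    /** Builds a single-broker failover URI; options, if any, are appended after '?'. */
    static String failoverUri(String host, int port, String options) {
        String tcp = "tcp://" + host + ":" + port;
        if (options != null && !options.isEmpty()) {
            tcp += "?" + options;
        }
        return "failover:(" + tcp + ")";
    }

    /** True if the name resolves to a loopback address (127.x.x.x or ::1). */
    static boolean resolvesToLoopback(String host) {
        try {
            return InetAddress.getByName(host).isLoopbackAddress();
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) throws UnknownHostException {
        // Addresses Tim mentions for a client running on the broker's own host.
        System.out.println(failoverUri("127.0.0.1", 61616, null));
        System.out.println(failoverUri("localhost", 61616, null)
                + "  loopback=" + resolvesToLoopback("localhost"));
        // The machine's own hostname usually resolves to a LAN address, not loopback.
        String hostname = InetAddress.getLocalHost().getHostName();
        System.out.println(failoverUri(hostname, 61616, null));
    }
}
```

The hostname variant keeps the URI valid if the client is later moved to another machine. This is the reason Tim gives for preferring it.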

What your log line from the second restart indicates to me is not that the
clients don't detect that the new process is back up, but rather that they
don't detect that it went away in the first place.  Have you confirmed
first and foremost that the previous process actually exited?  Second, is
the behavior different if the watchdog isn't running and doesn't
immediately restart the guard?  If so, how long does it take before the
clients detect that the broker isn't available and the failover logic (and
log lines) kick in?  And third, if you're hard-killing (kill -9) your guard
process, is the behavior different if you do a normal kill and allow the
process to exit gracefully?
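Tim's third question turns on the difference between a graceful exit and a hard kill. One relevant JVM detail: shutdown hooks registered with `Runtime.addShutdownHook` run on a normal exit or on a termination signal such as a plain `kill` (SIGTERM). According to the `Runtime` documentation, they are not guaranteed to run when the JVM is terminated externally with SIGKILL (`kill -9`) or with `TerminateProcess` on Windows, which is presumably what a forced kill from a process manager does. The sketch below shows the graceful path that a hard kill skips. `GracefulBrokerShutdown`, `installStopHook` and the `stopBroker` callback are hypothetical; `stopBroker` stands in for whatever the guard would use to stop its embedded broker.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class GracefulBrokerShutdown {

    /**
     * Registers a JVM shutdown hook that runs stopBroker at most once.
     * stopBroker is a hypothetical callback, e.g. one that stops an embedded broker
     * so clients see their connections closed. Hooks run on normal exit or SIGTERM,
     * but are not guaranteed to run on SIGKILL / TerminateProcess.
     */
    static Thread installStopHook(final Runnable stopBroker) {
        final AtomicBoolean stopped = new AtomicBoolean(false);
        Thread hook = new Thread(new Runnable() {
            public void run() {
                // Guard against running the stop logic twice (explicit call plus hook).
                if (stopped.compareAndSet(false, true)) {
                    stopBroker.run();
                }
            }
        }, "broker-stop-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        installStopHook(new Runnable() {
            public void run() {
                System.out.println("stopping broker before JVM exit");
            }
        });
        System.out.println("guard running; exiting normally, so the hook will run");
    }
}
```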

I suspect that part of the cause here is TCP's inability to immediately
detect the closure of connections that are severed without going through
the TCP connection teardown logic, as happens when a process is kill
-9'ed.  It might even be that the clients who don't fail over simply can't
tell that there was ever a time when there was no process responding on
port 61616, and so the TCP layer perceives just a single connection when
you know it's really two.  If the TCP layer thinks that the connection
never closed, the failover logic will never kick in, which would explain
what you're seeing.
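To illustrate Tim's point: a TCP peer that simply stops sending gives a blocked reader no signal at all, whereas a clean close shows up as end-of-stream. A common remedy is an application-level inactivity timeout, which is what ActiveMQ's inactivity monitor provides. Note that the URI quoted in this thread sets `wireFormat.maxInactivityDuration=0`. According to the ActiveMQ wire-format documentation, a value of zero or less disables that monitor; the default is 30000 ms. The plain-Java sketch below is not ActiveMQ code. It uses `Socket.setSoTimeout` on a loopback connection to separate the three cases a client can observe: data arrived, the peer closed cleanly, or the peer went silent. The class and enum names are hypothetical, and the sketch needs Java 8 or later (`UncheckedIOException`).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.Arrays;

/** Plain-JDK illustration of read-timeout based liveness checking (not ActiveMQ code). */
class InactivityProbe {

    enum PeerState { ALIVE, CLOSED, SILENT }

    /** Waits up to timeoutMillis for one byte and classifies what the client can observe. */
    static PeerState probe(Socket socket, int timeoutMillis) throws IOException {
        socket.setSoTimeout(timeoutMillis);
        try {
            int b = socket.getInputStream().read();
            // -1 is end of stream: the peer went through a normal TCP close (FIN).
            return b == -1 ? PeerState.CLOSED : PeerState.ALIVE;
        } catch (SocketTimeoutException e) {
            // Nothing arrived in time. TCP alone cannot say whether the peer is idle or gone;
            // an inactivity monitor would typically treat this as a dead connection.
            return PeerState.SILENT;
        }
    }

    /** Exercises the three cases over a loopback connection; returns the states in order. */
    static PeerState[] demo() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket peer = server.accept()) {
            PeerState silent = probe(client, 200);  // peer is connected but sends nothing
            OutputStream out = peer.getOutputStream();
            out.write(42);                           // one heartbeat-like byte
            out.flush();
            PeerState alive = probe(client, 2000);
            peer.close();                            // graceful close; closing again later is harmless
            PeerState closed = probe(client, 2000);
            return new PeerState[] { silent, alive, closed };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

Running `main` should print `[SILENT, ALIVE, CLOSED]`. The SILENT case is what a connection whose far end has vanished without a clean close looks like from the client's side. Without some timeout, whether in the application or in the transport, the client keeps waiting.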

I'm not sure why the behavior would be different with an external broker
than with an embedded one, though it's possible that it's something about
startup speed, or an artifact of how you're performing your test.

Tim


Re: Connection lost when ActiveMQ shutdown and restart twice

Posted by Tim Bain <tb...@alumni.duke.edu>.
And there are no relevant lines in the logs of any of those four processes?
(I don't care about watchdog since it doesn't connect to ActiveMQ.)  What
about at DEBUG?

What connection URI are your clients using?