Posted to users@activemq.apache.org by pminearo <pe...@skycreek.com> on 2014/08/05 20:51:25 UTC

JMS to JMS Bridge Connection

We are running into an issue with ActiveMQ JMS to JMS Bridge.

On Server A we have a stand alone instance of ActiveMQ with 4 queues:

queue_1_inbound
queue_2_inbound
queue_3_inbound
queue_outbound

On Server B we have 3 embedded ActiveMQ instances, each in their own JVM.
JVM 1 will have an inbound queue bridge to queue_1_inbound and an outbound
queue bridge to queue_outbound. JVM 2 will have an inbound queue bridge to
queue_2_inbound and an outbound queue bridge to queue_outbound. JVM 3 will
have an inbound queue bridge to queue_3_inbound and an outbound queue bridge
to queue_outbound. Each embedded broker is set up exactly the same except
for the inbound queue name and the TransportConnector ports.

At some point an EOFException is logged and the JMS Bridge connectors do
not reset properly. After that, the following message is logged every
minute or so.

We checked the Advisory messages coming through and there was no indication
of a problem. Digging through the code, I noticed that in TCPTransport, if
an Exception is thrown, the run() method logs the error and then exits.
Based on the last log message, I guess a connection is made again, but the
bridges are not reset, so it keeps disconnecting and reconnecting. However,
when looking at Server A's admin console, the connection never returns. A
restart of the application is required to re-establish the connection and
bridges.

If I am understanding the code correctly, this seems like a bug. If the
TCPTransport catches an Exception, shouldn't it throw the exception out and
cause the JMS Bridge connector to re-establish the connections and bridges?

This problem is hard for me to reproduce in any environment other than our
production environment (go figure).

--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.

Re: JMS to JMS Bridge Connection

Posted by artnaseef <ar...@artnaseef.com>.

Hmm, the inactivity timer is closing the connection:

2014-07-25 17:37:52,373 | DEBUG | Transport Connection to: tcp://[IP
ADDRESS]:4507 failed: org.apache.activemq.transport.InactivityIOException:
Channel was inactive for too (>30000) long: tcp://[IP ADDRESS]:4507 |
org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ
InactivityMonitor Worker
org.apache.activemq.transport.InactivityIOException: Channel was inactive
for too (>30000) long: tcp://v:4507
 at
org.apache.activemq.transport.AbstractInactivityMonitor$4.run(AbstractInactivityMonitor.java:215)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
 at java.lang.Thread.run(Thread.java:662)
2014-07-25 17:37:52,374 | DEBUG | Unregistering MBean
org.apache.activemq:type=Broker,brokerName=serverABroker,connector=clientConnectors,connectorName=[Connector
Name],connectionViewType=clientId,connectionName=queue_1_inbound |
org.apache.activemq.broker.jmx.ManagementContext | ActiveMQ
InactivityMonitor Worker


Let me ask straight-out - does this cause a problem for the application?
Or are these log messages and brief interruptions to flow the only real
concern?  I want to make sure I'm solving the right problem.




--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684140.html

Re: JMS to JMS Bridge Connection

Posted by pminearo <pe...@skycreek.com>.

Drilling down into the code further, the readInt() method calls the read()
method 4 times in TCPBufferedInput.  

Unfortunately, I do not have a copy of Server A's log file for this time.  I
do have a copy from earlier in the day.  Not sure if this will help, but
these messages seemed to come up around the same time.  



Server A is managed by a different group, so I might be able to get them to
get us log files with information around the same time.  But it may take a
bit to coordinate efforts.
 



--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684139.html

Re: JMS to JMS Bridge Connection

Posted by artnaseef <ar...@artnaseef.com>.

Here is the code throwing the Exception:

        if (!sizePrefixDisabled) {

            int size = bytesIn.readInt();

It appears to be right at the beginning of an openwire packet.  Very odd if
both sides of the connection are talking openwire, unless the connection is
simply being dropped.

What does the log of the other side of the connection report?




--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684131.html

Re: JMS to JMS Bridge Connection

Posted by pminearo <pe...@skycreek.com>.

There are 3 scenarios where ActiveMQ behaved as designed:

1. Outbound Bridge disabled. This always worked.
2. With Outbound Bridge enabled, it intermittently worked for specific
embedded brokers and not for others, then it would flip on us. Same configs
and all.
3. With Outbound Bridge enabled, we moved Broker A to a new server that is 1
network hop from Broker B instead of 2. This has worked for the past few
days.

Wanting to do some more testing with the TCP Dump, we moved Broker A back
to its original Server, which is 2 network hops from Broker B, and the
problem is not showing up. Like I mentioned before, it is a hard problem to
reproduce. It will work for a while, and then all of a sudden it will stop
working. Once it does happen, it keeps happening until some random time
when it starts working again. I know this sounds crazy, but there is no
rhyme or reason for ActiveMQ to stop working. It just does, and we do not
understand it. Could it be a race condition? Maybe. Could someone be doing
something to the network that we are unaware of? Maybe. There could be a
number of reasons.

Currently, we are monitoring the Brokers and waiting to see if it happens
again. If/when it happens again, we will get some more TCP Dump info, turn
on TRACE logging not only at the application level, but also on the
TCPTransport. Basically, try to get as much info as we can. I will post
back to this forum when we do.

--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684283.html

Re: JMS to JMS Bridge Connection

Posted by artnaseef <ar...@artnaseef.com>.

It's possible, although I didn't see any prefetch=0 use in your configs
(it could be that I didn't look hard enough since I didn't suspect it).
Prefetch=0 uses different code paths for delivery of messages to clients:
with prefetch=0, clients must poll the server, and the server synchronously
returns a message. Without prefetch=0, the server pushes a number of
messages asynchronously to the client, up to the prefetch value, and then
waits until acknowledgements bring down the prefetch queue size before
pushing more.
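
The two delivery modes can be sketched with a toy model. This is not
ActiveMQ code: the class PrefetchModel and its methods are hypothetical, and
the prefetch value of 3 is chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of broker-side dispatch; all names here are hypothetical.
class PrefetchModel {
    private final Deque<String> pending = new ArrayDeque<>();
    private final int prefetch;
    private int unacked; // messages pushed to the client but not yet acknowledged

    PrefetchModel(int prefetch, int messageCount) {
        this.prefetch = prefetch;
        for (int i = 1; i <= messageCount; i++) {
            pending.add("msg-" + i);
        }
    }

    // Push mode (prefetch > 0): dispatch until the prefetch window is full.
    // Returns how many messages were pushed in this round.
    int pushAvailable() {
        int pushed = 0;
        while (unacked < prefetch && !pending.isEmpty()) {
            pending.poll();
            unacked++;
            pushed++;
        }
        return pushed;
    }

    // Pull mode (prefetch == 0): the client asks, the broker answers with at
    // most one message (null if the queue is empty).
    String pull() {
        return pending.poll();
    }

    // Acknowledgements shrink the window so that more can be pushed.
    void ack(int count) {
        unacked = Math.max(0, unacked - count);
    }

    public static void main(String[] args) {
        PrefetchModel push = new PrefetchModel(3, 10);
        System.out.println("first push: " + push.pushAvailable());
        System.out.println("window full: " + push.pushAvailable());
        push.ack(2);
        System.out.println("after 2 acks: " + push.pushAvailable());

        PrefetchModel pull = new PrefetchModel(0, 10);
        System.out.println("prefetch=0 push: " + pull.pushAvailable());
        System.out.println("prefetch=0 pull: " + pull.pull());
    }
}
```

In the push case, delivery stalls whenever acknowledgements stop arriving,
which is one way a broken connection can look like "only some messages were
transferred".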

One more question for you - when the solution works, does that include or
exclude the outbound bridge? I'm confused on that front because of the
statement that the outbound bridge never worked together with the statement
that the solution works with the change in network configuration.

In order to diagnose an issue like this, some clarity in terms of "what does
ActiveMQ do that is unexpected?" or "what doesn't ActiveMQ do that is
expected" is needed - at the level of the API interactions. Since the
bridge is "in a way" internal to ActiveMQ, it may seem tricky, but it is
possible to think of the bridge as separate - and even run it in a separate
JVM.

At this point, it sounds like your efforts to track down the problem are
moving in the right direction. If I were working on it, I would be looking
to enable debug logging, add log statements, grab stack traces at the time
the problem is observed, and other means of getting more details which
explain the lack of expected message flow.

Also consider that the exact same broker topology working with only a change
to the network sounds like an issue outside of ActiveMQ rather than one
inside, although there are possibilities inside the broker - especially race
conditions.

Hope this helps.

--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684280.html

Re: JMS to JMS Bridge Connection

Posted by pminearo <pe...@skycreek.com>.

I noticed there is another Topic in the forums that sounds similar to our
issue.

http://activemq.2283324.n4.nabble.com/ActiveMq-consumer-intermittently-hanging-after-reconnect-td4684226.html

Could we be experiencing the same problem?  The JMS Bridge basically creates
a Consumer and Producer.



--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684277.html

Re: JMS to JMS Bridge Connection

Posted by pminearo <pe...@skycreek.com>.

Good question, difficult answer:

Once we saw that the issue continued no matter where the Broker Bs resided,
we started looking at the configuration in more depth and left the Broker
Bs where they were on Server B.  We did make some changes to the configs for
all of the Broker Bs, like setting the TCP Keep Alive or setting the
maxInactivity.  None of these seemed to fix the problem.  There was
always a Broker B not working.

During testing we always had Broker A on Server A, except for the "Same
Server" test.  So, the network was the same during these tests.  The even
more baffling part of all this is that at first embedded Brokers B2 and B3
were not working and embedded Broker B1 was.  Then one day during testing,
the situation flipped.  Embedded Broker B1 stopped working and embedded
Brokers B2 and B3 started working.  No changes were made to the
configuration at that time.  And we would see windows of time when
something would work and then just stop working.

Before, Server A was on a network that was 2 hops from Server B.  We moved
Broker A to a new server, call it Server H, which is one hop from Server B.
Since we did that, all embedded Broker Bs have been working.  However,
since we have seen windows of success before, we are skeptical that this is
the fix.

--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684207.html

Re: JMS to JMS Bridge Connection

Posted by artnaseef <ar...@artnaseef.com>.

Good troubleshooting.  What's different between the working and non-working
setups?  Only the network?



--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684205.html

Re: JMS to JMS Bridge Connection

Posted by pminearo <pe...@skycreek.com>.

As far as I know, transactions are not being used.  We did not set any
configurations to use transactions even with the Camel Routes.

As far as other scenarios, we have tried the following (Broker A is always a
stand alone instance):

*Same Server*
Broker A and embedded Broker B[1,2,3] are running on Server B in different
JVMs.  We could not replicate the problem at this time.

*Stand alone*
Broker B is set up as a stand alone instance on Server B with all of the
bridges configured to Broker A on Server A.  The problem was replicated.

*Different Servers, same network*
We moved Broker A to a different server, let's call it Server C.  Replicated
the problem.
We then moved embedded Broker B1 to a different server, let's call it Server
D.  Replicated the problem.

*Different Servers, different network*
Embedded Broker B[1,2,3] are set up in our test environment on Server T. 
Could not replicate the problem.


--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684204.html

Re: JMS to JMS Bridge Connection

Posted by artnaseef <ar...@artnaseef.com>.

For keep-alive packets, I was referring to the inactivity monitor and not TCP
keep-alive packets.

The documentation is correct - the default is not to enable TCP keep-alive
packets on the socket.

I don't think there is any conflict between the two.

One thought is coming to mind here after re-reading the posts.  Is it
possible a transaction is involved here?  Is JTA in-use on the JVMs with
these embedded brokers?  If there are transactions, that could explain the
failure to send messages and why only a certain number of messages are ever
consumed.  To clarify, sends in a transaction only take effect after the
commit, and receives are only acknowledged on commit.

Note that the connection factory setup in the posted activemq configs is
not explicitly using transactions, which is good.

Transactions could also explain an apparent lack of resumed flow on
reconnect - the consumer would receive the same messages again on every
reconnect and would stop at the same point.
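
The redelivery effect described above can be sketched with a toy model. This
is not the JMS API: TransactedQueueModel and its methods are hypothetical and
only mimic the rule that receives are acknowledged on commit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of a queue read through a transacted session; names are hypothetical.
class TransactedQueueModel {
    private final Deque<String> queue = new ArrayDeque<>();
    private final List<String> receivedInTx = new ArrayList<>();

    void send(String message) {
        queue.addLast(message);
    }

    // Hand the next message to the consumer, but remember it until commit.
    String receive() {
        String message = queue.pollFirst();
        if (message != null) {
            receivedInTx.add(message);
        }
        return message;
    }

    // Commit acknowledges everything received in the current transaction.
    void commit() {
        receivedInTx.clear();
    }

    // A dropped connection without commit: uncommitted receives go back to
    // the head of the queue in their original order.
    void connectionLost() {
        for (int i = receivedInTx.size() - 1; i >= 0; i--) {
            queue.addFirst(receivedInTx.get(i));
        }
        receivedInTx.clear();
    }

    // What the broker still counts as pending: queued plus unacknowledged.
    int pendingOnBroker() {
        return queue.size() + receivedInTx.size();
    }

    public static void main(String[] args) {
        TransactedQueueModel q = new TransactedQueueModel();
        q.send("msg-1");
        q.send("msg-2");
        q.send("msg-3");
        q.receive();
        q.receive();
        System.out.println("pending before commit: " + q.pendingOnBroker());
        q.connectionLost();
        System.out.println("first message after reconnect: " + q.receive());
    }
}
```

In this model the pending count never drops without a commit, and every
reconnect starts again from the same first message.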

One way to tell - try the same setup with a stand-alone broker using the
configs provided to see if it has the same problems.



--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684193.html

Re: JMS to JMS Bridge Connection

Posted by pminearo <pe...@skycreek.com>.

Broker B, which creates the Bridges, is giving the EOFExceptions. But we do
not know which part of Broker B. We suspect it is the Producer for the
Outbound Bridge. The reason is twofold:

1. Comment out the Outbound Bridge and everything else works as designed.
2. The messages are never pushed from Broker B to Broker A over the Outbound
Bridge, even on startup.

I want to make sure we are talking the same language. According to the
documentation on these 2 pages:

http://activemq.apache.org/tcp-transport-reference.html
http://activemq.apache.org/configuring-wire-formats.html

keepAlive packets are off by default
connectionTimeout is set to 30000 ms
soTimeout is set to 0
maxInactivityDuration is set to 30000 ms

First, is the documentation not up to date?

When I refer to keepAlive, I am referring to the TCP Keep Alive setting.
But I am a little confused about how the connectionTimeout, soTimeout, and
maxInactivityDuration all play together. We are using the default settings,
could these be tripping over each other?
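
For reference, a sketch of where these options would typically appear as URI
query parameters. The host name serverA.example and port 61616 are
placeholders, and the option names and values are the ones listed above. The
"transport." prefix for socket options on a TransportConnector URI and the
omission of connectionTimeout there are assumptions to verify against the
two documentation pages.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Builds example URIs only; it does not connect to anything.
class TransportUriSketch {

    // The documented defaults quoted in this post.
    static Map<String, String> documentedDefaults() {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("keepAlive", "false");
        options.put("connectionTimeout", "30000");
        options.put("soTimeout", "0");
        options.put("wireFormat.maxInactivityDuration", "30000");
        return options;
    }

    static String withOptions(String baseUri, Map<String, String> options) {
        if (options.isEmpty()) {
            return baseUri;
        }
        String query = options.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .collect(Collectors.joining("&"));
        return baseUri + "?" + query;
    }

    // Client side (ActiveMQConnectionFactory), wrapped in the failover transport.
    static String clientUri(String host, int port) {
        return "failover:(" + withOptions("tcp://" + host + ":" + port, documentedDefaults()) + ")";
    }

    // Broker side (TransportConnector). Assumption: socket options take a
    // "transport." prefix here, and connectionTimeout applies only to clients.
    static String connectorUri(int port) {
        Map<String, String> options = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : documentedDefaults().entrySet()) {
            if (e.getKey().equals("connectionTimeout")) {
                continue;
            }
            String key = e.getKey().startsWith("wireFormat.") ? e.getKey() : "transport." + e.getKey();
            options.put(key, e.getValue());
        }
        return withOptions("tcp://0.0.0.0:" + port, options);
    }

    public static void main(String[] args) {
        System.out.println(clientUri("serverA.example", 61616));
        System.out.println(connectorUri(61616));
    }
}
```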

Also, once we lose the connection and the bridges are stopped, we get the
connection back and then lose it again. However, neither the inbound nor
the outbound bridges are reset, so we have to restart the app in order for
the bridges to be reset. Essentially, we get this EOFException and ActiveMQ
cannot recover. A restart is mandatory.

--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684192.html

Re: JMS to JMS Bridge Connection

Posted by artnaseef <ar...@artnaseef.com>.

If the drop-outs occur while there is heavy traffic flowing over the bridge,
keep-alive packets are not even part of the problem. Keep-alive packets are
only sent when the connection is idle for a period.

EOF Exception means "End-Of-File" on input, not a bad value read. In this
case it means the underlying socket closed while the transport was waiting
for traffic (messages, acknowledgements, keep-alive packets, etc); the side
of the connection reporting the error unexpectedly lost the connection
coming from the other side. Since the inactivity message appears, I suspect
that leads to the EOF exception - one side decides the other is idle for too
long and just drops the connection, then the other side complains the
connection dropped unexpectedly.
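
A small standalone illustration of that point, using only the JDK. The class
name EofOnSizePrefix is made up for the example, and a byte array stands in
for the socket stream: DataInputStream.readInt() needs four bytes, and it
throws EOFException when the stream ends first, no matter what values were
or were not sent.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Demonstrates when readInt() reports EOF; not ActiveMQ code.
class EofOnSizePrefix {

    // Try to read a 4-byte size prefix from whatever bytes are available.
    static String readSizePrefix(byte[] available) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(available))) {
            int size = in.readInt(); // throws EOFException if fewer than 4 bytes remain
            return "size=" + size;
        } catch (EOFException e) {
            return "EOF";
        } catch (IOException e) {
            return "IO error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // A complete prefix is read normally.
        System.out.println(readSizePrefix(new byte[] {0, 0, 0, 42}));
        // Stream closed before any byte arrived: the idle-connection case.
        System.out.println(readSizePrefix(new byte[0]));
        // Stream closed in the middle of the prefix.
        System.out.println(readSizePrefix(new byte[] {0, 0}));
    }
}
```

So an EOFException at the size-prefix read is what a reader blocked between
packets would see when the peer closes the socket.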

Which side is giving the EOF exception? The producing or consuming side of
the bridge?

Inactivity checking is on by default. Both sides of the connection perform
Inactivity checks and send keep-alive packets. They are enabled by default.
It is possible to disable them through settings (url-encoded parameters on
the client side; not sure about the server-side); I do not recommend
disabling them though - the checking is not the problem.

One thing to check after a reconnect: see if the consumers for the
destinations going over the bridge return on the configured destinations.
The failover transport should automatically recreate them, but it's worth
verifying. Note that this is secondary to determining the cause of the
inactivity timeouts, since the timeouts point to a tangible, likely
external, issue, but it is still valuable.

--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684151.html

Re: JMS to JMS Bridge Connection

Posted by pminearo <pe...@skycreek.com>.

This does help, thanks!  But it doesn't explain why we would get this
behavior while pushing over a large number of messages (for example, 500).
The connection is lost and the EOFException is logged.  A couple of
questions:

1. Does the keepAlive flag need to be on the TransportConnector for ServerA,
or part of the ActiveMQConnectionFactory URI, or both?
2. What about the EOFException?  The way I understand it, the exception is
happening because the TCPBufferedInputStream is returning a bad value for
the first 4 bytes (aka the command) before the message has been completely
sent over.  We did do a TCP dump and did not see any missing packets or
holes in the packet transmissions.  Could the Inactivity Monitor be failing
to pick up the fact that information is being sent across the wire?  That
is, the keep-alives are being dropped, and the Inactivity Monitor, not
seeing keep-alives, disconnects the Bridge while messages are being
transmitted?

--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684150.html

Re: JMS to JMS Bridge Connection

Posted by artnaseef <ar...@artnaseef.com>.

That helps a lot.

The inactivity timer fires when there is no communication between the
brokers for more than the timeout period (apparently 30 seconds); keep-alive
packets are used to prevent inactivity timeouts. So, the fact that
inactivity timeouts occur means that either the keep-alives are not doing
the job (turned off, interval is too great, bug, etc) or there is a network
issue preventing keep-alives from being delivered in a timely manner (packet
drops; excessive delays; etc).
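
The relationship between the two checks can be sketched as follows. This is
a simplified model, not the ActiveMQ inactivity monitor: the class
InactivityModel, its method names, and the 10-second write-check interval
are assumptions made for illustration; only the 30000 ms timeout comes from
the log above.

```java
// Simplified model of read and write checks on one side of a connection.
class InactivityModel {
    private final long maxInactivityMillis;
    private long lastReadAt;
    private long lastWriteAt;

    InactivityModel(long maxInactivityMillis, long now) {
        this.maxInactivityMillis = maxInactivityMillis;
        this.lastReadAt = now;
        this.lastWriteAt = now;
    }

    // Any inbound traffic counts: messages, acks, or keep-alives.
    void onDataReceived(long now) {
        lastReadAt = now;
    }

    void onDataSent(long now) {
        lastWriteAt = now;
    }

    // Write check: send a keep-alive if nothing was written recently.
    boolean shouldSendKeepAlive(long now, long writeCheckMillis) {
        return now - lastWriteAt >= writeCheckMillis;
    }

    // Read check: the peer is considered inactive if nothing arrived in time.
    boolean isPeerInactive(long now) {
        return now - lastReadAt > maxInactivityMillis;
    }

    public static void main(String[] args) {
        InactivityModel side = new InactivityModel(30000, 0);

        // Keep-alives arrive every 10 s: the read check never fires.
        for (long t = 10000; t <= 60000; t += 10000) {
            side.onDataReceived(t);
        }
        System.out.println("inactive with keep-alives: " + side.isPeerInactive(65000));

        // Keep-alives stop arriving (dropped or delayed): the read check fires.
        System.out.println("inactive after silence: " + side.isPeerInactive(90001));

        // Nothing written locally for 10 s: time to send our own keep-alive.
        System.out.println("send keep-alive: " + side.shouldSendKeepAlive(10000, 10000));
    }
}
```

The read check only cares about arrival times, so keep-alives that are sent
on time but delayed or dropped in the network produce the same timeout as
keep-alives that were never sent.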

I would pull out wireshark or tcpdump and monitor the traffic. If the
problem is hard to reproduce and takes a lot of time to recreate, this could
be tough.

The following debug message in the inactivity monitor can help to confirm
send of keep-alive packets:

no message sent since last write check, sending a KeepAliveInfo

One data point for consideration: this is an area of code that every user of
ActiveMQ that uses the openwire transport over TCP (a very high percentage)
uses continually; given that, it seems likely the broker code is solid at
this point. That's not meant to dismiss your report - it is just something
to consider.

Another possibility comes to mind, although it doesn't fit the symptom
description too closely -- if the JVM on the keep-alive-sending side of the
connection is performing Garbage Collection excessively, it's possible
delays in keep-alive sends could occur. The debug message mentioned above
would help to confirm the timing of the sends.

Hope this helps.

--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684149.html

Re: JMS to JMS Bridge Connection

Posted by pminearo <pe...@skycreek.com>.

It is not easily reproducible via a test case.

More Info:

- The inbound bridge would transfer messages only up to a point. If we had
40 messages they would all get transferred, but if we had 500 messages not
all would be. However, the Queue's Pending Message Count on Server A would
not reflect the transfer. It is almost as if Server B did not get a chance
to acknowledge that all the messages had been transferred. And no messages
were sent back via the Outbound Bridge, no matter how many messages there
were.
- On restart, the Outbound Bridge would not transfer over any messages in
the outbound queue.
- When commenting out the Outbound Queue Connector, the Bridge worked and
did not get an EOFException. Messages were transferred completely via the
Inbound Bridge and the Message counts were accurate.
- We tried setting up a Standalone instance for Server B to connect to
Server A. This did not resolve the problem.
- Originally, the Bridges in JVM 2 and 3 were not working properly. Then
the roles reversed and the Bridges in JVM 1 were not working properly.
Supposedly, nothing changed and the switch happened.
- We moved the Brokers to different machines on the same network, and the
situation did not resolve itself until the following change.
- The network goes over a VPN which had 2 hops between Server A Broker and
Server B Broker. We have since moved Server A Broker to be 1 hop from
Server B Broker. The Bridge seems to be working ok for now, but we have no
idea why. So, we are not confident the issue has been resolved.

We did turn on TRACE logging level for 'org.apache.activemq'. Here is what
came out for ServerB

This message would be logged whether we were processing JMS Messages or just
sitting idle. So, why would we timeout when we are transferring messages
from Server A Broker to Server B Broker?

As far as it affecting the application, it wouldn't keep other functionality
from happening within the Application. However, since the Inbound Bridge
was not reconnected properly; the client consumer on the Inbound Bridge
would not pick up any messages. The Messages would just accumulate on
Server A.

One other piece to the puzzle is the Consumer of the queue_[1,2,3]_inbound
and the Producer of queue_outbound is a Camel route. The Camel Route works
fine when the connections are working properly, or the Outbound Queue
Connector is not used as noted above.

--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684142.html

Re: JMS to JMS Bridge Connection

Posted by artnaseef <ar...@artnaseef.com>.

Is it easily reproducible?

What symptoms have been observed?  It sounds like message flow stops across
the bridges, and the EOF exception is mentioned.  Anything else?

Turning debug logging on for Region may help (try setting
org.apache.activemq.broker.region to DEBUG in log4j.properties).

To clarify one consideration - the failover transport should be isolating
the JMS bridge from lost connections.



--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684135.html

Re: JMS to JMS Bridge Connection

Posted by pminearo <pe...@skycreek.com>.

Yes, both Brokers are running 5.10.0.

Not sure if it happens after the problem starts or before.  The log message
posted is what we get.  So, we are not sure if the EOFException is causing
the Connectors to disconnect.  Or if the EOFException is being caused by the
Connectors being disconnected.



--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684134.html

Re: JMS to JMS Bridge Connection

Posted by artnaseef <ar...@artnaseef.com>.

The configuration is properly using the failover transport, and the logs show
reconnects.  So, the TCP transport exception-handling is not the problem.

Does the EOF exception occur every time after the problem starts?  The stack
trace suggests a problem at the level of the OpenWire protocol.  Perhaps
there is a bug caused by specific messages.  One way to tell would be to
drain messages after the problem occurs and see if that helps at all
(assuming remaining messages do not exhibit the same problem).

Are all the brokers running 5.10.0?




--
View this message in context: http://activemq.2283324.n4.nabble.com/JMS-to-JMS-Bridge-Connection-tp4684129p4684130.html