Posted to users@activemq.apache.org by bbuzzard <Bi...@bnsflogistics.com> on 2017/09/25 15:29:15 UTC

Receiving NetworkBroker seems to be stuck

I'm using ActiveMQ 5.5.1 with a centralized broker feeding approximately
twenty other brokers via NetworkBridges.  All of the brokers except one
are working perfectly and have been for years.  We recently moved our Data
Warehouse (DW) to the cloud, and that broker seems to hang and stop
communicating four or five times a day.

I've used JMX to remotely monitor the centralized broker (HUB) and the DW
broker.  The HUB continues to move files to/from all of the brokers except
for the DW.  The HUB, via JMX, reports that the DW NetworkBridge is down,
but the DW broker says the NetworkBridge is up.

I turned on transport tracing for both the HUB and the DW brokers and I can
clearly see the KeepAlive messages going to the DW broker and the responses
coming back until the HUB reports the NetworkBridge to the DW is down.  My
JMX connection to the HUB continues to work and Heap and Nonheap usage seem
well within design limits, but the JMX connection to the DW returns a
timeout.  

I then logged into the DW (a Linux box) and tried to run TOP.  It took
almost a minute for the letters T, O, P to echo back, which suggested to me
that the box was under heavy CPU load.  Just prior to the timeout, the DW
JMX connection showed that Heap and Nonheap were within design limits.  My
supervisor asked two very valid questions: "How do I know whether the DW broker
did or did not use up heap if I cannot see heap usage via JMX?" and "Could
GC be stuck?".  We also noticed that all ActiveMQ logging ceases while the
broker is hung.
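One way to answer the first question is to poll heap usage over JMX from a machine other than the DW (the HUB, say) and append each timestamped reading to a local file, so the last reading taken before the JMX connection froze is preserved. The sketch below is hypothetical: the class name HeapLogger, the 5-second interval and the log format are illustrative choices, and it assumes the broker's JMX connector uses the standard RMI URL form ending in /jmxrmi (ActiveMQ's default is port 1099). It needs Java 8 or later on the machine that runs it; the broker side is unaffected.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.time.Instant;
import java.util.Locale;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper: polls a remote JVM's heap over JMX and appends timestamped
// samples to a local file, so the last reading before a hang survives.
final class HeapLogger {

    /** Percent of max heap in use, or -1 when the JVM reports no defined max. */
    static double heapPercent(long used, long max) {
        if (max <= 0) {
            return -1.0;
        }
        return 100.0 * used / max;
    }

    /** One log line: ISO-8601 timestamp, used bytes, max bytes, percent. */
    static String formatSample(Instant when, long used, long max) {
        return String.format(Locale.ROOT, "%s used=%d max=%d pct=%.1f",
                when, used, max, heapPercent(used, max));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.err.println("usage: HeapLogger <host:port> <logfile>");
            return;
        }
        // Assumption: the broker exposes the standard RMI connector under /jmxrmi.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + args[0] + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url);
             PrintWriter log = new PrintWriter(new FileWriter(args[1], true), true)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    mbsc, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            while (true) {
                // Blocks if the broker stops answering; the file keeps every earlier sample.
                MemoryUsage heap = memory.getHeapMemoryUsage();
                log.println(formatSample(Instant.now(), heap.getUsed(), heap.getMax()));
                Thread.sleep(5_000);
            }
        }
    }
}
```

If the last lines before each hang show heap near its maximum, heap exhaustion is a strong suspect; if they show plenty of headroom, the cause more likely lies outside the Java heap.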

The DW broker is supposed to run continuously.  The DW itself instantiates
several very large one-shot processes every ten minutes, and I suspect that
this is what is causing the DW broker and JMX to hang.

Does anyone have experience troubleshooting a problem like this?  What
should I do to prove whether the problem is the ActiveMQ broker or the
processes that the DW is instantiating?  If someone has seen this problem
and fixed it, how did you fix it?

The only way I have found to recover the hung broker is to execute an activemq
restart, which times out after thirty seconds and then does a kill on the pid.

--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Receiving NetworkBroker seems to be stuck

Posted by Tim Bain <tb...@alumni.duke.edu>.
It's possible to run top in batch mode and then pipe that to a file, which
would let you test your theory that CPU usage is high.
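A plain shell loop such as top -b -d 10 >> top.log is the simplest way to do this. For consistency with the other examples, here is a hypothetical Java equivalent (the class name TopSampler and the 10-second interval are illustrative). It assumes a Linux box with procps top, where -b selects batch mode and -n 1 asks for a single update, and writes a timestamp line before each snapshot so the file can be lined up against the broker log and the DW job schedule.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: appends a timestamped `top` batch-mode snapshot to a file every 10 s.
final class TopSampler {

    /** One batch-mode iteration: -b (batch) and -n 1 (single update) are procps top options. */
    static List<String> topCommand() {
        return Arrays.asList("top", "-b", "-n", "1");
    }

    /** Separator written before each snapshot. */
    static String header(Instant when) {
        return "=== " + when;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 1) {
            System.err.println("usage: TopSampler <logfile>");
            return;
        }
        File out = new File(args[0]);
        while (true) {
            try (PrintWriter w = new PrintWriter(new FileWriter(out, true))) {
                w.println(header(Instant.now()));
            }
            Process top = new ProcessBuilder(topCommand())
                    .redirectErrorStream(true)                              // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.appendTo(out))  // append to the same file
                    .start();
            top.waitFor();
            Thread.sleep(10_000);
        }
    }
}
```

Besides the per-process CPU figures, the spacing of the timestamps is informative: gaps much longer than about ten seconds would be consistent with the whole machine being starved, as the slow TOP echo suggested.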

It's also possible to configure the JVM to output GC activity to a log
file, which would let you see if heavy GC activity is the root cause of any
unresponsiveness.
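On a Java 6/7/8 HotSpot JVM, GC logging is usually enabled with flags such as -Xloggc:/path/to/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps added to the broker's JVM options (for example ACTIVEMQ_OPTS in the startup script); Java 9 and later use -Xlog:gc*:file=/path/to/gc.log instead. As a complementary live check, the hypothetical sketch below (class name GcOverhead is illustrative; same /jmxrmi URL assumption as above) reads each collector's cumulative CollectionTime over JMX every ten seconds and reports what fraction of the elapsed time went to GC.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper: estimates the share of wall-clock time a remote JVM spends in GC.
final class GcOverhead {

    /** Percent of an interval spent collecting, from two cumulative CollectionTime readings. */
    static double overheadPercent(long prevGcMs, long currGcMs, long elapsedMs) {
        if (elapsedMs <= 0) {
            throw new IllegalArgumentException("elapsedMs must be positive");
        }
        return 100.0 * (currGcMs - prevGcMs) / elapsedMs;
    }

    /** Sums CollectionTime over all collectors; -1 means "not supported" and is skipped. */
    static long totalGcMillis(List<GarbageCollectorMXBean> collectors) {
        long total = 0;
        for (GarbageCollectorMXBean gc : collectors) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: GcOverhead <host:port>");
            return;
        }
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + args[0] + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            List<GarbageCollectorMXBean> collectors = new ArrayList<>();
            ObjectName pattern = new ObjectName(
                    ManagementFactory.GARBAGE_COLLECTOR_MXBEAN_DOMAIN_TYPE + ",*");
            for (ObjectName name : mbsc.queryNames(pattern, null)) {
                collectors.add(ManagementFactory.newPlatformMXBeanProxy(
                        mbsc, name.toString(), GarbageCollectorMXBean.class));
            }
            long prevGc = totalGcMillis(collectors);
            long prevNanos = System.nanoTime();
            while (true) {
                Thread.sleep(10_000);
                long nowNanos = System.nanoTime();
                long gc = totalGcMillis(collectors);
                // Use measured elapsed time: on a starved host the sleep itself can overrun.
                long elapsedMs = Math.max(1, (nowNanos - prevNanos) / 1_000_000);
                System.out.printf("%s gc=%.1f%%%n", Instant.now(),
                        overheadPercent(prevGc, gc, elapsedMs));
                prevGc = gc;
                prevNanos = nowNanos;
            }
        }
    }
}
```

Readings that climb toward 100% just before a hang would point at GC. For concurrent collectors the reported time may include concurrent work rather than only pauses, so treat the figure as a rough indicator and confirm it against the GC log.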

If you can predict when the DW broker will become unresponsive, you could
attach JVisualVM before that time and then use the CPU Sampler (don't use
the CPU Profiler on a production server!) to try to determine what the
broker is doing during the time immediately before/during the period of
unresponsiveness. But you have to be able to predict reasonably accurately
when it will happen.
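If attaching JVisualVM is awkward, a cruder stand-in for its sampler is to take repeated thread dumps over JMX and count which method is on top of each RUNNABLE thread's stack. This is a swapped-in technique, not what JVisualVM does internally, and the class name ThreadSampler, the 120 samples and the 500 ms spacing are hypothetical choices. It assumes the same /jmxrmi URL form as above. Like JVisualVM, it only works while the broker still answers JMX, so it has to be started before the expected hang.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper: a crude sampler that takes repeated remote thread dumps
// and counts which method sits on top of each RUNNABLE thread's stack.
final class ThreadSampler {

    /** Increments the count for the top frame of every RUNNABLE thread in one dump. */
    static void tally(ThreadInfo[] threads, Map<String, Integer> counts) {
        for (ThreadInfo t : threads) {
            if (t == null || t.getThreadState() != Thread.State.RUNNABLE) {
                continue;
            }
            StackTraceElement[] stack = t.getStackTrace();
            if (stack.length == 0) {
                continue;
            }
            String frame = stack[0].getClassName() + "." + stack[0].getMethodName();
            counts.merge(frame, 1, Integer::sum);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: ThreadSampler <host:port>");
            return;
        }
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + args[0] + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    mbsc, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
            Map<String, Integer> counts = new HashMap<>();
            for (int i = 0; i < 120; i++) {   // 120 samples, 500 ms apart (a minute or more in total)
                tally(threads.dumpAllThreads(false, false), counts);
                Thread.sleep(500);
            }
            // Print the 20 most frequently seen top frames.
            counts.entrySet().stream()
                    .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                    .limit(20)
                    .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
        }
    }
}
```

Note that threads blocked in native socket reads also report RUNNABLE, so network I/O frames will appear near the top even when they use no CPU.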

Also, what GC algorithm are you using?
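The GC algorithm can be read off the garbage collector MXBean names, which JConsole shows under java.lang:type=GarbageCollector. The hypothetical sketch below (class name WhichGc is illustrative) maps the usual HotSpot name pairs to their algorithm and falls back to printing the raw names; it is demonstrated on its own JVM, and the same mapping applies to names read from the broker.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: names the HotSpot GC algorithm from its collector MXBean names.
final class WhichGc {

    static String algorithmFor(List<String> names) {
        if (names.contains("G1 Young Generation") || names.contains("G1 Old Generation")) {
            return "G1";
        }
        if (names.contains("ConcurrentMarkSweep")) {
            return "CMS";                 // usually paired with ParNew
        }
        if (names.contains("PS Scavenge") || names.contains("PS MarkSweep")) {
            return "Parallel";
        }
        if (names.contains("Copy") && names.contains("MarkSweepCompact")) {
            return "Serial";
        }
        return "unknown: " + names;       // other JVMs or collectors: report raw names
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        System.out.println(names + " -> " + algorithmFor(names));
    }
}
```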

Tim
