Posted to users@activemq.apache.org by Kulidan <sc...@goodrich.com> on 2010/08/09 21:51:31 UTC

All AMQ traffic stops (Network of Brokers)

The issue we are having is that after a certain amount of time (days) ALL
ActiveMQ traffic comes to a halt.  No errors appear in any of the logs (we
do not have debug logging enabled at this time).  When this 'event' happens
it appears that all producers block, regardless of which node they are
connected to or which topic they publish to.

We use a network of brokers (currently 5) located around the globe.  Each
node has a connection to the other 4 nodes.

An example of this configuration:
<networkConnectors>
    <networkConnector name="Site2" uri="static://(tcp://site1:61616)"/>
    <networkConnector name="Site3" uri="static://(tcp://site2:61616)"/>
    <networkConnector name="Site4" uri="static://(tcp://site3:61616)"/>
    <networkConnector name="Site5" uri="static://(tcp://site4:61616)"/>
</networkConnectors>

Clients located near each of these sites connect to the closest broker and
are able to get data that is produced by clients connected to any of the
other sites.  When the traffic comes to a halt it appears we have to reboot
all 5 servers to fix the issue.  For example, clients of site1 will report
the issue; typically the solution is simply to reboot that AMQ broker, but in
this scenario that does not help, and we end up resetting all 5 servers before
the problem is cleared.

What could be the cause of this?  If one of the brokers were getting data it
has no consumers for, could this cause it to fill up and block all traffic?
What surprises me most is that it does not halt traffic for one specific
topic or queue but for all of them.


Thanks for any clues.



-- 
View this message in context: http://old.nabble.com/All-AMQ-traffic-stops-%28Network-of-Brokers%29-tp29391515p29391515.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.


Re: All AMQ traffic stops (Network of Brokers)

Posted by Dirk Fröhner <di...@email.de>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Afternoon,

the next time you experience this, take thread dumps of the brokers. I
had to deal with this before when we had <I know who> restarting some
client applications many times in a row with kill -9, but the
behaviour described below can also be caused by other issues in the
network infrastructure.

The result was that the OS did not detect and report the lost TCP
sessions to the Java blocking I/O before a certain timeout (can be e.g.
30 minutes) and certain threads remained in
SocketOutputStream.socketWrite0, e.g. in a "BrokerService" thread.

Unfortunately, there is always a monitor involved here that "ActiveMQ
Transport" threads are waiting for. So the overall outcome is that all
threads related to any kind of transport are either blocked on object
monitors or are waiting to return from socketWrite0 for quite a while
and from the outside this looks like a totally paralyzed broker.

If that is the cause in your case, you can try adding
	?transport.soTimeout=10000&transport.soWriteTimeout=15000
to the URI of the transport connector in your broker config (the values
are only examples, just use what you think makes sense for you; they are
all in milliseconds, of course). You can search for this in JIRA to get more info.
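
As a rough sketch, the transport connector entry in activemq.xml could
then look something like the following. The connector name "openwire"
and the bind address and port are assumptions for illustration only, so
keep whatever your existing entry uses. Note that inside the XML file the
ampersand between the two options has to be written as &amp;, and it is
worth checking that your broker version actually supports soWriteTimeout.

```xml
<transportConnectors>
    <!-- name, address and port are placeholders; timeout values are
         examples in milliseconds, not recommendations -->
    <transportConnector name="openwire"
        uri="tcp://0.0.0.0:61616?transport.soTimeout=10000&amp;transport.soWriteTimeout=15000"/>
</transportConnectors>
```

This goes on the transportConnector (the side that accepts connections),
not on the networkConnector entries.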

If it is something different in your case, thread dumps are still what you
definitely need to find the cause.

Hope that helps.

Glück auf,
Dirk


On 10/08/10 12:27, Gary Tully wrote:
> I think debug logging may be needed to understand this.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkxhbckACgkQAZVmLZMecmwnRwCdEmaxdHeztGZjNzjS7u+Aamkx
1bUAn1nYnj9ZXr/2d7kVkObiRC5jPFHo
=xqeS
-----END PGP SIGNATURE-----

Re: All AMQ traffic stops (Network of Brokers)

Posted by Gary Tully <ga...@gmail.com>.
I think debug logging may be needed to understand this.



-- 
http://blog.garytully.com

Open Source Integration
http://fusesource.com