Posted to users@activemq.apache.org by "Hendley, Sam" <Sa...@sensus.com> on 2014/12/30 23:46:20 UTC

Exceeding MemoryUsage causes Network Connector connections to stop

Hello ActiveMQ community:

TL;DR: I now think this is really a misconfiguration on our part, but it took quite a lot of digging before we nailed the issue, so I am reporting it here to save others time in the future.

We are running a "store and forward network of brokers" where each broker is connected to all other brokers (full mesh). Our applications connect only to their local broker. Under load we would occasionally see a broker just "disappear" from the rest of the cluster, and all of the work would end up on the remaining nodes. We were having trouble isolating the fault: our overall system wasn't handling the failure gracefully and generated extra traffic of its own, which made cause and effect difficult to trace.
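
For reference, here is a minimal sketch of one node in such a mesh using the embedded-broker Java API (broker names, hosts, and ports are hypothetical; our production setup uses activemq.xml, where the equivalent is a networkConnector element):

    import org.apache.activemq.broker.BrokerService;

    public class MeshNode {
        public static void main(String[] args) throws Exception {
            // One node of a full-mesh "store and forward" network of brokers.
            // Every broker lists every peer in its static network connector.
            BrokerService broker = new BrokerService();
            broker.setBrokerName("broker1");
            broker.addConnector("tcp://0.0.0.0:61616"); // local applications attach here
            broker.addNetworkConnector("static:(tcp://broker2:61616,tcp://broker3:61616)");
            broker.start();
            broker.waitUntilStopped();
        }
    }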

I set out to reproduce the failure we were having in as small a case as I could. The result is at https://github.com/samhendley/activemq-bug-reports, where I document the experiment more fully. I wasn't able to get a 100% reproduction; the best I could do was about 50% of the runs on my machine failing. This makes me believe it is probably a race condition, but I wasn't able to find any obvious smoking gun.

In short, I found that if the overall broker MemoryUsage limit is exceeded (possible because producer flow control is off), the network connectors between the brokers sometimes become stuck. If I enabled producer flow control or increased the configured memory limit, the issue was no longer reproducible.
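
For anyone hitting the same thing, this is roughly the knob we were missing, sketched with the embedded-broker API (the 512 MB limit is illustrative; in activemq.xml the same settings live under systemUsage and on policyEntry's producerFlowControl attribute):

    import org.apache.activemq.broker.BrokerService;
    import org.apache.activemq.broker.region.policy.PolicyEntry;
    import org.apache.activemq.broker.region.policy.PolicyMap;

    public class FlowControlConfig {
        public static void main(String[] args) throws Exception {
            BrokerService broker = new BrokerService();

            // Raise the broker-wide memory limit so usage stays below 100%.
            broker.getSystemUsage().getMemoryUsage().setLimit(512L * 1024 * 1024);

            // Re-enable producer flow control for all destinations (it is on
            // by default; we had turned it off). A blocked producer is easier
            // to diagnose than a silently stuck network connector.
            PolicyEntry policy = new PolicyEntry();
            policy.setProducerFlowControl(true);
            PolicyMap policyMap = new PolicyMap();
            policyMap.setDefaultEntry(policy);
            broker.setDestinationPolicy(policyMap);

            broker.start();
        }
    }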

It looks like we can reconfigure our production systems to work around this problem, but should I file a bug for this? A silent failure like this is really not fun to diagnose on a large-scale system.

Sam

From the GitHub page:

Bug description:

If the configured MemoryUsage limit is large enough that usage stays below 100% while the requestor application is dumping messages into the broker network, the test passes successfully. If, however, memory usage on the brokers goes above 100% (in this case peaking around 600% of the 100 MB limit), the network connectors sometimes become "stuck". Stuck in this case means there are messages enqueued on one or both of the "server" brokers, but the messages are not being dequeued or forwarded by the network connector back to the "client" broker.
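
A quick way to spot this condition from the outside is to poll the broker's memory percentage over JMX. A minimal sketch (the JMX service URL and brokerName=localhost are assumptions for a default 5.x setup):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class MemoryCheck {
        public static void main(String[] args) throws Exception {
            JMXConnector jmxc = JMXConnectorFactory.connect(new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi"));
            try {
                MBeanServerConnection conn = jmxc.getMBeanServerConnection();
                ObjectName broker = new ObjectName(
                    "org.apache.activemq:type=Broker,brokerName=localhost");
                // Anything over 100 means the MemoryUsage limit has been blown past.
                int pct = (Integer) conn.getAttribute(broker, "MemoryPercentUsage");
                System.out.println("MemoryPercentUsage = " + pct + "%");
            } finally {
                jmxc.close();
            }
        }
    }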

This issue doesn't happen on every run with a small memory limit, but in my tests it failed on about 50% of the runs, so you may have to run it a few times before it fails. On one failure, JMX showed that 417k responses had been generated on server1 but only 363k had been dequeued for transmission to the client broker. In that test run the other server had correctly handled the other 583k requests.
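
Those enqueue/dequeue counts came from the queue's MBean. A sketch of reading them programmatically (the destination name "responses" is hypothetical; substitute the queue from the test project):

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class QueueStats {
        public static void main(String[] args) throws Exception {
            JMXConnector jmxc = JMXConnectorFactory.connect(new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:1099/jmxrmi"));
            try {
                MBeanServerConnection conn = jmxc.getMBeanServerConnection();
                // Destination name is hypothetical; use the queue under test.
                ObjectName queue = new ObjectName(
                    "org.apache.activemq:type=Broker,brokerName=localhost,"
                    + "destinationType=Queue,destinationName=responses");
                long enq = (Long) conn.getAttribute(queue, "EnqueueCount");
                long deq = (Long) conn.getAttribute(queue, "DequeueCount");
                // A persistent gap with a live network connector attached is
                // the "stuck" signature described above.
                System.out.println("enqueued=" + enq + ", dequeued=" + deq);
            } finally {
                jmxc.close();
            }
        }
    }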

When it does fail there is nothing in the log that indicates anything is amiss. I would have expected some sort of log message indicating that the network connector had been throttled (if indeed that is what is happening). The same test run against a single broker always passes, which leads me to believe it really is a problem with the network connectors.



Re: Exceeding MemoryUsage causes Network Connector connections to stop

Posted by artnaseef <ar...@artnaseef.com>.
Without producer flow control, the memory limits have no meaning. That's how
they take effect: by blocking producers.

So, without producer flow control, the symptoms described can happen
normally as there is nothing to prevent the JVM from running out of memory.

When a broker drops out of service from the network, what is its JVM state?
Is it in GC purgatory?
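
One way to check, assuming a HotSpot JVM: point jstat at the broker process
and watch whether the full-GC columns climb while the broker is unresponsive.

    jstat -gcutil <pid> 1000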



--
View this message in context: http://activemq.2283324.n4.nabble.com/Exceeding-MemoryUsage-causes-Network-Connector-connections-to-stop-tp4689349p4689416.html
Sent from the ActiveMQ - User mailing list archive at Nabble.com.