Posted to users@activemq.apache.org by Thiago Veronezi <th...@veronezi.org> on 2018/02/15 12:03:11 UTC

Producer Flow Control active but server still facing OOM issues

Hi, ActiveMQ community,

I'm actively working on documentation for "out of memory" protection on
ActiveMQ. Recently I was working on a POC project where I stressed a
default broker configuration with 1.000.000 messages with a 20KB payload
each, where each message took 1 second to be consumed. This caused the
"Pending Messages" number to go up pretty fast.

My understanding is that AMQ, out of the box, has the "Producer Flow
Control" feature activated for all Topics and Queues; and it has
"usedMemory" threshold set as 70% of 512MB. Still, with the load I used, I
saw OOM issues. The 1.000.000 messages actually killed the server.
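
A quick arithmetic check on the figures above, as a minimal sketch. It assumes 1 KB = 1024 bytes, takes the 70%-of-512MB limit as stated in the post, and treats the number of consumers as a free parameter because the post does not give it. The class and method names are hypothetical.

```java
/** Back-of-the-envelope numbers for the test load; every input is an assumption. */
final class BacklogEstimate {

    /** Total bytes of payload if every message is held at once. */
    static long totalPayloadBytes(long messages, long payloadBytes) {
        return messages * payloadBytes;
    }

    /** Memory limit expressed as a whole-number percentage of the heap. */
    static long memoryLimitBytes(long heapBytes, int percentOfHeap) {
        return heapBytes * percentOfHeap / 100;
    }

    /** Seconds needed to drain the backlog, rounded up, with consumers working in parallel. */
    static long drainSeconds(long messages, long secondsPerMessage, int consumers) {
        long totalWork = messages * secondsPerMessage;
        return (totalWork + consumers - 1) / consumers;
    }

    public static void main(String[] args) {
        long messages = 1_000_000L;
        long payload = 20L * 1024;       // 20 KB per message
        long heap = 512L * 1024 * 1024;  // 512 MB heap

        System.out.println("payload bytes:      " + totalPayloadBytes(messages, payload));
        System.out.println("memory limit bytes: " + memoryLimitBytes(heap, 70));
        System.out.println("drain seconds (1 consumer): " + drainSeconds(messages, 1, 1));
    }
}
```

With these inputs the payload alone is roughly 19 GiB against a limit of roughly 358 MiB, and a single consumer at one message per second would need 1.000.000 seconds (about 11.6 days) to drain the backlog.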

In my tests, I use several threads and nodes to send all the 1.000.000
messages in parallel. That means I have several connections to the broker.
Once I used the sendFailIfNoSpace="true" option, the OOM issues ceased, the
consumers were able to catch up, and the broker survived. One thing that I
noticed is that even when the "Pending Messages" number reached 0, it took
some time for the server to allow new producer connections again.

Questions:

* Is it possible that AMQ doesn't count the memory used by each active
connection as part of the final used-memory calculation?

* Is there any configuration where we set a refresh rate so the server
notices faster when the memory is below the maximum threshold again?

* Is the use of sendFailIfNoSpace="true" the ultimate solution for OOM
issues? Is this something I can advise a customer to use so he is 99.9%
guaranteed to not have OOM crashes?

Thanks,
Thiago.

Ps.: I think this is my first message here. :)

Re: Producer Flow Control active but server still facing OOM issues

Posted by Thiago Veronezi <th...@veronezi.org>.

Hi Tim,

Thanks for the follow-up. I haven't touched this issue for a week. I will
need to study this further. I will get back to you once I'm done.

For now, this:

> How big of a heap did you try?
4G

> Are you able to share the test driver you were using so someone could try to
> reproduce and then analyze the OOM behavior you were seeing?
Not sure. I will need approval first.

> how many queues were they spread across
I had one queue on one amq instance. I'm not sure I got the question right.

> were you using the JMS API directly or using a third-party library such
> as Spring or Camel to facilitate the publishing
Using JavaEE. This I can share. :)

import java.util.logging.Level;
import javax.annotation.Resource;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageProducer;
import javax.jms.ResourceAllocationException;
import javax.jms.Session;

@Resource(name = "MyJmsConnectionFactory")
private ConnectionFactory connectionFactory;

// This goes in a loop, 1.000.000 times.
try (final Connection connection = connectionFactory.createConnection()) {
    connection.start();
    try (final Session session = connection.createSession(true, Session.AUTO_ACKNOWLEDGE)) {
        try (final MessageProducer producer = session.createProducer(vector)) {
            final Message msg = session.createMessage();
            msg.setStringProperty("key", key);
            msg.setObjectProperty("payload", payload);
            msg.setIntProperty("delayMessageProcess", delayMessageProcess);
            msg.setBooleanProperty("messageThrowsException", messageThrowsException);
            // Note: per the JMS API, send() overwrites this header with the
            // producer's delivery mode; producer.setDeliveryMode(...) is what
            // actually takes effect.
            msg.setJMSDeliveryMode(deliveryMode);
            try {
                producer.send(msg);
                return true;
            } catch (JMSException e) {
                // The connection with ActiveMQ is broken. Try it again with
                // another "Connection" object in the next "for" loop.
                // ResourceAllocationException is a JMSException, so a failed
                // send lands here too.
                LOG.log(Level.WARNING, "Producer unable to send message. -> " + e.getMessage());
            }
        }
    } catch (ResourceAllocationException e) {
        LOG.log(Level.WARNING, "JMS Session failed. [OOM] -> " + e.getMessage());
    } catch (JMSException e) {
        LOG.log(Level.WARNING, "JMS Session failed. -> " + e.getMessage());
    }
} catch (JMSException e) {
    LOG.log(Level.WARNING, "JMS Connection failed. -> " + e.getMessage());
}

The current code has only 8 connections open at one time. The
previous version would trigger the 1.000.000 connections all at once. With
the improvement, it is harder to get an OOM exception with KahaDB. But if
we give it enough pending messages, it eventually goes out of memory. The
same does not happen with external SQL servers.
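
One way a driver could cap itself at 8 simultaneous senders is a fixed-size thread pool. The sketch below is an assumption about how that might look, not the actual test driver (which was not shared); `SendTask` is a hypothetical stand-in for "open a connection and send one message". Note that the pool's work queue is unbounded, so a real driver would likely want to submit work in batches.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: cap the number of simultaneous senders with a fixed-size pool. */
final class BoundedSenders {

    /** Hypothetical stand-in for "open a connection and send one message". */
    interface SendTask {
        void send(int messageIndex) throws Exception;
    }

    /**
     * Runs totalMessages sends with at most maxConcurrent in flight at once.
     * Returns the highest number of sends observed running at the same time.
     */
    static int run(int totalMessages, int maxConcurrent, SendTask task) {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        for (int i = 0; i < totalMessages; i++) {
            final int index = i;
            pool.execute(() -> {
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    task.send(index);
                } catch (Exception e) {
                    // A real driver would log this and decide whether to retry.
                } finally {
                    inFlight.decrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // Simulated send that takes about a millisecond.
        int peak = run(1_000, 8, i -> Thread.sleep(1));
        System.out.println("peak concurrent senders: " + peak);
    }
}
```

Because the pool owns exactly `maxConcurrent` threads, the peak can never exceed that number regardless of how many messages are queued.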

> we'd want to profile why memory is used when persistent messages are
> simply sitting in the persistence store.
I will send you this.

Thanks!
Thiago.




On Wed, Feb 21, 2018 at 3:05 AM, Tim Bain <tb...@alumni.duke.edu> wrote:

> I just read your last response more closely and realized you said you had
> tried larger -Xmx values with no difference. How big of a heap did you try?
>
> Are you able to share the test driver you were using so someone could try
> to reproduce and then analyze the OOM behavior you were seeing? If not, can
> you give more details about how you were publishing the messages (how many
> queues were they spread across, were you using the JMS API directly or
> using a third-party library such as Spring or Camel to facilitate the
> publishing, etc.)?
>
> Also, if your client code was no longer creating one connection per message
> (as I understood you to say in your last message), did the problem still
> occur? I'd expect that the maximumConnections=1000 setting wouldn't matter
> if you were only using one connection (or some small number of
> connections), and I'd be curious if that eliminates both the OOM behavior
> and the connection-refused behavior. If it does, that says that the problem
> is in the connection-management code, not in the choice of persistent
> store; if not, then KahaDB likely is indeed to blame and we'd want to
> profile why memory is used when persistent messages are simply sitting in
> the persistence store.
>
> Tim
>

Re: Producer Flow Control active but server still facing OOM issues

Posted by Tim Bain <tb...@alumni.duke.edu>.

I just read your last response more closely and realized you said you had
tried larger -Xmx values with no difference. How big of a heap did you try?

Are you able to share the test driver you were using so someone could try
to reproduce and then analyze the OOM behavior you were seeing? If not, can
you give more details about how you were publishing the messages (how many
queues were they spread across, were you using the JMS API directly or
using a third-party library such as Spring or Camel to facilitate the
publishing, etc.)?

Also, if your client code was no longer creating one connection per message
(as I understood you to say in your last message), did the problem still
occur? I'd expect that the maximumConnections=1000 setting wouldn't matter
if you were only using one connection (or some small number of
connections), and I'd be curious if that eliminates both the OOM behavior
and the connection-refused behavior. If it does, that says that the problem
is in the connection-management code, not in the choice of persistent
store; if not, then KahaDB likely is indeed to blame and we'd want to
profile why memory is used when persistent messages are simply sitting in
the persistence store.

Tim

On Fri, Feb 16, 2018 at 4:13 PM, Tim Bain <tb...@alumni.duke.edu> wrote:

> 512 MB isn't very much memory for an ActiveMQ broker. I'm used to seeing
> more like 2 GB, 4 GB, sometimes more when people describe how big their
> heap is. If 512 MB works with Postgres, and you're good with using
> Postgres, that's fine, but the general consensus is that KahaDB has had
> more testing and is more performant than the SQL data store, so I
> personally would run KahaDB even if it meant I had to use a heap larger
> than 512 MB. So you might consider testing whether your scenario works with
> KahaDB with a larger heap, and if so whether you want to use Postgres or
> KahaDB.
>
> Tim
>

Re: Producer Flow Control active but server still facing OOM issues

Posted by Tim Bain <tb...@alumni.duke.edu>.

512 MB isn't very much memory for an ActiveMQ broker. I'm used to seeing
more like 2 GB, 4 GB, sometimes more when people describe how big their
heap is. If 512 MB works with Postgres, and you're good with using
Postgres, that's fine, but the general consensus is that KahaDB has had
more testing and is more performant than the SQL data store, so I
personally would run KahaDB even if it meant I had to use a heap larger
than 512 MB. So you might consider testing whether your scenario works with
KahaDB with a larger heap, and if so whether you want to use Postgres or
KahaDB.

Tim

On Feb 16, 2018 7:49 AM, "Thiago Veronezi" <th...@veronezi.org> wrote:

Hi Tim,

Thanks for your time. I managed to make the Broker very stable after moving
away from KahaDB and Limiting the number of active connections at one time.
This is what I think happened.

* KahaDB competes with the broker for JVM resources. It does not matter how
much of memory I reserve with "-Xmx", KahaDB eats it all up when dealing
with the 1.000.000 messages.
* My client code was wrong. I was creating connections in parallel like
crazy.

Just after sending the last message, I've got a OOM exception, even with
the sendFailIfNoSpace activated. The "maximumConnections=1000" was too much
for my 512MB, plus KahaDB was using almost all of it.

I need to check again the case where the memory goes back to normal but
activemq keeps denying new connections for some time. It was probably
something wrong with my client code. I will check this out as soon as I
can. I will post the news here.

The good news is that the AMQ+Postgres combination handles the 1.000.000
persistent messages like a charm.

[]s,
Thiago.


Re: Producer Flow Control active but server still facing OOM issues

Posted by Thiago Veronezi <th...@veronezi.org>.

Hi Tim,

Thanks for your time. I managed to make the broker very stable after moving
away from KahaDB and limiting the number of active connections at one time.
This is what I think happened.

* KahaDB competes with the broker for JVM resources. No matter how much
memory I reserve with "-Xmx", KahaDB eats it all up when dealing with the
1.000.000 messages.
* My client code was wrong. I was creating connections in parallel like
crazy.

Just after sending the last message, I got an OOM exception, even with
sendFailIfNoSpace activated. The "maximumConnections=1000" setting was too
much for my 512MB, plus KahaDB was using almost all of it.

I need to check again the case where the memory goes back to normal but
ActiveMQ keeps denying new connections for some time. It was probably
something wrong with my client code. I will check this out as soon as I
can. I will post the news here.

The good news is that the AMQ+Postgres combination handles the 1.000.000
persistent messages like a charm.

[]s,
Thiago.



On Fri, Feb 16, 2018 at 12:12 PM, Tim Bain <tb...@alumni.duke.edu> wrote:

> This is only a partial answer (I'll try to get time this weekend to answer
> the parts I don't have time for now), but I want to get you something to
> start with.
>
> On Feb 15, 2018 5:03 AM, "Thiago Veronezi" <th...@veronezi.org> wrote:
>
> Hi, ActiveMQ community,
>
> I'm actively working on a documentation for "out of memory" protection on
> ActiveMQ. Recently I was working on this POC project where I stressed a
> default broker configuration with 1.000.000 messages with 20KB payload
> each, where each message took 1 second to be consumed. It caused the
> "Pending Messages" numbers go up pretty fast.
>
>
> Are these persistent or non-persistent messages? How large (capacity) is
> your persistent store and your temp store?
>
> My understanding is that AMQ, out of the box, has the "Producer Flow
> Control" feature activated for all Topics and Queues; and it has
> "usedMemory" threshold set as 70% of 512MB.
>
>
> Did PFC kick in? You'd see it in the broker's logs.
>
> Still, with the load I used, I
> saw OOM issues. The 1.000.000 messages actually killed the server.
>
> In my tests, I use several threads and nodes to send all the 1.000.000
> messages in parallel. That means I have several connections to the broker.
> Once I used the sendFailIfNoSpace="true" option, the OOM issues ceased; The
> consumers were able to catch up, And the broker survived. One thing that I
> noticed is that even when the "Pending messages" number reached 0, it took
> some time for the server to allow new producer connections again.
>
>
> When it didn't allow new producer connections, what was the symptom?
>
> Questions:
>
> * Is it possible that AMQ doesn't count the memory used by each active
> connection as variable to the final used memory calculation?
>
>
> Yes. Those limits are solely on the memory message store (used for
> non-persistent messages and for paging in persistent messages from the
> persistent store), so it's possible to OOM even though you don't exceed
> those limits.
>
> * Is there any configuration where we set a refresh rate so the server
> notices faster when the memory is below the maximum threshold again?
>
>
> To the best of my knowledge, the metrics are captured instantaneously by
> modifying an object in memory, not via a periodic poll, so I think
> something else is going on. I'll come back to this.
>
> * Is the use of sendFailIfNoSpace="true" the ultimate solution for OOM
> issues? Is this something I can advise a customer to use so he is 99.9%
> guaranteed to not have OOM crashes?
>
>
> No. SendFailIfNoSpace just means that the client won't wait forever on a
> send. The only reason you're not seeing OOMs when you used it is because
> you're not retrying when you catch it.
>
> Thanks,
> Thiago.
>
> Ps.: I think this is my first message here. :)
>

Re: Producer Flow Control active but server still facing OOM issues

Posted by Tim Bain <tb...@alumni.duke.edu>.

This is only a partial answer (I'll try to get time this weekend to answer
the parts I don't have time for now), but I want to get you something to
start with.

On Feb 15, 2018 5:03 AM, "Thiago Veronezi" <th...@veronezi.org> wrote:

> Hi, ActiveMQ community,
>
> I'm actively working on a documentation for "out of memory" protection on
> ActiveMQ. Recently I was working on this POC project where I stressed a
> default broker configuration with 1.000.000 messages with 20KB payload
> each, where each message took 1 second to be consumed. It caused the
> "Pending Messages" numbers go up pretty fast.


Are these persistent or non-persistent messages? How large (capacity) is
your persistent store and your temp store?

> My understanding is that AMQ, out of the box, has the "Producer Flow
> Control" feature activated for all Topics and Queues; and it has
> "usedMemory" threshold set as 70% of 512MB.


Did PFC kick in? You'd see it in the broker's logs.

> Still, with the load I used, I saw OOM issues. The 1.000.000 messages
> actually killed the server.
>
> In my tests, I use several threads and nodes to send all the 1.000.000
> messages in parallel. That means I have several connections to the broker.
> Once I used the sendFailIfNoSpace="true" option, the OOM issues ceased; The
> consumers were able to catch up, And the broker survived. One thing that I
> noticed is that even when the "Pending messages" number reached 0, it took
> some time for the server to allow new producer connections again.


When it didn't allow new producer connections, what was the symptom?

> Questions:
>
> * Is it possible that AMQ doesn't count the memory used by each active
> connection as variable to the final used memory calculation?


Yes. Those limits are solely on the memory message store (used for
non-persistent messages and for paging in persistent messages from the
persistent store), so it's possible to OOM even though you don't exceed
those limits.

> * Is there any configuration where we set a refresh rate so the server
> notices faster when the memory is below the maximum threshold again?


To the best of my knowledge, the metrics are captured instantaneously by
modifying an object in memory, not via a periodic poll, so I think
something else is going on. I'll come back to this.
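
The difference between "modified in memory" and "periodic poll" can be pictured with a toy model. This is not ActiveMQ code: `UsageCounter` and its `Listener` are hypothetical names, and the sketch only shows that a counter updated in place can tell interested parties about a drop below the limit at the moment it happens, with no refresh interval involved.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model, not ActiveMQ code: usage is updated in place and listeners hear about it at once. */
final class UsageCounter {

    /** Hypothetical callback invoked on every change. */
    interface Listener {
        void onChange(long usedBytes, long limitBytes);
    }

    private final long limitBytes;
    private final List<Listener> listeners = new ArrayList<>();
    private long usedBytes;

    UsageCounter(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    synchronized void addListener(Listener listener) {
        listeners.add(listener);
    }

    /** A message was placed in memory. */
    synchronized void increase(long bytes) {
        usedBytes += bytes;
        notifyListeners();
    }

    /** A message left memory (for example, it was consumed). */
    synchronized void decrease(long bytes) {
        usedBytes -= bytes;
        notifyListeners();
    }

    synchronized boolean isFull() {
        return usedBytes >= limitBytes;
    }

    private void notifyListeners() {
        for (Listener listener : listeners) {
            listener.onChange(usedBytes, limitBytes);
        }
    }

    public static void main(String[] args) {
        UsageCounter usage = new UsageCounter(100);
        usage.addListener((used, limit) ->
                System.out.println("used=" + used + " limit=" + limit + " full=" + (used >= limit)));
        usage.increase(100); // reaches the limit
        usage.decrease(20);  // drops below it; the listener is called right away
    }
}
```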

> * Is the use of sendFailIfNoSpace="true" the ultimate solution for OOM
> issues? Is this something I can advise a customer to use so he is 99.9%
> guaranteed to not have OOM crashes?


No. SendFailIfNoSpace just means that the client won't wait forever on a
send. The only reason you're not seeing OOMs when you used it is because
you're not retrying when you catch it.
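
To make the point about retrying concrete, here is a minimal sketch of a client that does retry a failed send, pausing longer after each failure. It is an illustration under assumptions, not a recommendation from this thread: `Send` is a hypothetical stand-in for `producer.send(msg)`, and the backoff values are arbitrary. A client like this keeps the message instead of dropping it, which also means the load on the broker does not go away.

```java
import java.util.concurrent.TimeUnit;

/** Sketch: retry a send that fails fast instead of silently dropping the message. */
final class RetryingSend {

    /** Hypothetical stand-in for producer.send(msg); throws when the send is rejected. */
    interface Send {
        void attempt() throws Exception;
    }

    /**
     * Tries the send up to maxAttempts times, doubling the pause after each failure.
     * Returns the number of attempts used, or -1 if every attempt failed.
     */
    static int sendWithRetry(Send send, int maxAttempts, long initialBackoffMillis) {
        long backoff = initialBackoffMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                send.attempt();
                return attempt;
            } catch (Exception e) {
                if (attempt == maxAttempts) {
                    break; // out of attempts; the caller decides what to do next
                }
                try {
                    TimeUnit.MILLISECONDS.sleep(backoff);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return -1;
                }
                backoff *= 2;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated broker: rejects the first two sends, accepts the third.
        int used = sendWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("no space");
            }
        }, 5, 10);
        System.out.println("attempts used: " + used); // prints "attempts used: 3"
    }
}
```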

> Thanks,
> Thiago.
>
> Ps.: I think this is my first message here. :)