You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@activemq.apache.org by neon18 <ne...@nngo.net> on 2018/01/05 16:03:00 UTC

Async Writer Thread Shutdown error adding a message on 5.15.2 broker

We recently upgraded to ActiveMQ server 5.15.2 from 5.12.2 and when putting
some messages onto a Queue, we get the following error

ERROR | Caught an Exception adding a message: ActiveMQObjectMessage
{commandId = 2705, ...} first to FilePendingMessageCursor |
org.apache.acivemq.broker.region.cursors.FIlePendingMEssageCursor | ActiveMQ
Transport: tcp://....
java.io.IOException: Async Writer Thread Shutdown
  at
org.apache.activemq.store.kahadb.disk.journal.DataFileAppender.enqueue(DataFileAppender.java:164)[activemq-kahadb-store-5.15.2.jar:5.15.2]
  at
org.apache.activemq.store.kahadb.disk.journal.DataFileAppender.storeItem(DataFileAppender.java:129)[activemq-kahadb-store-5.15.2.jar:5.15.2]
  at
org.apache.activemq.store.kahadb.disk.journal.Journal.write(Journal.java:958)[activemq-kahadb-store-5.15.2.jar:5.15.2]
  at
org.apache.activemq.store.kahadb.plist.PListStoreImpl.write(PListStoreImpl.java:444)[activemq-kahadb-store-5.15.2.jar:5.15.2]
  at
org.apache.activemq.store.kahadb.plist.PListImpl.write(PListImpl.java:105)[activemq-kahadb-store-5.15.2.jar:5.15.2]
  at
org.apache.activemq.broker.region.cursors.FilePendingMessageCursor.tryAddMessageLast(FilePendingMessageCursor.java:237)[activemq-broker-5.15.2.jar:5.15.2]
  at
org.apache.activemq.broker.region.cursors.StoreQueueCursor.tryAddMessageLast(StoreQueueCursor.java:100)[activemq-broker-5.15.2.jar:5.15.2]
  at
org.apache.activemq.broker.region.Queue.tryCursorAdd(Queue.java:1868)[activemq-broker-5.15.2.jar:5.15.2]
...

When we were using ActiveMQ 5.12.2 server we did not have this issue.

Any ideas on why this is happening (there are lots of the above errors in
the activemq.log) and how to resolve it?

Eventually, the 5.15.2 server logs a WARN | Transport Connection to
tcp://... failed java.io.Exception: Unexpected error occurred:
java.lang.OutOfMemoryError: Java heap space |
org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ
Transport: tcp:///...

The client that submits to the queue is still using the
activemq-client-5.12.2.jar (do we need to also upgrade the client to use the
matching version as the server?

Regards,

Neon






--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Async Writer Thread Shutdown error adding a message on 5.15.2 broker

Posted by neon18 <ne...@nngo.net>.
I checked with IT guys and the disk I/O is not the issue, it can handle the
data rate just fine.

I did not have time in finding a way to slow the producers.

Building from 5.15.x/3 branch from source is not an option at this time. I
also believe [AMQ-6815 KahaDB checkpoint needs to fail fast in the event of
journal write failure] is not the culprit, althrough it might explain some
of the other errors that I see in the activemq.log
https://issues.apache.org/jira/browse/AMQ-6815

Does ActiveMQ baseline have any load test that maybe we can simulate this
scenario: e.g. send decent amount of messages to a non-persistent queue
(with a relatively small JVM max heap) to test this under ActiveMQ 5.14.x vs
5.15.x? That would be better to have some type of regression test for this
scenario inside ActiveMQ source code.

I have more info from another experiment I did today. I changed
systemUsage.memoryUsage to 1%. I see an INFO Usage Manager Memory Limit ..
reached on queue://... Producers will be throttled to the rate at which
messages are removed from this destination to prevent flooding it... a
couple times. Then ~200ms afterward, the tmp_storage initialized INFO log.
On the web console, I see Memory percent used to be 99% and 100% and Temp
percent used 3%. Eventually, the broker starting having errors in the Async
Writer Thread Shutdown. 

I did a similar test under ActiveMQ 5.14.5 broker (memoryUsage is at default
70%), and here the Memory used says under 70%, of particular note is that
the 5.14.5 broker Memory percent used drops alot, from 70% to 25% when it
flushes non-persistent message to the Temp storage (I can see Temp used
increase). I recall a similar scenario running under ActiveMQ 5.12.2, where
the Memory used decreases dramatically when it flushes non-persistent
message to the Temp storage on disk.

Regards,

Neon



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Async Writer Thread Shutdown error adding a message on 5.15.2 broker

Posted by Tim Bain <tb...@alumni.duke.edu>.
On Jan 10, 2018 12:42 PM, "neon18" <ne...@nngo.net> wrote:

We run the broker with max heap of 4G and initial of 1G (-Xmx4G -Xms1G).
We use non-persistent messages on these particular queues (3 of them in this
test).
The number of messages sent to the broker in my last "flood gate" test was
around 40,000 (40k) in 5 minutes or about 8K msgs/min. After this flood of
messages, the producers send messages at a much much lower rate. I have
pretty much the factory default activemq.xml with
systemUsage/memoryUsage/percentOfJvmHeap=70 and queuePrefetch=20 on these 3
queues.


Thanks for this info. 4GB max heap is fine, I wanted to make sure you
weren't running with something too small like 256MB.

8k msgs/min is 125/sec. How big are they? And what storage type (local
SCSI-attached SSD, NFS share, AWS EBS volume, whatever) are you using for
the data directory, and what is its throughput capacity? I was originally
thinking you filled 4GB of RAM in 6 seconds instead of 6 minutes, and I was
thinking that your storage solution might not be capable of accepting over
5Gbps. Even so, you describe running through 2GB of RAM in 20 seconds
(under the 50% scenario) and 3.2GB in 40 seconds (under the 20% scenario),
which both equate to over 500Mbps. So I'd like to pull this thread to make
sure your storage really can accept writes at the rate you're asking it to
do them.

So I have seen two different scenarios when lots of non-persistent messages
are put on queue.
1. Async Writer Thread Shutdown errors (with no prior warnings/errors), then
OutOfMemoryErrors
2. INFO PListStore: ...tmp_storage initialized, ~10 seconds later WARN:
IOException: OutOfMemoryError: GC overhead limit exceeded ... ActiveMQ
Transport: tcp:..., then that repeats and other errors follow. Also there is
no warnings/errors prior to the tmp_storage init info log msg. FYI: the web
console was responsive until I saw the tmp_storage initialized (KahaDB) INFO
msg (~4.5 minutes into my test), then it stops responding. The last count of
messages on queues via web console is ~30K msgs under ActiveMQ 5.15.2
broker. Under the 5.14.5 broker, I was able to see the flood of ~40K msgs
added to the 3 queues in ~6 minutes.

In more controlled testing in the past 2 days, where I clear the AMQ_DATA
dir before each test run, I have not seen issue #1 (Async Writer Thread
Shutdown / OutOfMemoryError). I see issue #2 (OutOfMemoryError) a few
seconds after KahaDB tmp_storage is initialized, then the web console stop
responding and lots of OoMerrors and other errors in the activemq.log.


For now, let's focus on issue #2 and hope that #1 was somehow a secondary
failure caused by #2 (so fixing #2 fixes #1). Once we have #2 fixed, we can
address #1 if it reoccurs.

Running with ActiveMQ 5.14.5 and 5.12.2 brokers, we do not have any
OutOfMemoryErrors with this same load or higher load vs running under
ActiveMQ 5.15.2. Running with 5.15.2 broker it seems like there might be an
issue with throttling the producers of the queue when the JVM hit's the
configured memoryUsage (default of 70%).

Running on that thought, I did another test with
systemUsage/memoryUsage/percentOfJvmHeap=50 %, but same thing (except that
the OoM error occurs about 20 seconds after the tmp_storage init info log.

So, I ran the test again with systemUsage/memoryUsage to 20%, same thing,
except the OoM error occurs about 40 seconds after the tmp_storage init info
log. This time, I also monitored the memory percent used and temp memory
percent used via the web console. right after I see the tmp_storage init
info log, I can see memUsed=39% tempUsed=1%, ~10 seconds later memUsed=56%
tempUsed=2%, ~10 seconds later memUsed=69% tempUsed=2%, next refresh failed
and of course in the activemq.log I see the OutOfMemoryErrors and other
warnings and errors appearing in the log.


If you send your messages slower but the same total number, does everything
work as expected?

Also, I grepped my old logs for "Journal failed" and did see some results,
but they happen after the a few OutOfMemoryErrors, so I did not include them
in this thread.


If the OOME is thrown from within that code, those could even be caused by
the OOME. As long as the OOME happened first, we can ignore them as you've
suggested. Just be aware that they indicate that the writer thread is dead
(and they caise it to kill itself, IIRC), so if they happen first then they
could explain why it's not running, so watch for that.

I can pretty reliably recreate the problem in about 6 minutes (with a clean
amq_data_dir) running ActiveMQ 5.15.2 broker and no issues under 5.14.5 or
5.12.2 brokers.


Is building the 5.15.3 branch from source to see if the fix Gary referred
an option? I'm not proposing you use that in operations, just that you try
it in a test environment to see if it fixes the problem.

Regards,

Neon



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Async Writer Thread Shutdown error adding a message on 5.15.2 broker

Posted by Tim Bain <tb...@alumni.duke.edu>.
Great, thanks for confirming that, and kudos to Gary for recognizing the
root cause.

Tim

On Feb 26, 2018 2:34 PM, "neon18" <ne...@nngo.net> wrote:

Sorry for the delay, I finally found some time to try it out. I ran test
with
ActiveMQ 5.15.3 broker and that seems to resolve the OutOfMemoryError (OOME)
issue when a whole bunch of messages are put on the non-persistent queue
that exists in ActiveMQ 5.15.2 but not under ActiveMQ 5.14.5.

Regards,

Neon



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Async Writer Thread Shutdown error adding a message on 5.15.2 broker

Posted by neon18 <ne...@nngo.net>.
Sorry for the delay, I finally found some time to try it out. I ran test with
ActiveMQ 5.15.3 broker and that seems to resolve the OutOfMemoryError (OOME)
issue when a whole bunch of messages are put on the non-persistent queue
that exists in ActiveMQ 5.15.2 but not under ActiveMQ 5.14.5.

Regards,

Neon



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Async Writer Thread Shutdown error adding a message on 5.15.2 broker

Posted by Tim Bain <tb...@alumni.duke.edu>.
Neon,

Since 5.15.3 was just released and you'd no longer need to compile from
source, would it be possible to upgrade to it to see if Gary's hunch about
the root cause of the problem was correct and the problem is fixed in
5.15.3?

Tim

On Tue, Jan 16, 2018 at 4:02 AM, Gary Tully <ga...@gmail.com> wrote:

> Hi TIm,
> the regression caused references to batches of serialised messages destined
> for the journal to be retained by the page cache, potentially leading to an
> OOM.
> The temp store used to cursor non persistent messages uses the same journal
> logic and would be effected in the same way.
> The trigger for my hunch was that it only ocurred on 5.12.x
>
> On Fri, 12 Jan 2018 at 13:38 Tim Bain <tb...@alumni.duke.edu> wrote:
>
> > Gary,
> >
> > Can you elaborate on what about this situation looks like that problem?
> > Because based on the information given so far, it looked to me like the
> OP
> > was simply overdriving the storage to which the temp files are written.
> >
> > Tim
> >
> > On Jan 12, 2018 4:04 AM, "Gary Tully" <ga...@gmail.com> wrote:
> >
> > > this looks like an instance of the regression from
> > > https://issues.apache.org/jira/browse/AMQ-6815
> > >
> > > On Wed, 10 Jan 2018 at 19:42 neon18 <ne...@nngo.net> wrote:
> > >
> > > > We run the broker with max heap of 4G and initial of 1G (-Xmx4G
> > -Xms1G).
> > > > We use non-persistent messages on these particular queues (3 of them
> in
> > > > this
> > > > test).
> > > > The number of messages sent to the broker in my last "flood gate"
> test
> > > was
> > > > around 40,000 (40k) in 5 minutes or about 8K msgs/min. After this
> flood
> > > of
> > > > messages, the producers send messages at a much much lower rate. I
> have
> > > > pretty much the factory default activemq.xml with
> > > > systemUsage/memoryUsage/percentOfJvmHeap=70 and queuePrefetch=20 on
> > > these 3
> > > > queues.
> > > >
> > > > So I have seen two different scenarios when lots of non-persistent
> > > messages
> > > > are put on queue.
> > > > 1. Async Writer Thread Shutdown errors (with no prior
> warnings/errors),
> > > > then
> > > > OutOfMemoryErrors
> > > > 2. INFO PListStore: ...tmp_storage initialized, ~10 seconds later
> WARN:
> > > > IOException: OutOfMemoryError: GC overhead limit exceeded ...
> ActiveMQ
> > > > Transport: tcp:..., then that repeats and other errors follow. Also
> > there
> > > > is
> > > > no warnings/errors prior to the tmp_storage init info log msg. FYI:
> the
> > > web
> > > > console was responsive until I saw the tmp_storage initialized
> (KahaDB)
> > > > INFO
> > > > msg (~4.5 minutes into my test), then it stops responding. The last
> > count
> > > > of
> > > > messages on queues via web console is ~30K msgs under ActiveMQ 5.15.2
> > > > broker. Under the 5.14.5 broker, I was able to see the flood of ~40K
> > msgs
> > > > added to the 3 queues in ~6 minutes.
> > > >
> > > > In more controlled testing in the past 2 days, where I clear the
> > AMQ_DATA
> > > > dir before each test run, I have not seen issue #1 (Async Writer
> Thread
> > > > Shutdown / OutOfMemoryError). I see issue #2 (OutOfMemoryError) a few
> > > > seconds after KahaDB tmp_storage is initialized, then the web console
> > > stop
> > > > responding and lots of OoMerrors and other errors in the
> activemq.log.
> > > >
> > > > Running with ActiveMQ 5.14.5 and 5.12.2 brokers, we do not have any
> > > > OutOfMemoryErrors with this same load or higher load vs running under
> > > > ActiveMQ 5.15.2. Running with 5.15.2 broker it seems like there might
> > be
> > > an
> > > > issue with throttling the producers of the queue when the JVM hit's
> the
> > > > configured memoryUsage (default of 70%).
> > > >
> > > > Running on that thought, I did another test with
> > > > systemUsage/memoryUsage/percentOfJvmHeap=50 %, but same thing
> (except
> > > that
> > > > the OoM error occurs about 20 seconds after the tmp_storage init info
> > > log.
> > > >
> > > > So, I ran the test again with systemUsage/memoryUsage to 20%, same
> > thing,
> > > > except the OoM error occurs about 40 seconds after the tmp_storage
> init
> > > > info
> > > > log. This time, I also monitored the memory percent used and temp
> > memory
> > > > percent used via the web console. right after I see the tmp_storage
> > init
> > > > info log, I can see memUsed=39% tempUsed=1%, ~10 seconds later
> > > memUsed=56%
> > > > tempUsed=2%, ~10 seconds later memUsed=69% tempUsed=2%, next refresh
> > > failed
> > > > and of course in the activemq.log I see the OutOfMemoryErrors and
> other
> > > > warnings and errors appearing in the log.
> > > >
> > > > Also, I grepped my old logs for "Journal failed" and did see some
> > > results,
> > > > but they happen after the a few OutOfMemoryErrors, so I did not
> include
> > > > them
> > > > in this thread.
> > > >
> > > > I can pretty reliably recreate the problem in about 6 minutes (with a
> > > clean
> > > > amq_data_dir) running ActiveMQ 5.15.2 broker and no issues under
> 5.14.5
> > > or
> > > > 5.12.2 brokers.
> > > >
> > > > Regards,
> > > >
> > > > Neon
> > > >
> > > >
> > > >
> > > > --
> > > > Sent from:
> > > > http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
> > > >
> > >
> >
>

Re: Async Writer Thread Shutdown error adding a message on 5.15.2 broker

Posted by Gary Tully <ga...@gmail.com>.
Hi TIm,
the regression caused references to batches of serialised messages destined
for the journal to be retained by the page cache, potentially leading to an
OOM.
The temp store used to cursor non persistent messages uses the same journal
logic and would be effected in the same way.
The trigger for my hunch was that it only ocurred on 5.12.x

On Fri, 12 Jan 2018 at 13:38 Tim Bain <tb...@alumni.duke.edu> wrote:

> Gary,
>
> Can you elaborate on what about this situation looks like that problem?
> Because based on the information given so far, it looked to me like the OP
> was simply overdriving the storage to which the temp files are written.
>
> Tim
>
> On Jan 12, 2018 4:04 AM, "Gary Tully" <ga...@gmail.com> wrote:
>
> > this looks like an instance of the regression from
> > https://issues.apache.org/jira/browse/AMQ-6815
> >
> > On Wed, 10 Jan 2018 at 19:42 neon18 <ne...@nngo.net> wrote:
> >
> > > We run the broker with max heap of 4G and initial of 1G (-Xmx4G
> -Xms1G).
> > > We use non-persistent messages on these particular queues (3 of them in
> > > this
> > > test).
> > > The number of messages sent to the broker in my last "flood gate" test
> > was
> > > around 40,000 (40k) in 5 minutes or about 8K msgs/min. After this flood
> > of
> > > messages, the producers send messages at a much much lower rate. I have
> > > pretty much the factory default activemq.xml with
> > > systemUsage/memoryUsage/percentOfJvmHeap=70 and queuePrefetch=20 on
> > these 3
> > > queues.
> > >
> > > So I have seen two different scenarios when lots of non-persistent
> > messages
> > > are put on queue.
> > > 1. Async Writer Thread Shutdown errors (with no prior warnings/errors),
> > > then
> > > OutOfMemoryErrors
> > > 2. INFO PListStore: ...tmp_storage initialized, ~10 seconds later WARN:
> > > IOException: OutOfMemoryError: GC overhead limit exceeded ... ActiveMQ
> > > Transport: tcp:..., then that repeats and other errors follow. Also
> there
> > > is
> > > no warnings/errors prior to the tmp_storage init info log msg. FYI: the
> > web
> > > console was responsive until I saw the tmp_storage initialized (KahaDB)
> > > INFO
> > > msg (~4.5 minutes into my test), then it stops responding. The last
> count
> > > of
> > > messages on queues via web console is ~30K msgs under ActiveMQ 5.15.2
> > > broker. Under the 5.14.5 broker, I was able to see the flood of ~40K
> msgs
> > > added to the 3 queues in ~6 minutes.
> > >
> > > In more controlled testing in the past 2 days, where I clear the
> AMQ_DATA
> > > dir before each test run, I have not seen issue #1 (Async Writer Thread
> > > Shutdown / OutOfMemoryError). I see issue #2 (OutOfMemoryError) a few
> > > seconds after KahaDB tmp_storage is initialized, then the web console
> > stop
> > > responding and lots of OoMerrors and other errors in the activemq.log.
> > >
> > > Running with ActiveMQ 5.14.5 and 5.12.2 brokers, we do not have any
> > > OutOfMemoryErrors with this same load or higher load vs running under
> > > ActiveMQ 5.15.2. Running with 5.15.2 broker it seems like there might
> be
> > an
> > > issue with throttling the producers of the queue when the JVM hit's the
> > > configured memoryUsage (default of 70%).
> > >
> > > Running on that thought, I did another test with
> > > systemUsage/memoryUsage/percentOfJvmHeap=50 %, but same thing (except
> > that
> > > the OoM error occurs about 20 seconds after the tmp_storage init info
> > log.
> > >
> > > So, I ran the test again with systemUsage/memoryUsage to 20%, same
> thing,
> > > except the OoM error occurs about 40 seconds after the tmp_storage init
> > > info
> > > log. This time, I also monitored the memory percent used and temp
> memory
> > > percent used via the web console. right after I see the tmp_storage
> init
> > > info log, I can see memUsed=39% tempUsed=1%, ~10 seconds later
> > memUsed=56%
> > > tempUsed=2%, ~10 seconds later memUsed=69% tempUsed=2%, next refresh
> > failed
> > > and of course in the activemq.log I see the OutOfMemoryErrors and other
> > > warnings and errors appearing in the log.
> > >
> > > Also, I grepped my old logs for "Journal failed" and did see some
> > results,
> > > but they happen after the a few OutOfMemoryErrors, so I did not include
> > > them
> > > in this thread.
> > >
> > > I can pretty reliably recreate the problem in about 6 minutes (with a
> > clean
> > > amq_data_dir) running ActiveMQ 5.15.2 broker and no issues under 5.14.5
> > or
> > > 5.12.2 brokers.
> > >
> > > Regards,
> > >
> > > Neon
> > >
> > >
> > >
> > > --
> > > Sent from:
> > > http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
> > >
> >
>

Re: Async Writer Thread Shutdown error adding a message on 5.15.2 broker

Posted by Tim Bain <tb...@alumni.duke.edu>.
Gary,

Can you elaborate on what about this situation looks like that problem?
Because based on the information given so far, it looked to me like the OP
was simply overdriving the storage to which the temp files are written.

Tim

On Jan 12, 2018 4:04 AM, "Gary Tully" <ga...@gmail.com> wrote:

> this looks like an instance of the regression from
> https://issues.apache.org/jira/browse/AMQ-6815
>
> On Wed, 10 Jan 2018 at 19:42 neon18 <ne...@nngo.net> wrote:
>
> > We run the broker with max heap of 4G and initial of 1G (-Xmx4G -Xms1G).
> > We use non-persistent messages on these particular queues (3 of them in
> > this
> > test).
> > The number of messages sent to the broker in my last "flood gate" test
> was
> > around 40,000 (40k) in 5 minutes or about 8K msgs/min. After this flood
> of
> > messages, the producers send messages at a much much lower rate. I have
> > pretty much the factory default activemq.xml with
> > systemUsage/memoryUsage/percentOfJvmHeap=70 and queuePrefetch=20 on
> these 3
> > queues.
> >
> > So I have seen two different scenarios when lots of non-persistent
> messages
> > are put on queue.
> > 1. Async Writer Thread Shutdown errors (with no prior warnings/errors),
> > then
> > OutOfMemoryErrors
> > 2. INFO PListStore: ...tmp_storage initialized, ~10 seconds later WARN:
> > IOException: OutOfMemoryError: GC overhead limit exceeded ... ActiveMQ
> > Transport: tcp:..., then that repeats and other errors follow. Also there
> > is
> > no warnings/errors prior to the tmp_storage init info log msg. FYI: the
> web
> > console was responsive until I saw the tmp_storage initialized (KahaDB)
> > INFO
> > msg (~4.5 minutes into my test), then it stops responding. The last count
> > of
> > messages on queues via web console is ~30K msgs under ActiveMQ 5.15.2
> > broker. Under the 5.14.5 broker, I was able to see the flood of ~40K msgs
> > added to the 3 queues in ~6 minutes.
> >
> > In more controlled testing in the past 2 days, where I clear the AMQ_DATA
> > dir before each test run, I have not seen issue #1 (Async Writer Thread
> > Shutdown / OutOfMemoryError). I see issue #2 (OutOfMemoryError) a few
> > seconds after KahaDB tmp_storage is initialized, then the web console
> stop
> > responding and lots of OoMerrors and other errors in the activemq.log.
> >
> > Running with ActiveMQ 5.14.5 and 5.12.2 brokers, we do not have any
> > OutOfMemoryErrors with this same load or higher load vs running under
> > ActiveMQ 5.15.2. Running with 5.15.2 broker it seems like there might be
> an
> > issue with throttling the producers of the queue when the JVM hit's the
> > configured memoryUsage (default of 70%).
> >
> > Running on that thought, I did another test with
> > systemUsage/memoryUsage/percentOfJvmHeap=50 %, but same thing (except
> that
> > the OoM error occurs about 20 seconds after the tmp_storage init info
> log.
> >
> > So, I ran the test again with systemUsage/memoryUsage to 20%, same thing,
> > except the OoM error occurs about 40 seconds after the tmp_storage init
> > info
> > log. This time, I also monitored the memory percent used and temp memory
> > percent used via the web console. right after I see the tmp_storage init
> > info log, I can see memUsed=39% tempUsed=1%, ~10 seconds later
> memUsed=56%
> > tempUsed=2%, ~10 seconds later memUsed=69% tempUsed=2%, next refresh
> failed
> > and of course in the activemq.log I see the OutOfMemoryErrors and other
> > warnings and errors appearing in the log.
> >
> > Also, I grepped my old logs for "Journal failed" and did see some
> results,
> > but they happen after the a few OutOfMemoryErrors, so I did not include
> > them
> > in this thread.
> >
> > I can pretty reliably recreate the problem in about 6 minutes (with a
> clean
> > amq_data_dir) running ActiveMQ 5.15.2 broker and no issues under 5.14.5
> or
> > 5.12.2 brokers.
> >
> > Regards,
> >
> > Neon
> >
> >
> >
> > --
> > Sent from:
> > http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
> >
>

Re: Async Writer Thread Shutdown error adding a message on 5.15.2 broker

Posted by Gary Tully <ga...@gmail.com>.
this looks like an instance of the regression from
https://issues.apache.org/jira/browse/AMQ-6815

On Wed, 10 Jan 2018 at 19:42 neon18 <ne...@nngo.net> wrote:

> We run the broker with max heap of 4G and initial of 1G (-Xmx4G -Xms1G).
> We use non-persistent messages on these particular queues (3 of them in
> this
> test).
> The number of messages sent to the broker in my last "flood gate" test was
> around 40,000 (40k) in 5 minutes or about 8K msgs/min. After this flood of
> messages, the producers send messages at a much much lower rate. I have
> pretty much the factory default activemq.xml with
> systemUsage/memoryUsage/percentOfJvmHeap=70 and queuePrefetch=20 on these 3
> queues.
>
> So I have seen two different scenarios when lots of non-persistent messages
> are put on queue.
> 1. Async Writer Thread Shutdown errors (with no prior warnings/errors),
> then
> OutOfMemoryErrors
> 2. INFO PListStore: ...tmp_storage initialized, ~10 seconds later WARN:
> IOException: OutOfMemoryError: GC overhead limit exceeded ... ActiveMQ
> Transport: tcp:..., then that repeats and other errors follow. Also there
> is
> no warnings/errors prior to the tmp_storage init info log msg. FYI: the web
> console was responsive until I saw the tmp_storage initialized (KahaDB)
> INFO
> msg (~4.5 minutes into my test), then it stops responding. The last count
> of
> messages on queues via web console is ~30K msgs under ActiveMQ 5.15.2
> broker. Under the 5.14.5 broker, I was able to see the flood of ~40K msgs
> added to the 3 queues in ~6 minutes.
>
> In more controlled testing in the past 2 days, where I clear the AMQ_DATA
> dir before each test run, I have not seen issue #1 (Async Writer Thread
> Shutdown / OutOfMemoryError). I see issue #2 (OutOfMemoryError) a few
> seconds after KahaDB tmp_storage is initialized, then the web console stop
> responding and lots of OoMerrors and other errors in the activemq.log.
>
> Running with ActiveMQ 5.14.5 and 5.12.2 brokers, we do not have any
> OutOfMemoryErrors with this same load or higher load vs running under
> ActiveMQ 5.15.2. Running with 5.15.2 broker it seems like there might be an
> issue with throttling the producers of the queue when the JVM hit's the
> configured memoryUsage (default of 70%).
>
> Running on that thought, I did another test with
> systemUsage/memoryUsage/percentOfJvmHeap=50 %, but same thing (except that
> the OoM error occurs about 20 seconds after the tmp_storage init info log.
>
> So, I ran the test again with systemUsage/memoryUsage to 20%, same thing,
> except the OoM error occurs about 40 seconds after the tmp_storage init
> info
> log. This time, I also monitored the memory percent used and temp memory
> percent used via the web console. right after I see the tmp_storage init
> info log, I can see memUsed=39% tempUsed=1%, ~10 seconds later memUsed=56%
> tempUsed=2%, ~10 seconds later memUsed=69% tempUsed=2%, next refresh failed
> and of course in the activemq.log I see the OutOfMemoryErrors and other
> warnings and errors appearing in the log.
>
> Also, I grepped my old logs for "Journal failed" and did see some results,
> but they happen after the a few OutOfMemoryErrors, so I did not include
> them
> in this thread.
>
> I can pretty reliably recreate the problem in about 6 minutes (with a clean
> amq_data_dir) running ActiveMQ 5.15.2 broker and no issues under 5.14.5 or
> 5.12.2 brokers.
>
> Regards,
>
> Neon
>
>
>
> --
> Sent from:
> http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
>

Re: Async Writer Thread Shutdown error adding a message on 5.15.2 broker

Posted by neon18 <ne...@nngo.net>.
We run the broker with max heap of 4G and initial of 1G (-Xmx4G -Xms1G).
We use non-persistent messages on these particular queues (3 of them in this
test).
The number of messages sent to the broker in my last "flood gate" test was
around 40,000 (40k) in 5 minutes or about 8K msgs/min. After this flood of
messages, the producers send messages at a much much lower rate. I have
pretty much the factory default activemq.xml with
systemUsage/memoryUsage/percentOfJvmHeap=70 and queuePrefetch=20 on these 3
queues.

So I have seen two different scenarios when lots of non-persistent messages
are put on queue.
1. Async Writer Thread Shutdown errors (with no prior warnings/errors), then
OutOfMemoryErrors
2. INFO PListStore: ...tmp_storage initialized, ~10 seconds later WARN:
IOException: OutOfMemoryError: GC overhead limit exceeded ... ActiveMQ
Transport: tcp:..., then that repeats and other errors follow. Also there is
no warnings/errors prior to the tmp_storage init info log msg. FYI: the web
console was responsive until I saw the tmp_storage initialized (KahaDB) INFO
msg (~4.5 minutes into my test), then it stops responding. The last count of
messages on queues via web console is ~30K msgs under ActiveMQ 5.15.2
broker. Under the 5.14.5 broker, I was able to see the flood of ~40K msgs
added to the 3 queues in ~6 minutes.

In more controlled testing in the past 2 days, where I clear the AMQ_DATA
dir before each test run, I have not seen issue #1 (Async Writer Thread
Shutdown / OutOfMemoryError). I see issue #2 (OutOfMemoryError) a few
seconds after KahaDB tmp_storage is initialized, then the web console stop
responding and lots of OoMerrors and other errors in the activemq.log.

Running with ActiveMQ 5.14.5 and 5.12.2 brokers, we do not have any
OutOfMemoryErrors with this same load or higher load vs running under
ActiveMQ 5.15.2. Running with 5.15.2 broker it seems like there might be an
issue with throttling the producers of the queue when the JVM hit's the
configured memoryUsage (default of 70%).

Running on that thought, I did another test with
systemUsage/memoryUsage/percentOfJvmHeap=50 %, but same thing (except that
the OoM error occurs about 20 seconds after the tmp_storage init info log.

So, I ran the test again with systemUsage/memoryUsage to 20%, same thing,
except the OoM error occurs about 40 seconds after the tmp_storage init info
log. This time, I also monitored the memory percent used and temp memory
percent used via the web console. right after I see the tmp_storage init
info log, I can see memUsed=39% tempUsed=1%, ~10 seconds later memUsed=56%
tempUsed=2%, ~10 seconds later memUsed=69% tempUsed=2%, next refresh failed
and of course in the activemq.log I see the OutOfMemoryErrors and other
warnings and errors appearing in the log.

Also, I grepped my old logs for "Journal failed" and did see some results,
but they happen after the a few OutOfMemoryErrors, so I did not include them
in this thread.

I can pretty reliably recreate the problem in about 6 minutes (with a clean
amq_data_dir) running ActiveMQ 5.15.2 broker and no issues under 5.14.5 or
5.12.2 brokers.

Regards,

Neon



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Async Writer Thread Shutdown error adding a message on 5.15.2 broker

Posted by Tim Bain <tb...@alumni.duke.edu>.
Hmmm, I think I may have misinterpreted a few of the things you've written,
so I want to take a step back and recalibrate my understanding of the
behavior you're seeing.

On Tue, Jan 9, 2018 at 1:04 PM, neon18 <ne...@nngo.net> wrote:

> > Do you get the "thread shut down" message on the first message enqueued,
> or
> > only after a certain number of messages are sent?
> I believe, after a large number of of messages have been enqueued, is when
> it enters the OutOfMemoryError state. In another test run, I did not see
> the
> "Async Writer Thread Shutdown error", just an OutOfMemoryError.
> i.e. I see an INFO PListStore:[.../data/localhost/tmp_storage] initialized
> ... | ActiveMQ BrokerService.worker.1 message
> after about 15 seconds (with no longs in between), I see a WARN Trasnport
> Conection to: tcp://... failed: java.io.IOException: Unexpected error
> occurred: java.lang.OutOfMemoryError: Java heap space |
> org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ
> Transport: tcp:///...
>

So I thought I had understood that you started sending messages, and at
some point you started seeing "thread shutdown" messages in the logs, and
only after a large number of those, you finally saw an OutOfMemoryError.
That series of events seems like a straightforward problem: the thread that
would be relieving memory pressure isn't running, so it doesn't relieve
that pressure, and eventually you run out of memory. Simple,
straightforward, and the only question is why the thread isn't running when
the time comes.

But the description you just laid out 1) says that the "thread shutdown"
message doesn't necessarily occur at all under some scenarios, and 2)
doesn't clearly state the ordering of when the OOM occurs relative to the
"thread shutdown" error when they do both occur, which calls into question
my understanding that the "thread shutdown" messages were the cause of the
OOM and not its result. So first, can you please clarify the timing of the
OOM vs. the "thread shutdown" errors when they both occur? And second, we
may have to look at other possibilities for why you're running out of
memory; I'll try to poke at some of those further down in this response.


> So in this scenario, maybe it is taking longer to move the in-memory
> message
> to the temp storage as too many messages are coming in under 5.15 broker
> where as the 5.14 and 5.12 broker's might be "blocking" or throttling
> incoming messages to queue to allow for messages to be moved to temp
> storage?
>

I don't see any obvious new code paths that explicitly remove any
throttling, but I don't claim that the quick look I took this evening is
fully complete.

I do see a code path that's new in 5.15.x that would allow the thread to be
shut down if another error (either an OOM or something else) occurred while
writing messages to disk, but that code path would result in a WARN message
in the logs that starts with "Journal failed while writing at:" and your
earlier response said that you saw no other errors in the logs. So if what
you wrote earlier is accurate, I don't think this new code path could
explain what you're seeing.


> > The temp store does indeed use KahaDB as its store type. Do you see
> KahaDB
> > data files (with a .log extension) in that directory?
> Yes
> > Are they recreated if you stop the broker, delete them, and start the
> > broker? And are any bytes
> > written to them after the broker's initial startup?
> I have not tried this, should the message producer for the queue be on and
> putting stuff into the it?
>

My goal was simply to confirm that bytes were being written to the files
when messages were being enqueued, eliminating any bytes that might have
been written as part of the process of creating those files with no
messages in them, since there might be a small amount of overhead bytes in
an "empty" file. So I'd suggest you do the steps I listed without any
messages being enqueued, give the broker a few seconds (or even minutes,
why not) to let everything stabilize, then start producing messages, and
see whether bytes are eventually written to the files after that point.


> > It seems as though either the thread that dumps messages out of memory
> and
> > into the temp store isn't started at all, or is dying at some point, and
> > it
> > would be useful to know which it is. Can you attach JConsole to the
> broker
> > process as soon as possible after it starts (before sending any messages)
> > and look for the Async Writer Thread in the list of threads? If it's
> > there,
> > then try to figure out when it dies (e.g. after the first message is
> > sent).
> When I startup broker w/o any producer, there is no "Async Writer Thread",
> only an "ActiveMQ Data File Writer" thread
>

That's the one you want. When I wrote my previous response I didn't have
access to the source code, so I didn't have a way to confirm the exact name
of the thread, but what you quoted is the thread in question.


> After about 100 messages send to the non-persistent queue (the messages are
> all still in memory), there is still no "Async Writer Thread", , only an
> "ActiveMQ Data File Writer" thread and all is fine (e.g. consumers can
> de-queue messages).
> Only when I open the flood gates and push lots of messages on the
> non-persistent queue does the broker crumble with the OutOfMemoryError then
> dies in the 5.15 broker.
>

Once you do that, at what point does the "ActiveMQ Data File Writer" thread
die? Before or after you get an OOM?

Also, you say "open the flood gates." Are you trying to push an
unreasonable rate of messages at your broker? For example, are you trying
to push messages at the broker faster than (or nearly as fast as) the disk
on which your temp store is located is capable of writing them? If you send
the same number of messages without consuming them but at half the rate, or
a tenth of the rate, does the broker handle that number of messages without
OOMing?

And I say "unreasonable" not to imply that you might not have a need to
support that messaging rate, but just to imply that the computing solution
(covering both hardware and software) that you've chosen might not be
scoped for the rates you're looking for. So the rate is unreasonable for
the choices you've made (or your coworkers have made) to date, but not
necessarily that it's not one that could be met.

Finally, can you give some rough ideas of the scale you're working at? How
big of a JVM heap are you using? How big are your messages, and how many of
them are received per second? How many messages might pile up before being
consumed? What kind of storage are you writing the temp store to, and how
fast is it? Since some of the things you've written in your most recent
response sound like this might turn out to be a performance
problem/question, please provide details about the performance of the
hardware you're using as well as the scope/scale of the workload you're
trying to run.

Tim

Re: Async Writer Thread Shutdown error adding a message on 5.15.2 broker

Posted by neon18 <ne...@nngo.net>.
> Do you get the "thread shut down" message on the first message enqueued, or 
> only after a certain number of messages are sent? 
I believe, after a large number of of messages have been enqueued, is when
it enters the OutOfMemoryError state. In another test run, I did not see the
"Async Writer Thread Shutdown error", just an OutOfMemoryError.
i.e. I see an INFO PListStore:[.../data/localhost/tmp_storage] initialized
... | ActiveMQ BrokerService.worker.1 message
after about 15 seconds (with no longs in between), I see a WARN Trasnport
Conection to: tcp://... failed: java.io.IOException: Unexpected error
occurred: java.lang.OutOfMemoryError: Java heap space |
org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ
Transport: tcp:///...
So in this scenario, maybe it is taking longer to move the in-memory message
to the temp storage as too many messages are coming in under 5.15 broker
where as the 5.14 and 5.12 broker's might be "blocking" or throttling
incoming messages to queue to allow for messages to be moved to temp
storage? 


> Have you set any options in your activemq.xml that might change the 
> broker's behavior regarding the temp store? If you can, please post your 
> configuration. 
I am pretty much using the factory stock activemq.xml, with an increased
maximumConnections and a few slow workoff queues with queuePrefetch=1


> The temp store does indeed use KahaDB as its store type. Do you see KahaDB 
> data files (with a .log extension) in that directory?
Yes
> Are they recreated if you stop the broker, delete them, and start the
> broker? And are any bytes 
> written to them after the broker's initial startup? 
I have not tried this, should the message producer for the queue be on and
putting stuff into the it?


> It seems as though either the thread that dumps messages out of memory and 
> into the temp store isn't started at all, or is dying at some point, and
> it 
> would be useful to know which it is. Can you attach JConsole to the broker 
> process as soon as possible after it starts (before sending any messages) 
> and look for the Async Writer Thread in the list of threads? If it's
> there, 
> then try to figure out when it dies (e.g. after the first message is
> sent). 
When I startup broker w/o any producer, there is no "Async Writer Thread",
only an "ActiveMQ Data File Writer" thread
After about 100 messages send to the non-persistent queue (the messages are
all still in memory), there is still no "Async Writer Thread", , only an
"ActiveMQ Data File Writer" thread and all is fine (e.g. consumers can
de-queue messages).
Only when I open the flood gates and push lots of messages on the
non-persistent queue does the broker crumble with the OutOfMemoryError then
dies in the 5.15 broker.

Thanks for looking into this issue with me.

Sincerely,

Neon



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Async Writer Thread Shutdown error adding a message on 5.15.2 broker

Posted by Tim Bain <tb...@alumni.duke.edu>.
OK, this is a good start, thanks for working to narrow down the variables a
bit.

Do you get the "thread shut down" message on the first message enqueued, or
only after a certain number of messages are sent?

Have you set any options in your activemq.xml that might change the
broker's behavior regarding the temp store? If you can, please post your
configuration.

On Jan 8, 2018 7:57 AM, "neon18" <ne...@nngo.net> wrote:

I have tried out a few things to reduce the variables, one step at a time.

1. I upgrade the ActiveMQ client libraries to 5.15.2 to match the server.
This results in same issue (OutOfMemoryError on ActiveMQ 5.15.2 server).

2. I change the memoryUsage percentOfJvmHeap from 70% to 50%, restarted the
server but same. Issue.

3. I downgraded to ActiveMQ 5.14.5 server, and all is good (also when I
downgraded to ActiveMQ 5.12.2 - all is good)

There are not other errors in activemq.log.

I am using a non-persistent queue, but when memory hits the default the
memory usage percent of 70%, , the server *should* flush to the temp
storage. So I believe ActiveMQ 5.15 has a bug when flushing in-memory
(non-persistent) messages to the temp storage under
<amq_data_dir>/localhost/tmp_storage. I am guessing the temp storage is
using KahaDB underneath).


The temp store does indeed use KahaDB as its store type. Do you see KahaDB
data files (with a .log extension) in that directory? Are they recreated if
you stop the broker, delete them, and start the broker? And are any bytes
written to them after the broker's initial startup?

It seems as though either the thread that dumps messages out of memory and
into the temp store isn't started at all, or is dying at some point, and it
would be useful to know which it is. Can you attach JConsole to the broker
process as soon as possible after it starts (before sending any messages)
and look for the Async Writer Thread in the list of threads? If it's there,
then try to figure out when it dies (e.g. after the first message is sent).

With no messages (or with a few messages before JVM max memory), ActiveMQ
5.15.2 server is fine, it responds on the Web console, producer and consumer
clients can add to queue and dequeue messages fine. After the
OutOfMemoryError, the server (broker) no longer responds to any client and
the web console is NOT responsive.


All bets are off after an OOM in a Java process; the only safe option is to
restart the process. So that doesn't surprise me.

Regards,

Neon



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Async Writer Thread Shutdown error adding a message on 5.15.2 broker

Posted by neon18 <ne...@nngo.net>.
I have tried out a few things to reduce the variables, one step at a time.

1. I upgrade the ActiveMQ client libraries to 5.15.2 to match the server.
This results in same issue (OutOfMemoryError on ActiveMQ 5.15.2 server).

2. I change the memoryUsage percentOfJvmHeap from 70% to 50%, restarted the
server but same. Issue.

3. I downgraded to ActiveMQ 5.14.5 server, and all is good (also when I
downgraded to ActiveMQ 5.12.2 - all is good)

There are not other errors in activemq.log.

I am using a non-persistent queue, but when memory hits the default the
memory usage percent of 70%, , the server *should* flush to the temp
storage. So I believe ActiveMQ 5.15 has a bug when flushing in-memory
(non-persistent) messages to the temp storage under
<amq_data_dir>/localhost/tmp_storage. I am guessing the temp storage is
using KahaDB underneath).

With no messages (or with a few messages before JVM max memory), ActiveMQ
5.15.2 server is fine, it responds on the Web console, producer and consumer
clients can add to queue and dequeue messages fine. After the
OutOfMemoryError, the server (broker) no longer responds to any client and
the web console is NOT responsive.

Regards,

Neon



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Async Writer Thread Shutdown error adding a message on 5.15.2 broker

Posted by Tim Bain <tb...@alumni.duke.edu>.
I'm not aware of any incompatibility between a 5.12.2 client and a 5.15.2
broker, but we don't explicitly test compatibility so it's possible. But I
can't think of an obvious mechanism for an older client to cause a store
writer thread in the broker to exit, so a compatibility problem isn't where
I'd start looking.

The thread that's shut down writes content to disk in KahaDB. Presumably
this is a persistent message and you're using KahaDB as your persistence
store implementation. Are there any errors in the broker's logs prior to
the one you quoted, which might indicate problems with KahaDB or the writer
thread? Can you put messages on the queue via the web console? And if you
disconnect all clients and then restart the broker, can you put messages on
the queue via the web console? Let's make sure your KahaDB confirmation is
healthy with no clients at all, before introducing the extra variable of an
external producer (let alone one on an earlier version of ActiveMQ).

Tim

On Jan 5, 2018 9:03 AM, "neon18" <ne...@nngo.net> wrote:

> We recently upgraded to ActiveMQ server 5.15.2 from 5.12.2 and when putting
> some messages onto a Queue, we get the following error
>
> ERROR | Caught an Exception adding a message: ActiveMQObjectMessage
> {commandId = 2705, ...} first to FilePendingMessageCursor |
> org.apache.acivemq.broker.region.cursors.FIlePendingMEssageCursor |
> ActiveMQ
> Transport: tcp://....
> java.io.IOException: Async Writer Thread Shutdown
>   at
> org.apache.activemq.store.kahadb.disk.journal.DataFileAppender.enqueue(
> DataFileAppender.java:164)[activemq-kahadb-store-5.15.2.jar:5.15.2]
>   at
> org.apache.activemq.store.kahadb.disk.journal.DataFileAppender.storeItem(
> DataFileAppender.java:129)[activemq-kahadb-store-5.15.2.jar:5.15.2]
>   at
> org.apache.activemq.store.kahadb.disk.journal.Journal.
> write(Journal.java:958)[activemq-kahadb-store-5.15.2.jar:5.15.2]
>   at
> org.apache.activemq.store.kahadb.plist.PListStoreImpl.
> write(PListStoreImpl.java:444)[activemq-kahadb-store-5.15.2.jar:5.15.2]
>   at
> org.apache.activemq.store.kahadb.plist.PListImpl.write(
> PListImpl.java:105)[activemq-kahadb-store-5.15.2.jar:5.15.2]
>   at
> org.apache.activemq.broker.region.cursors.FilePendingMessageCursor.
> tryAddMessageLast(FilePendingMessageCursor.java:
> 237)[activemq-broker-5.15.2.jar:5.15.2]
>   at
> org.apache.activemq.broker.region.cursors.StoreQueueCursor.
> tryAddMessageLast(StoreQueueCursor.java:100)[activemq-broker-5.15.2.jar:5.
> 15.2]
>   at
> org.apache.activemq.broker.region.Queue.tryCursorAdd(
> Queue.java:1868)[activemq-broker-5.15.2.jar:5.15.2]
> ...
>
> When we were using ActiveMQ 5.12.2 server we did not have this issue.
>
> Any ideas on why this is happening (there are lots of the above errors in
> the activemq.log) and how to resolve it?
>
> Eventually, the 5.15.2 server logs a WARN | Transport Connection to
> tcp://... failed java.io.Exception: Unexpected error occurred:
> java.lang.OutOfMemoryError: Java heap space |
> org.apache.activemq.broker.TransportConnection.Transport | ActiveMQ
> Transport: tcp:///...
>
> The client that submits to the queue is still using the
> activemq-client-5.12.2.jar (do we need to also upgrade the client to use
> the
> matching version as the server?
>
> Regards,
>
> Neon
>
>
>
>
>
>
> --
> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-
> f2341805.html
>