You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@activemq.apache.org by Raffi <ra...@gmail.com> on 2017/09/26 12:51:26 UTC
Kahadb index updates taking too much time on ActiveMQ 5.11
In production with ActiveMQ 5.11 with multi-kahadb per destination on NFSv4
M|S (Linux 7.1), we're seeing the following log messages intermittently
during operations:
*Slow KahaDB access: Journal append took: 2 ms, Index update took 16901 ms*
Also, on startup, we see the following in our kahadb logs; this condition
severely impacts start up time, sometimes taking up to 90m for the broker to
start, and that's for relatively small persistent store (< 5Gb). We have not
adjusted indexCacheSize, that will be our first attempt to resolve this; we
also plan to benchmark NFSv4 file system performance. Aside from that, is
there any other broker config
2017-09-25 14:35:27,352 [AMQ-1-thread-1 ] DEBUG ListIndex
- loading
2017-09-25 14:35:27,365 [AMQ-1-thread-1 ] INFO MessageDatabase
- Recovering from the journal @1:76785
2017-09-25 14:35:27,365 [AMQ-1-thread-1 ] INFO MessageDatabase
- Recovering from the journal @1:76785
2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO MessageDatabase
- Recovery replayed 1 operations from the journal in 0.012 seconds.
2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO MessageDatabase
- Recovery replayed 1 operations from the journal in 0.012 seconds.
2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO KahaDBStore
- Stopping async queue tasks
2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO KahaDBStore
- Stopping async topic tasks
2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO KahaDBStore
- Stopped KahaDB
2017-09-25 14:35:27,855 [AMQ-1-thread-1 ] DEBUG KahaDBStore
- Store OpenWire version configured as: 6
2017-09-25 14:35:30,234 [AMQ-1-thread-1 ] TRACE Journal
- Startup took: 2372 ms
2017-09-25 14:35:30,276 [AMQ-1-thread-1 ] INFO MessageDatabase
- KahaDB is version 5
2017-09-25 14:35:30,276 [AMQ-1-thread-1 ] INFO MessageDatabase
- KahaDB is version 5
2017-09-25 14:35:30,281 [AMQ-1-thread-1 ] DEBUG ListIndex
- loading
2017-09-25 14:35:30,282 [AMQ-1-thread-1 ] DEBUG ListIndex
- loading
2017-09-25 14:35:30,296 [AMQ-1-thread-1 ] INFO MessageDatabase
- Recovering from the journal @1:70835
2017-09-25 14:35:30,296 [AMQ-1-thread-1 ] INFO MessageDatabase
- Recovering from the journal @1:70835
2017-09-25 14:35:30,296 [AMQ-1-thread-1 ] INFO MessageDatabase
- Recovery replayed 1 operations from the journal in 0.012 seconds.
2017-09-25 14:35:30,296 [AMQ-1-thread-1 ] INFO MessageDatabase
- Recovery replayed 1 operations from the journal in 0.012 seconds.
2017-09-25 14:35:30,297 [AMQ-1-thread-1 ] INFO KahaDBStore
- Stopping async queue tasks
2017-09-25 14:35:30,297 [AMQ-1-thread-1 ] INFO KahaDBStore
- Stopping async topic tasks
2017-09-25 14:35:30,297 [AMQ-1-thread-1 ] INFO KahaDBStore
- Stopped KahaDB
2017-09-25 14:35:30,785 [AMQ-1-thread-1 ] DEBUG KahaDBStore
- Store OpenWire version configured as: 6
2017-09-25 14:35:33,023 [AMQ-1-thread-1 ] TRACE Journal
- Startup took: 2231 ms
2017-09-25 14:35:33,060 [AMQ-1-thread-1 ] INFO MessageDatabase
- KahaDB is version 5
2017-09-25 14:35:33,060 [AMQ-1-thread-1 ] INFO MessageDatabase
- KahaDB is version 5
2017-09-25 14:35:33,065 [AMQ-1-thread-1 ] DEBUG ListIndex
- loading
2017-09-25 14:35:33,066 [AMQ-1-thread-1 ] DEBUG ListIndex
- loading
2017-09-25 14:35:33,080 [AMQ-1-thread-1 ] INFO MessageDatabase
- Recovering from the journal @1:55603
2017-09-25 14:35:33,080 [AMQ-1-thread-1 ] INFO MessageDatabase
- Recovering from the journal @1:55603
2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO MessageDatabase
- Recovery replayed 1 operations from the journal in 0.013 seconds.
2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO MessageDatabase
- Recovery replayed 1 operations from the journal in 0.013 seconds.
2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO KahaDBStore
- Stopping async queue tasks
2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO KahaDBStore
- Stopping async topic tasks
2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO KahaDBStore
- Stopped KahaDB
--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
Re: Kahadb index updates taking too much time on ActiveMQ 5.11
Posted by Tim Bain <tb...@alumni.duke.edu>.
I'm glad you got it working, and from the synonyms I'm not surprised that
NFS ended up being the root cause.
Tim
On Oct 19, 2017 6:24 AM, "Devlin" <rb...@ofiglobal.com> wrote:
> Thank you, Tim.
>
> We finally figured out the issue; NFS "noac" option was killing
> performance.
>
> (vers=4.0,rsize=65536,wsize=65536,namlen=255,hard,noac,
> proto=tcp,port=0,timeo=20,retrans=2)
>
> We joked about it afterwards, no "air conditioning" after a long summer can
> kill anything :-)
>
> Thanks again for your insight and pointers, will come in handy!
>
>
>
> --
> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-
> f2341805.html
>
Re: Kahadb index updates taking too much time on ActiveMQ 5.11
Posted by Devlin <rb...@ofiglobal.com>.
Thank you, Tim.
We finally figured out the issue; NFS "noac" option was killing performance.
(vers=4.0,rsize=65536,wsize=65536,namlen=255,hard,noac,proto=tcp,port=0,timeo=20,retrans=2)
We joked about it afterwards, no "air conditioning" after a long summer can
kill anything :-)
Thanks again for your insight and pointers, will come in handy!
--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
Re: Kahadb index updates taking too much time on ActiveMQ 5.11
Posted by Tim Bain <tb...@alumni.duke.edu>.
Sorry for the delay in responding.
I've got a number of follow-up questions/suggestions:
1. Can you take screenshots sorted by all four non-percentage columns
(Self Time, Self Time (CPU), Total Time, Total Time (CPU))? The ones with
CPU measure CPU time, while the ones without measure total elapsed time (of
which some portion is CPU and some portion is things like sleeps and I/O
waits).
2. Can you please take a screenshot of the Monitor tab after ActiveMQ
finishes starting up? In particular I want to make sure that GC CPU usage
looks reasonable.
3. From your screenshot entitled cpu-total-time.jpg, it's clear that
there's a very significant amount of CPU time being spent calling
org.apache.karaf.management.internal.MBeanInvocationHandler.invoke(), which
is JMX-related. We need to figure out if this is relevant or is a red
herring. To do that, please find that method in the Call Tree tab (you
should be able to right-click on it in the Hot Spots tab and select
something like "Find in Call Tree") and see if you can tell what's calling
it and what it's calling and whether those calls are in any way related to
ActiveMQ. I suspect that it might actually be related to the use of
VisualVM, but we need to make sure before we write it off.
4. From your screenshot entitled cpu-self-time.jpg (which is in fact
Self Time, *not* Self Time (CPU) as you said), it's clear that there's a
significant amount of time being spent in
org.apache.activemq.store.kahadb.MessageDatabase$3.run() as well as in
various other ActiveMQ-related methods. You'll need to find those methods
in the Call Tree tab and walk down the call stack looking for where the
majority of the time (I'd focus on elapsed time rather than just CPU,
because disk I/O may be part of your problem) is being spent. You can
ignore any thread that doesn't eventually land you in something related to
KahaDB or files; not every thread will be relevant to your slow startup
time, so ignore the ones that aren't. It looks like a lot of time is being
spent in RecoverableRandomAccessFile.readInt(), readFully(), and
readByte(), so I suspect that those will be a large portion of where your
time is being spent, but you may discover that there are other places where
time is being spent that weren't visible from the screenshots you've shown
so far.
5. If the time really is primarily spent in those three read*() methods,
they simply call the equivalent methods in
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/RandomAccessFile.java
while wrapped in try/catch statements, so either you're really spending
that long reading the bytes from the files and doing shift-and-add
operations on the read bytes (in which case you should probably examine
both your NFS performance and your belief that you're only reading 5GB of
data) or you're spending a ton of time doing exception-handling (in which
case we may need to get a debugger in there to figure out what the
exceptions are).
I think those things will give us a good set of next steps to get closer to
determining what's actually going on.
Tim
On Wed, Oct 11, 2017 at 7:04 PM, Devlin <rb...@ofiglobal.com> wrote:
> Samples for cpu self-time, total-time, and memory usage:
>
> <http://activemq.2283324.n4.nabble.com/file/t376407/cpu-self-time.jpg>
>
> <http://activemq.2283324.n4.nabble.com/file/t376407/cpu-total-time.jpg>
>
> <http://activemq.2283324.n4.nabble.com/file/t376407/mem-usage.jpg>
>
>
>
> --
> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-
> f2341805.html
>
Re: Kahadb index updates taking too much time on ActiveMQ 5.11
Posted by Devlin <rb...@ofiglobal.com>.
Samples for cpu self-time, total-time, and memory usage:
<http://activemq.2283324.n4.nabble.com/file/t376407/cpu-self-time.jpg>
<http://activemq.2283324.n4.nabble.com/file/t376407/cpu-total-time.jpg>
<http://activemq.2283324.n4.nabble.com/file/t376407/mem-usage.jpg>
--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
Re: Kahadb index updates taking too much time on ActiveMQ 5.11
Posted by Tim Bain <tb...@alumni.duke.edu>.
I've not seen anyone describe this set of symptoms, so I don't have
anything to suggest right from the outset, but I may be able to help you
figure out what's going on.
Could you attach a CPU sampler such as JVisualVM to the broker process and
capture CPU sampling (not profiling if this is an operational broker!) data
to determine where the broker is spending its time?
On Tue, Sep 26, 2017 at 6:51 AM, Raffi <ra...@gmail.com> wrote:
>
> In production with ActiveMQ 5.11 with multi-kahadb per destination on NFSv4
> M|S (Linux 7.1), we're seeing the following log messages intermittently
> during operations:
>
> *Slow KahaDB access: Journal append took: 2 ms, Index update took 16901 ms*
>
> Also, on startup, we see the following in our kahadb logs; this condition
> severely impacts start up time, sometimes taking up to 90m for the broker
> to
> start, and that's for relatively small persistent store (< 5Gb). We have
> not
> adjusted indexCacheSize, that will be our first attempt to resolve this; we
> also plan to benchmark NFSv4 file system performance. Aside from that, is
> there any other broker config
>
> 2017-09-25 14:35:27,352 [AMQ-1-thread-1 ] DEBUG ListIndex
> - loading
> 2017-09-25 14:35:27,365 [AMQ-1-thread-1 ] INFO MessageDatabase
> - Recovering from the journal @1:76785
> 2017-09-25 14:35:27,365 [AMQ-1-thread-1 ] INFO MessageDatabase
> - Recovering from the journal @1:76785
> 2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO MessageDatabase
> - Recovery replayed 1 operations from the journal in 0.012 seconds.
> 2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO MessageDatabase
> - Recovery replayed 1 operations from the journal in 0.012 seconds.
> 2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO KahaDBStore
> - Stopping async queue tasks
> 2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO KahaDBStore
> - Stopping async topic tasks
> 2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO KahaDBStore
> - Stopped KahaDB
> 2017-09-25 14:35:27,855 [AMQ-1-thread-1 ] DEBUG KahaDBStore
> - Store OpenWire version configured as: 6
> 2017-09-25 14:35:30,234 [AMQ-1-thread-1 ] TRACE Journal
> - Startup took: 2372 ms
> 2017-09-25 14:35:30,276 [AMQ-1-thread-1 ] INFO MessageDatabase
> - KahaDB is version 5
> 2017-09-25 14:35:30,276 [AMQ-1-thread-1 ] INFO MessageDatabase
> - KahaDB is version 5
> 2017-09-25 14:35:30,281 [AMQ-1-thread-1 ] DEBUG ListIndex
> - loading
> 2017-09-25 14:35:30,282 [AMQ-1-thread-1 ] DEBUG ListIndex
> - loading
> 2017-09-25 14:35:30,296 [AMQ-1-thread-1 ] INFO MessageDatabase
> - Recovering from the journal @1:70835
> 2017-09-25 14:35:30,296 [AMQ-1-thread-1 ] INFO MessageDatabase
> - Recovering from the journal @1:70835
> 2017-09-25 14:35:30,296 [AMQ-1-thread-1 ] INFO MessageDatabase
> - Recovery replayed 1 operations from the journal in 0.012 seconds.
> 2017-09-25 14:35:30,296 [AMQ-1-thread-1 ] INFO MessageDatabase
> - Recovery replayed 1 operations from the journal in 0.012 seconds.
> 2017-09-25 14:35:30,297 [AMQ-1-thread-1 ] INFO KahaDBStore
> - Stopping async queue tasks
> 2017-09-25 14:35:30,297 [AMQ-1-thread-1 ] INFO KahaDBStore
> - Stopping async topic tasks
> 2017-09-25 14:35:30,297 [AMQ-1-thread-1 ] INFO KahaDBStore
> - Stopped KahaDB
> 2017-09-25 14:35:30,785 [AMQ-1-thread-1 ] DEBUG KahaDBStore
> - Store OpenWire version configured as: 6
> 2017-09-25 14:35:33,023 [AMQ-1-thread-1 ] TRACE Journal
> - Startup took: 2231 ms
> 2017-09-25 14:35:33,060 [AMQ-1-thread-1 ] INFO MessageDatabase
> - KahaDB is version 5
> 2017-09-25 14:35:33,060 [AMQ-1-thread-1 ] INFO MessageDatabase
> - KahaDB is version 5
> 2017-09-25 14:35:33,065 [AMQ-1-thread-1 ] DEBUG ListIndex
> - loading
> 2017-09-25 14:35:33,066 [AMQ-1-thread-1 ] DEBUG ListIndex
> - loading
> 2017-09-25 14:35:33,080 [AMQ-1-thread-1 ] INFO MessageDatabase
> - Recovering from the journal @1:55603
> 2017-09-25 14:35:33,080 [AMQ-1-thread-1 ] INFO MessageDatabase
> - Recovering from the journal @1:55603
> 2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO MessageDatabase
> - Recovery replayed 1 operations from the journal in 0.013 seconds.
> 2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO MessageDatabase
> - Recovery replayed 1 operations from the journal in 0.013 seconds.
> 2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO KahaDBStore
> - Stopping async queue tasks
> 2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO KahaDBStore
> - Stopping async topic tasks
> 2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO KahaDBStore
> - Stopped KahaDB
>
>
>
> --
> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-
> f2341805.html
>