You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@activemq.apache.org by Devlin <rb...@ofiglobal.com> on 2017/10/12 01:04:50 UTC
Re: Kahadb index updates taking too much time on ActiveMQ 5.11
Samples for cpu self-time, total-time, and memory usage:
<http://activemq.2283324.n4.nabble.com/file/t376407/cpu-self-time.jpg>
<http://activemq.2283324.n4.nabble.com/file/t376407/cpu-total-time.jpg>
<http://activemq.2283324.n4.nabble.com/file/t376407/mem-usage.jpg>
--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
Re: Kahadb index updates taking too much time on ActiveMQ 5.11
Posted by Tim Bain <tb...@alumni.duke.edu>.
I'm glad you got it working, and from the synonyms I'm not surprised that
NFS ended up being the root cause.
Tim
On Oct 19, 2017 6:24 AM, "Devlin" <rb...@ofiglobal.com> wrote:
> Thank you, Tim.
>
> We finally figured out the issue; NFS "noac" option was killing
> performance.
>
> (vers=4.0,rsize=65536,wsize=65536,namlen=255,hard,noac,
> proto=tcp,port=0,timeo=20,retrans=2)
>
> We joked about it afterwards, no "air conditioning" after a long summer can
> kill anything :-)
>
> Thanks again for your insight and pointers, will come in handy!
>
>
>
> --
> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-
> f2341805.html
>
Re: Kahadb index updates taking too much time on ActiveMQ 5.11
Posted by Devlin <rb...@ofiglobal.com>.
Thank you, Tim.
We finally figured out the issue; NFS "noac" option was killing performance.
(vers=4.0,rsize=65536,wsize=65536,namlen=255,hard,noac,proto=tcp,port=0,timeo=20,retrans=2)
We joked about it afterwards, no "air conditioning" after a long summer can
kill anything :-)
Thanks again for your insight and pointers, will come in handy!
--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html
Re: Kahadb index updates taking too much time on ActiveMQ 5.11
Posted by Tim Bain <tb...@alumni.duke.edu>.
Sorry for the delay in responding.
I've got a number of follow-up questions/suggestions:
1. Can you take screenshots sorted by all four non-percentage columns
(Self Time, Self Time (CPU), Total Time, Total Time (CPU))? The ones with
CPU measure CPU time, while the ones without measure total elapsed time (of
which some portion is CPU and some portion is things like sleeps and I/O
waits).
2. Can you please take a screenshot of the Monitor tab after ActiveMQ
finishes starting up? In particular I want to make sure that GC CPU usage
looks reasonable.
3. From your screenshot entitled cpu-total-time.jpg, it's clear that
there's a very significant amount of CPU time being spent calling
org.apache.karaf.management.internal.MBeanInvocationHandler.invoke(), which
is JMX-related. We need to figure out if this is relevant or is a red
herring. To do that, please find that method in the Call Tree tab (you
should be able to right-click on it in the Hot Spots tab and select
something like "Find in Call Tree") and see if you can tell what's calling
it and what it's calling and whether those calls are in any way related to
ActiveMQ. I suspect that it might actually be related to the use of
VisualVM, but we need to make sure before we write it off.
4. From your screenshot entitled cpu-self-time.jpg (which is in fact
Self Time, *not* Self Time (CPU) as you said), it's clear that there's a
significant amount of time being spent in
org.apache.activemq.store.kahadb.MessageDatabase$3.run() as well as in
various other ActiveMQ-related methods. You'll need to find those methods
in the Call Tree tab and walk down the call stack looking for where the
majority of the time (I'd focus on elapsed time rather than just CPU,
because disk I/O may be part of your problem) is being spent. You can
ignore any thread that doesn't eventually land you in something related to
KahaDB or files; not every thread will be relevant to your slow startup
time, so ignore the ones that aren't. It looks like a lot of time is being
spent in RecoverableRandomAccessFile.readInt(), readFully(), and
readByte(), so I suspect that those will be a large portion of where your
time is being spent, but you may discover that there are other places where
time is being spent that weren't visible from the screenshots you've shown
so far.
5. If the time really is primarily spent in those three read*() methods,
they simply call the equivalent methods in
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/RandomAccessFile.java
while wrapped in try/catch statements, so either you're really spending
that long reading the bytes from the files and doing shift-and-add
operations on the read bytes (in which case you should probably examine
both your NFS performance and your belief that you're only reading 5GB of
data) or you're spending a ton of time doing exception-handling (in which
case we may need to get a debugger in there to figure out what the
exceptions are).
I think those things will give us a good set of next steps to get closer to
determining what's actually going on.
Tim
On Wed, Oct 11, 2017 at 7:04 PM, Devlin <rb...@ofiglobal.com> wrote:
> Samples for cpu self-time, total-time, and memory usage:
>
> <http://activemq.2283324.n4.nabble.com/file/t376407/cpu-self-time.jpg>
>
> <http://activemq.2283324.n4.nabble.com/file/t376407/cpu-total-time.jpg>
>
> <http://activemq.2283324.n4.nabble.com/file/t376407/mem-usage.jpg>
>
>
>
> --
> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-
> f2341805.html
>