
Posted to users@activemq.apache.org by Devlin <rb...@ofiglobal.com> on 2017/10/12 01:04:50 UTC

Re: Kahadb index updates taking too much time on ActiveMQ 5.11

Samples for cpu self-time, total-time, and memory usage:

<http://activemq.2283324.n4.nabble.com/file/t376407/cpu-self-time.jpg> 

<http://activemq.2283324.n4.nabble.com/file/t376407/cpu-total-time.jpg> 

<http://activemq.2283324.n4.nabble.com/file/t376407/mem-usage.jpg> 



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Kahadb index updates taking too much time on ActiveMQ 5.11

Posted by Tim Bain <tb...@alumni.duke.edu>.

I'm glad you got it working, and from the symptoms I'm not surprised that
NFS ended up being the root cause.

Tim

On Oct 19, 2017 6:24 AM, "Devlin" <rb...@ofiglobal.com> wrote:

> Thank you, Tim.
>
> We finally figured out the issue; NFS "noac" option was killing
> performance.
>
> (vers=4.0,rsize=65536,wsize=65536,namlen=255,hard,noac,
> proto=tcp,port=0,timeo=20,retrans=2)
>
> We joked about it afterwards, no "air conditioning" after a long summer can
> kill anything :-)
>
> Thanks again for your insight and pointers, will come in handy!
>

Re: Kahadb index updates taking too much time on ActiveMQ 5.11

Posted by Devlin <rb...@ofiglobal.com>.

Thank you, Tim.

We finally figured out the issue: the NFS "noac" mount option was killing
performance. These were the mount options in effect:

(vers=4.0,rsize=65536,wsize=65536,namlen=255,hard,noac,proto=tcp,port=0,timeo=20,retrans=2)

We joked about it afterwards, no "air conditioning" after a long summer can
kill anything :-)

Thanks again for your insight and pointers, will come in handy!
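
For anyone who wants to check their own mounts for the same problem, here is a
minimal sketch that parses an option string like the one above and flags
settings that turn off NFS attribute caching. The function names are
hypothetical, and treating "actimeo=0" as equivalent to "noac" for this purpose
is an assumption based on the nfs(5) man page, not something we tested.

```python
def parse_mount_options(option_string):
    """Split an NFS option string into a dict; bare flags map to True."""
    options = {}
    for item in option_string.strip("()").split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options


def disables_attribute_caching(options):
    """Return True when the options appear to turn off attribute caching."""
    # "noac" is the flag that hurt us. "actimeo=0" is included on the
    # assumption that it has a similar effect on attribute caching.
    return options.get("noac") is True or options.get("actimeo") == "0"


reported = ("(vers=4.0,rsize=65536,wsize=65536,namlen=255,hard,noac,"
            "proto=tcp,port=0,timeo=20,retrans=2)")
print(disables_attribute_caching(parse_mount_options(reported)))
```

On Linux the option string for a live mount can be copied from /proc/mounts or
from the output of "nfsstat -m".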


Re: Kahadb index updates taking too much time on ActiveMQ 5.11

Posted by Tim Bain <tb...@alumni.duke.edu>.

Sorry for the delay in responding.

I've got a number of follow-up questions/suggestions:

   1. Can you take screenshots sorted by all four non-percentage columns
   (Self Time, Self Time (CPU), Total Time, Total Time (CPU))? The ones with
   CPU measure CPU time, while the ones without measure total elapsed time (of
   which some portion is CPU and some portion is things like sleeps and I/O
   waits).
   2. Can you please take a screenshot of the Monitor tab after ActiveMQ
   finishes starting up? In particular I want to make sure that GC CPU usage
   looks reasonable.
   3. From your screenshot entitled cpu-total-time.jpg, it's clear that
   there's a very significant amount of CPU time being spent calling
   org.apache.karaf.management.internal.MBeanInvocationHandler.invoke(), which
   is JMX-related. We need to figure out if this is relevant or is a red
   herring. To do that, please find that method in the Call Tree tab (you
   should be able to right-click on it in the Hot Spots tab and select
   something like "Find in Call Tree") and see if you can tell what's calling
   it and what it's calling and whether those calls are in any way related to
   ActiveMQ. I suspect that it might actually be related to the use of
   VisualVM, but we need to make sure before we write it off.
   4. From your screenshot entitled cpu-self-time.jpg (which is in fact
   Self Time, *not* Self Time (CPU) as you said), it's clear that there's a
   significant amount of time being spent in
   org.apache.activemq.store.kahadb.MessageDatabase$3.run() as well as in
   various other ActiveMQ-related methods. You'll need to find those methods
   in the Call Tree tab and walk down the call stack looking for where the
   majority of the time (I'd focus on elapsed time rather than just CPU,
   because disk I/O may be part of your problem) is being spent. You can
   ignore any thread that doesn't eventually land you in something related to
   KahaDB or files; not every thread will be relevant to your slow startup
   time, so ignore the ones that aren't. It looks like a lot of time is being
   spent in RecoverableRandomAccessFile.readInt(), readFully(), and
   readByte(), so I suspect that those will be a large portion of where your
   time is being spent, but you may discover that there are other places where
   time is being spent that weren't visible from the screenshots you've shown
   so far.
   5. If the time really is primarily spent in those three read*() methods,
   note that they simply call the equivalent methods in
   http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/RandomAccessFile.java
   inside try/catch blocks. That leaves two possibilities. Either you really
   are spending that long reading the bytes from the files and doing
   shift-and-add operations on them (in which case you should probably
   examine both your NFS performance and your belief that you're only
   reading 5GB of data), or you're spending a ton of time doing exception
   handling (in which case we may need to get a debugger in there to figure
   out what the exceptions are).
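
To make the "shift-and-add" remark in point 5 concrete, here is a rough Python
sketch of the pattern: a 32-bit big-endian integer assembled from four separate
one-byte reads, compared with a single four-byte read. This is an illustration
only, not the JDK or ActiveMQ code; the function names and the sample file are
made up, and unlike Java's readInt() it produces an unsigned value.

```python
import os
import struct
import tempfile


def read_int_bytewise(f):
    """Read a big-endian 32-bit int via four one-byte reads, then shift-and-add."""
    b = [f.read(1)[0] for _ in range(4)]
    return (b[0] << 24) + (b[1] << 16) + (b[2] << 8) + b[3]


def read_int_single(f):
    """Read the same value with one four-byte read."""
    return struct.unpack(">I", f.read(4))[0]


def demo(value=0x0A0B0C0D):
    """Write one int to a temporary file and read it back both ways."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sample.bin")
        with open(path, "wb") as f:
            f.write(struct.pack(">I", value))
        # buffering=0 makes every read() call go to the operating system,
        # which is what makes many tiny reads expensive on a slow filesystem.
        with open(path, "rb", buffering=0) as f:
            bytewise = read_int_bytewise(f)
        with open(path, "rb", buffering=0) as f:
            single = read_int_single(f)
    return bytewise, single


print(demo())
```

The arithmetic itself is cheap; if these methods dominate the profile, the cost
is far more likely to be in the individual reads hitting the filesystem than in
the shifts and adds.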

I think those things will give us a good set of next steps to get closer to
determining what's actually going on.
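
As an aside on the difference between the CPU and non-CPU columns in point 1,
here is a small Python sketch of the same distinction outside VisualVM. The
sleep stands in for an I/O wait; the helper names are made up for illustration.

```python
import time


def measure(fn):
    """Return (elapsed_seconds, cpu_seconds) for one call to fn."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def waiting():
    time.sleep(0.2)  # stands in for a blocking I/O wait


def computing():
    sum(i * i for i in range(200_000))  # pure CPU work


for name, fn in (("waiting", waiting), ("computing", computing)):
    elapsed, cpu = measure(fn)
    print(f"{name}: elapsed={elapsed:.3f}s cpu={cpu:.3f}s")
```

A method that mostly waits shows a large elapsed time and almost no CPU time,
which is the pattern I'd expect from slow file reads over NFS.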

Tim

On Wed, Oct 11, 2017 at 7:04 PM, Devlin <rb...@ofiglobal.com> wrote:

> Samples for cpu self-time, total-time, and memory usage:
>
> <http://activemq.2283324.n4.nabble.com/file/t376407/cpu-self-time.jpg>
>
> <http://activemq.2283324.n4.nabble.com/file/t376407/cpu-total-time.jpg>
>
> <http://activemq.2283324.n4.nabble.com/file/t376407/mem-usage.jpg>