You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@activemq.apache.org by Raffi <ra...@gmail.com> on 2017/09/26 12:51:26 UTC

Kahadb index updates taking too much time on ActiveMQ 5.11

In production with ActiveMQ 5.11 with multi-kahadb per destination on NFSv4
M|S (Linux 7.1), we're seeing the following log messages intermittently
during operations:

*Slow KahaDB access: Journal append took: 2 ms, Index update took 16901 ms*

Also, on startup, we see the following in our kahadb logs; this condition
severely impacts start up time, sometimes taking up to 90m for the broker to
start, and that's for relatively small persistent store (< 5Gb). We have not
adjusted indexCacheSize, that will be our first attempt to resolve this; we
also plan to benchmark NFSv4 file system performance. Aside from that, is
there any other broker config  

2017-09-25 14:35:27,352 [AMQ-1-thread-1 ] DEBUG ListIndex                     
- loading
 2017-09-25 14:35:27,365 [AMQ-1-thread-1 ] INFO  MessageDatabase               
- Recovering from the journal @1:76785
 2017-09-25 14:35:27,365 [AMQ-1-thread-1 ] INFO  MessageDatabase               
- Recovering from the journal @1:76785
 2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO  MessageDatabase               
- Recovery replayed 1 operations from the journal in 0.012 seconds.
 2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO  MessageDatabase               
- Recovery replayed 1 operations from the journal in 0.012 seconds.
 2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO  KahaDBStore                   
- Stopping async queue tasks
 2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO  KahaDBStore                   
- Stopping async topic tasks
 2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO  KahaDBStore                   
- Stopped KahaDB
 2017-09-25 14:35:27,855 [AMQ-1-thread-1 ] DEBUG KahaDBStore                   
- Store OpenWire version configured as: 6
 2017-09-25 14:35:30,234 [AMQ-1-thread-1 ] TRACE Journal                       
- Startup took: 2372 ms
 2017-09-25 14:35:30,276 [AMQ-1-thread-1 ] INFO  MessageDatabase               
- KahaDB is version 5
 2017-09-25 14:35:30,276 [AMQ-1-thread-1 ] INFO  MessageDatabase               
- KahaDB is version 5
 2017-09-25 14:35:30,281 [AMQ-1-thread-1 ] DEBUG ListIndex                     
- loading
 2017-09-25 14:35:30,282 [AMQ-1-thread-1 ] DEBUG ListIndex                     
- loading
 2017-09-25 14:35:30,296 [AMQ-1-thread-1 ] INFO  MessageDatabase               
- Recovering from the journal @1:70835
 2017-09-25 14:35:30,296 [AMQ-1-thread-1 ] INFO  MessageDatabase               
- Recovering from the journal @1:70835
 2017-09-25 14:35:30,296 [AMQ-1-thread-1 ] INFO  MessageDatabase               
- Recovery replayed 1 operations from the journal in 0.012 seconds.
 2017-09-25 14:35:30,296 [AMQ-1-thread-1 ] INFO  MessageDatabase               
- Recovery replayed 1 operations from the journal in 0.012 seconds.
 2017-09-25 14:35:30,297 [AMQ-1-thread-1 ] INFO  KahaDBStore                   
- Stopping async queue tasks
 2017-09-25 14:35:30,297 [AMQ-1-thread-1 ] INFO  KahaDBStore                   
- Stopping async topic tasks
 2017-09-25 14:35:30,297 [AMQ-1-thread-1 ] INFO  KahaDBStore                   
- Stopped KahaDB
 2017-09-25 14:35:30,785 [AMQ-1-thread-1 ] DEBUG KahaDBStore                   
- Store OpenWire version configured as: 6
 2017-09-25 14:35:33,023 [AMQ-1-thread-1 ] TRACE Journal                       
- Startup took: 2231 ms
 2017-09-25 14:35:33,060 [AMQ-1-thread-1 ] INFO  MessageDatabase               
- KahaDB is version 5
 2017-09-25 14:35:33,060 [AMQ-1-thread-1 ] INFO  MessageDatabase               
- KahaDB is version 5
 2017-09-25 14:35:33,065 [AMQ-1-thread-1 ] DEBUG ListIndex                     
- loading
 2017-09-25 14:35:33,066 [AMQ-1-thread-1 ] DEBUG ListIndex                     
- loading
 2017-09-25 14:35:33,080 [AMQ-1-thread-1 ] INFO  MessageDatabase               
- Recovering from the journal @1:55603
 2017-09-25 14:35:33,080 [AMQ-1-thread-1 ] INFO  MessageDatabase               
- Recovering from the journal @1:55603
 2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO  MessageDatabase               
- Recovery replayed 1 operations from the journal in 0.013 seconds.
 2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO  MessageDatabase               
- Recovery replayed 1 operations from the journal in 0.013 seconds.
 2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO  KahaDBStore                   
- Stopping async queue tasks
 2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO  KahaDBStore                   
- Stopping async topic tasks
 2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO  KahaDBStore                   
- Stopped KahaDB



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Kahadb index updates taking too much time on ActiveMQ 5.11

Posted by Tim Bain <tb...@alumni.duke.edu>.
I'm glad you got it working, and from the synonyms I'm not surprised that
NFS ended up being the root cause.

Tim

On Oct 19, 2017 6:24 AM, "Devlin" <rb...@ofiglobal.com> wrote:

> Thank you, Tim.
>
> We finally figured out the issue; NFS "noac" option was killing
> performance.
>
> (vers=4.0,rsize=65536,wsize=65536,namlen=255,hard,noac,
> proto=tcp,port=0,timeo=20,retrans=2)
>
> We joked about it afterwards, no "air conditioning" after a long summer can
> kill anything :-)
>
> Thanks again for your insight and pointers, will come in handy!
>
>
>
> --
> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-
> f2341805.html
>

Re: Kahadb index updates taking too much time on ActiveMQ 5.11

Posted by Devlin <rb...@ofiglobal.com>.
Thank you, Tim.

We finally figured out the issue; NFS "noac" option was killing performance.

(vers=4.0,rsize=65536,wsize=65536,namlen=255,hard,noac,proto=tcp,port=0,timeo=20,retrans=2)

We joked about it afterwards, no "air conditioning" after a long summer can
kill anything :-)

Thanks again for your insight and pointers, will come in handy!



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Kahadb index updates taking too much time on ActiveMQ 5.11

Posted by Tim Bain <tb...@alumni.duke.edu>.
Sorry for the delay in responding.

I've got a number of follow-up questions/suggestions:

   1. Can you take screenshots sorted by all four non-percentage columns
   (Self Time, Self Time (CPU), Total Time, Total Time (CPU))? The ones with
   CPU measure CPU time, while the ones without measure total elapsed time (of
   which some portion is CPU and some portion is things like sleeps and I/O
   waits).
   2. Can you please take a screenshot of the Monitor tab after ActiveMQ
   finishes starting up? In particular I want to make sure that GC CPU usage
   looks reasonable.
   3. From your screenshot entitled cpu-total-time.jpg, it's clear that
   there's a very significant amount of CPU time being spent calling
   org.apache.karaf.management.internal.MBeanInvocationHandler.invoke(), which
   is JMX-related. We need to figure out if this is relevant or is a red
   herring. To do that, please find that method in the Call Tree tab (you
   should be able to right-click on it in the Hot Spots tab and select
   something like "Find in Call Tree") and see if you can tell what's calling
   it and what it's calling and whether those calls are in any way related to
   ActiveMQ. I suspect that it might actually be related to the use of
   VisualVM, but we need to make sure before we write it off.
   4. From your screenshot entitled cpu-self-time.jpg (which is in fact
   Self Time, *not* Self Time (CPU) as you said), it's clear that there's a
   significant amount of time being spent in
   org.apache.activemq.store.kahadb.MessageDatabase$3.run() as well as in
   various other ActiveMQ-related methods. You'll need to find those methods
   in the Call Tree tab and walk down the call stack looking for where the
   majority of the time (I'd focus on elapsed time rather than just CPU,
   because disk I/O may be part of your problem) is being spent. You can
   ignore any thread that doesn't eventually land you in something related to
   KahaDB or files; not every thread will be relevant to your slow startup
   time, so ignore the ones that aren't. It looks like a lot of time is being
   spent in RecoverableRandomAccessFile.readInt(), readFully(), and
   readByte(), so I suspect that those will be a large portion of where your
   time is being spent, but you may discover that there are other places where
   time is being spent that weren't visible from the screenshots you've shown
   so far.
   5. If the time really is primarily spent in those three read*() methods,
   they simply call the equivalent methods in
   http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/io/RandomAccessFile.java
   while wrapped in try/catch statements, so either you're really spending
   that long reading the bytes from the files and doing shift-and-add
   operations on the read bytes (in which case you should probably examine
   both your NFS performance and your belief that you're only reading 5GB of
   data) or you're spending a ton of time doing exception-handling (in which
   case we may need to get a debugger in there to figure out what the
   exceptions are).

I think those things will give us a good set of next steps to get closer to
determining what's actually going on.

Tim

On Wed, Oct 11, 2017 at 7:04 PM, Devlin <rb...@ofiglobal.com> wrote:

> Samples for cpu self-time, total-time, and memory usage:
>
> <http://activemq.2283324.n4.nabble.com/file/t376407/cpu-self-time.jpg>
>
> <http://activemq.2283324.n4.nabble.com/file/t376407/cpu-total-time.jpg>
>
> <http://activemq.2283324.n4.nabble.com/file/t376407/mem-usage.jpg>
>
>
>
> --
> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-
> f2341805.html
>

Re: Kahadb index updates taking too much time on ActiveMQ 5.11

Posted by Devlin <rb...@ofiglobal.com>.
Samples for cpu self-time, total-time, and memory usage:

<http://activemq.2283324.n4.nabble.com/file/t376407/cpu-self-time.jpg> 

<http://activemq.2283324.n4.nabble.com/file/t376407/cpu-total-time.jpg> 

<http://activemq.2283324.n4.nabble.com/file/t376407/mem-usage.jpg> 



--
Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-f2341805.html

Re: Kahadb index updates taking too much time on ActiveMQ 5.11

Posted by Tim Bain <tb...@alumni.duke.edu>.
I've not seen anyone describe this set of symptoms, so I don't have
anything to suggest right from the outset, but I may be able to help you
figure out what's going on.

Could you attach a CPU sampler such as JVisualVM to the broker process and
capture CPU sampling (not profiling if this is an operational broker!) data
to determine where the broker is spending its time?

On Tue, Sep 26, 2017 at 6:51 AM, Raffi <ra...@gmail.com> wrote:

>
> In production with ActiveMQ 5.11 with multi-kahadb per destination on NFSv4
> M|S (Linux 7.1), we're seeing the following log messages intermittently
> during operations:
>
> *Slow KahaDB access: Journal append took: 2 ms, Index update took 16901 ms*
>
> Also, on startup, we see the following in our kahadb logs; this condition
> severely impacts start up time, sometimes taking up to 90m for the broker
> to
> start, and that's for relatively small persistent store (< 5Gb). We have
> not
> adjusted indexCacheSize, that will be our first attempt to resolve this; we
> also plan to benchmark NFSv4 file system performance. Aside from that, is
> there any other broker config
>
> 2017-09-25 14:35:27,352 [AMQ-1-thread-1 ] DEBUG ListIndex
> - loading
>  2017-09-25 14:35:27,365 [AMQ-1-thread-1 ] INFO  MessageDatabase
> - Recovering from the journal @1:76785
>  2017-09-25 14:35:27,365 [AMQ-1-thread-1 ] INFO  MessageDatabase
> - Recovering from the journal @1:76785
>  2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO  MessageDatabase
> - Recovery replayed 1 operations from the journal in 0.012 seconds.
>  2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO  MessageDatabase
> - Recovery replayed 1 operations from the journal in 0.012 seconds.
>  2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO  KahaDBStore
> - Stopping async queue tasks
>  2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO  KahaDBStore
> - Stopping async topic tasks
>  2017-09-25 14:35:27,366 [AMQ-1-thread-1 ] INFO  KahaDBStore
> - Stopped KahaDB
>  2017-09-25 14:35:27,855 [AMQ-1-thread-1 ] DEBUG KahaDBStore
> - Store OpenWire version configured as: 6
>  2017-09-25 14:35:30,234 [AMQ-1-thread-1 ] TRACE Journal
> - Startup took: 2372 ms
>  2017-09-25 14:35:30,276 [AMQ-1-thread-1 ] INFO  MessageDatabase
> - KahaDB is version 5
>  2017-09-25 14:35:30,276 [AMQ-1-thread-1 ] INFO  MessageDatabase
> - KahaDB is version 5
>  2017-09-25 14:35:30,281 [AMQ-1-thread-1 ] DEBUG ListIndex
> - loading
>  2017-09-25 14:35:30,282 [AMQ-1-thread-1 ] DEBUG ListIndex
> - loading
>  2017-09-25 14:35:30,296 [AMQ-1-thread-1 ] INFO  MessageDatabase
> - Recovering from the journal @1:70835
>  2017-09-25 14:35:30,296 [AMQ-1-thread-1 ] INFO  MessageDatabase
> - Recovering from the journal @1:70835
>  2017-09-25 14:35:30,296 [AMQ-1-thread-1 ] INFO  MessageDatabase
> - Recovery replayed 1 operations from the journal in 0.012 seconds.
>  2017-09-25 14:35:30,296 [AMQ-1-thread-1 ] INFO  MessageDatabase
> - Recovery replayed 1 operations from the journal in 0.012 seconds.
>  2017-09-25 14:35:30,297 [AMQ-1-thread-1 ] INFO  KahaDBStore
> - Stopping async queue tasks
>  2017-09-25 14:35:30,297 [AMQ-1-thread-1 ] INFO  KahaDBStore
> - Stopping async topic tasks
>  2017-09-25 14:35:30,297 [AMQ-1-thread-1 ] INFO  KahaDBStore
> - Stopped KahaDB
>  2017-09-25 14:35:30,785 [AMQ-1-thread-1 ] DEBUG KahaDBStore
> - Store OpenWire version configured as: 6
>  2017-09-25 14:35:33,023 [AMQ-1-thread-1 ] TRACE Journal
> - Startup took: 2231 ms
>  2017-09-25 14:35:33,060 [AMQ-1-thread-1 ] INFO  MessageDatabase
> - KahaDB is version 5
>  2017-09-25 14:35:33,060 [AMQ-1-thread-1 ] INFO  MessageDatabase
> - KahaDB is version 5
>  2017-09-25 14:35:33,065 [AMQ-1-thread-1 ] DEBUG ListIndex
> - loading
>  2017-09-25 14:35:33,066 [AMQ-1-thread-1 ] DEBUG ListIndex
> - loading
>  2017-09-25 14:35:33,080 [AMQ-1-thread-1 ] INFO  MessageDatabase
> - Recovering from the journal @1:55603
>  2017-09-25 14:35:33,080 [AMQ-1-thread-1 ] INFO  MessageDatabase
> - Recovering from the journal @1:55603
>  2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO  MessageDatabase
> - Recovery replayed 1 operations from the journal in 0.013 seconds.
>  2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO  MessageDatabase
> - Recovery replayed 1 operations from the journal in 0.013 seconds.
>  2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO  KahaDBStore
> - Stopping async queue tasks
>  2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO  KahaDBStore
> - Stopping async topic tasks
>  2017-09-25 14:35:33,081 [AMQ-1-thread-1 ] INFO  KahaDBStore
> - Stopped KahaDB
>
>
>
> --
> Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-User-
> f2341805.html
>