You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@qpid.apache.org by Oleksandr Rudyy <or...@gmail.com> on 2017/05/04 09:44:51 UTC

Re: Java broker OOM due to DirectMemory

Hi Ramayan,

We attached to the QPID-7753 a patch with a work around for 6.0.x branch.
It triggers flow to disk based on direct memory consumption rather than
estimation of the space occupied by the message content. The flow to disk
should evacuate message content preventing running out of direct memory. We
already committed the changes into 6.0.x and 6.1.x branches. It will be
included into upcoming 6.0.7 and 6.1.3 releases.

Please try and test the patch in your environment.

We are still working at finishing of the fix for trunk.

Kind Regards,
Alex

On 30 April 2017 at 15:45, Lorenz Quack <qu...@gmail.com> wrote:

> Hi Ramayan,
>
> The high-level plan is currently as follows:
>  1) Periodically try to compact sparse direct memory buffers.
>  2) Increase accuracy of messages' direct memory usage estimation to more
> reliably trigger flow to disk.
>  3) Add an additional flow to disk trigger based on the amount of allocated
> direct memory.
>
> A little bit more details:
>  1) We plan on periodically checking the amount of direct memory usage and
> if it is above a
>     threshold (50%) we compare the sum of all queue sizes with the amount
> of allocated direct memory.
>     If the ratio falls below a certain threshold we trigger a compaction
> task which goes through all queues
>     and copy's a certain amount of old message buffers into new ones
> thereby freeing the old buffers so
>     that they can be returned to the buffer pool and be reused.
>
>  2) Currently we trigger flow to disk based on an estimate of how much
> memory the messages on the
>     queues consume. We had to use estimates because we did not have
> accurate size numbers for
>     message headers. By having accurate size information for message
> headers we can more reliably
>     enforce queue memory limits.
>
>  3) The flow to disk trigger based on message size had another problem
> which is more pertinent to the
>     current issue. We only considered the size of the messages and not how
> much memory we allocate
>     to store those messages. In the FIFO use case those numbers will be
> very close to each other but in
>     use cases like yours we can end up with sparse buffers and the numbers
> will diverge. Because of this
>     divergence we do not trigger flow to disk in time and the broker can go
> OOM.
>     To fix the issue we want to add an additional flow to disk trigger
> based on the amount of allocated direct
>     memory. This should prevent the broker from going OOM even if the
> compaction strategy outlined above
>     should fail for some reason (e.g., the compaction task cannot keep up
> with the arrival of new messages).
>
> Currently, there are patches for the above points but they suffer from some
> thread-safety issues that need to be addressed.
>
> I hope this description helps. Any feedback is, as always, welcome.
>
> Kind regards,
> Lorenz
>
>
>
> On Sat, Apr 29, 2017 at 12:00 AM, Ramayan Tiwari <ramayan.tiwari@gmail.com
> >
> wrote:
>
> > Hi Lorenz,
> >
> > Thanks so much for the patch. We have a perf test now to reproduce this
> > issue, so we did test with 256KB, 64KB and 4KB network byte buffer. None
> of
> > these configurations help with the issue (or give any more breathing
> room)
> > for our use case. We would like to share the perf analysis with the
> > community:
> >
> > https://docs.google.com/document/d/1Wc1e-id-
> WlpI7FGU1Lx8XcKaV8sauRp82T5XZV
> > U-RiM/edit?usp=sharing
> >
> > Feel free to comment on the doc if certain details are incorrect or if
> > there are questions.
> >
> > Since the short term solution doesn't help us, we are very interested in
> > getting some details on how the community plans to address this, a high
> > level description of the approach will be very helpful for us in order to
> > brainstorm our use cases along with this solution.
> >
> > - Ramayan
> >
> > On Fri, Apr 28, 2017 at 9:34 AM, Lorenz Quack <qu...@gmail.com>
> > wrote:
> >
> > > Hello Ramayan,
> > >
> > > We are still working on a fix for this issue.
> > > In the mean time we had an idea to potentially workaround the issue
> until
> > > a proper fix is released.
> > >
> > > The idea is to decrease the qpid network buffer size the broker uses.
> > > While this still allows for sparsely populated buffers it would improve
> > > the overall occupancy ratio.
> > >
> > > Here are the steps to follow:
> > >  * ensure you are not using TLS
> > >  * apply the attached patch
> > >  * figure out the size of the largest messages you are sending
> (including
> > > header and some overhead)
> > >  * set the context variable "qpid.broker.networkBufferSize" to that
> > value
> > > but not smaller than 4096
> > >  * test
> > >
> > > Decreasing the qpid network buffer size automatically limits the
> maximum
> > > AMQP frame size.
> > > Since you are using a very old client we are not sure how well it copes
> > > with small frame sizes where it has to split a message across multiple
> > > frames.
> > > Therefore, to play it safe you should not set it smaller than the
> largest
> > > messages (+ header + overhead) you are sending.
> > > I do not know what message sizes you are sending but AMQP imposes the
> > > restriction that the framesize cannot be smaller than 4096 bytes.
> > > In the qpid broker the default currently is 256 kB.
> > >
> > > In the current state the broker does not allow setting the network
> buffer
> > > to values smaller than 64 kB to allow TLS frames to fit into one
> network
> > > buffer.
> > > I attached a patch to this mail that lowers that restriction to the
> limit
> > > imposed by AMQP (4096 Bytes).
> > > Obviously, you should not use this when using TLS.
> > >
> > >
> > > I hope this reduces the problems you are currently facing until we can
> > > complete the proper fix.
> > >
> > > Kind regards,
> > > Lorenz
> > >
> > >
> > > On Fri, 2017-04-21 at 09:17 -0700, Ramayan Tiwari wrote:
> > > > Thanks so much Keith and the team for finding the root cause. We are
> so
> > > > relieved that we fix the root cause shortly.
> > > >
> > > > Couple of things that I forgot to mention on the mitigation steps we
> > took
> > > > in the last incident:
> > > > 1) We triggered GC from JMX bean multiple times, it did not help in
> > > > reducing DM allocated.
> > > > 2) We also killed all the AMQP connections to the broker when DM was
> at
> > > > 80%. This did not help either. The way we killed connections - using
> > JMX
> > > > got list of all the open AMQP connections and called close from JMX
> > > mbean.
> > > >
> > > > I am hoping the above two are not related to root cause, but wanted
> to
> > > > bring it up in case this is relevant.
> > > >
> > > > Thanks
> > > > Ramayan
> > > >
> > > > On Fri, Apr 21, 2017 at 8:29 AM, Keith W <ke...@gmail.com>
> wrote:
> > > >
> > > > >
> > > > > Hello Ramayan
> > > > >
> > > > > I believe I understand the root cause of the problem.  We have
> > > > > identified a flaw in the direct memory buffer management employed
> by
> > > > > Qpid Broker J which for some messaging use-cases can lead to the
> OOM
> > > > > direct you describe.   For the issue to manifest the producing
> > > > > application needs to use a single connection for the production of
> > > > > messages some of which are short-lived (i.e. are consumed quickly)
> > > > > whilst others remain on the queue for some time.  Priority queues,
> > > > > sorted queues and consumers utilising selectors that result in some
> > > > > messages being left of the queue could all produce this patten.
> The
> > > > > pattern leads to a sparsely occupied 256K net buffers which cannot
> be
> > > > > released or reused until every message that reference a 'chunk' of
> it
> > > > > is either consumed or flown to disk.   The problem was introduced
> > with
> > > > > Qpid v6.0 and exists in v6.1 and trunk too.
> > > > >
> > > > > The flow to disk feature is not helping us here because its
> algorithm
> > > > > considers only the size of live messages on the queues. If the
> > > > > accumulative live size does not exceed the threshold, the messages
> > > > > aren't flown to disk. I speculate that when you observed that
> moving
> > > > > messages cause direct message usage to drop earlier today, your
> > > > > message movement cause a queue to go over threshold, cause message
> to
> > > > > be flown to disk and their direct memory references released.  The
> > > > > logs will confirm this is so.
> > > > >
> > > > > I have not identified an easy workaround at the moment.
>  Decreasing
> > > > > the flow to disk threshold and/or increasing available direct
> memory
> > > > > should alleviate and may be an acceptable short term workaround.
> If
> > > > > it were possible for publishing application to publish short lived
> > and
> > > > > long lived messages on two separate JMS connections this would
> avoid
> > > > > this defect.
> > > > >
> > > > > QPID-7753 tracks this issue and QPID-7754 is a related this
> problem.
> > > > > We intend to be working on these early next week and will be aiming
> > > > > for a fix that is back-portable to 6.0.
> > > > >
> > > > > Apologies that you have run into this defect and thanks for
> > reporting.
> > > > >
> > > > > Thanks, Keith
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On 21 April 2017 at 10:21, Ramayan Tiwari <
> ramayan.tiwari@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > We have been monitoring the brokers everyday and today we found
> one
> > > > > instance
> > > > > >
> > > > > > where broker’s DM was constantly going up and was about to crash,
> > so
> > > we
> > > > > > experimented some mitigations, one of which caused the DM to come
> > > down.
> > > > > > Following are the details, which might help us understanding the
> > > issue:
> > > > > >
> > > > > > Traffic scenario:
> > > > > >
> > > > > > DM allocation had been constantly going up and was at 90%. There
> > > were two
> > > > > > queues which seemed to align with the theories that we had. Q1’s
> > > size had
> > > > > > been large right after the broker start and had slow consumption
> of
> > > > > > messages, queue size only reduced from 76MB to 75MB over a period
> > of
> > > > > 6hrs.
> > > > > >
> > > > > > Q2 on the other hand, started small and was gradually growing,
> > queue
> > > size
> > > > > > went from 7MB to 10MB in 6hrs. There were other queues with
> traffic
> > > > > during
> > > > > >
> > > > > > this time.
> > > > > >
> > > > > > Action taken:
> > > > > >
> > > > > > Moved all the messages from Q2 (since this was our original
> theory)
> > > to Q3
> > > > > > (already created but no messages in it). This did not help with
> the
> > > DM
> > > > > > growing up.
> > > > > > Moved all the messages from Q1 to Q4 (already created but no
> > > messages in
> > > > > > it). This reduced DM allocation from 93% to 31%.
> > > > > >
> > > > > > We have the heap dump and thread dump from when broker was 90% in
> > DM
> > > > > > allocation. We are going to analyze that to see if we can get
> some
> > > clue.
> > > > > We
> > > > > >
> > > > > > wanted to share this new information which might help in
> reasoning
> > > about
> > > > > the
> > > > > >
> > > > > > memory issue.
> > > > > >
> > > > > > - Ramayan
> > > > > >
> > > > > >
> > > > > > On Thu, Apr 20, 2017 at 11:20 AM, Ramayan Tiwari <
> > > > > ramayan.tiwari@gmail.com>
> > > > > >
> > > > > > wrote:
> > > > > > >
> > > > > > >
> > > > > > > Hi Keith,
> > > > > > >
> > > > > > > Thanks so much for your response and digging into the issue.
> > Below
> > > are
> > > > > the
> > > > > >
> > > > > > >
> > > > > > > answer to your questions:
> > > > > > >
> > > > > > > 1) Yeah we are using QPID-7462 with 6.0.5. We couldn't use 6.1
> > > where it
> > > > > > > was released because we need JMX support. Here is the
> destination
> > > > > format:
> > > > > >
> > > > > > >
> > > > > > > ""%s ; {node : { type : queue }, link : { x-subscribes : {
> > > arguments : {
> > > > > > > x-multiqueue : [%s], x-pull-only : true }}}}";"
> > > > > > >
> > > > > > > 2) Our machines have 40 cores, which will make the number of
> > > threads to
> > > > > > > 80. This might not be an issue, because this will show up in
> the
> > > > > baseline DM
> > > > > >
> > > > > > >
> > > > > > > allocated, which is only 6% (of 4GB) when we just bring up the
> > > broker.
> > > > > > >
> > > > > > > 3) The only setting that we tuned WRT to DM is
> > flowToDiskThreshold,
> > > > > which
> > > > > >
> > > > > > >
> > > > > > > is set at 80% now.
> > > > > > >
> > > > > > > 4) Only one virtual host in the broker.
> > > > > > >
> > > > > > > 5) Most of our queues (99%) are priority, we also have 8-10
> > sorted
> > > > > queues.
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > 6) Yeah we are using the standard 0.16 client and not AMQP 1.0
> > > clients.
> > > > > > > The connection log line looks like:
> > > > > > > CON-1001 : Open : Destination : AMQP(IP:5672) : Protocol
> Version
> > :
> > > 0-10
> > > > > :
> > > > > >
> > > > > > >
> > > > > > > Client ID : test : Client Version : 0.16 : Client Product :
> qpid
> > > > > > >
> > > > > > > We had another broker crashed about an hour back, we do see the
> > > same
> > > > > > > patterns:
> > > > > > > 1) There is a queue which is constantly growing, enqueue is
> > faster
> > > than
> > > > > > > dequeue on that queue for a long period of time.
> > > > > > > 2) Flow to disk didn't kick in at all.
> > > > > > >
> > > > > > > This graph shows memory growth (red line - heap, blue - DM
> > > allocated,
> > > > > > > yellow - DM used)
> > > > > > >
> > > > > > > https://drive.google.com/file/d/0Bwi0MEV3srPRdVhXdTBncHJLY2c/
> > > > > view?usp=sharing
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > The below graph shows growth on a single queue (there are 10-12
> > > other
> > > > > > > queues with traffic as well, something large size than this
> > queue):
> > > > > > >
> > > > > > > https://drive.google.com/file/d/0Bwi0MEV3srPRWmNGbDNGUkJhQ0U/
> > > > > view?usp=sharing
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Couple of questions:
> > > > > > > 1) Is there any developer level doc/design spec on how Qpid
> uses
> > > DM?
> > > > > > > 2) We are not getting heap dumps automatically when broker
> > crashes
> > > due
> > > > > to
> > > > > >
> > > > > > >
> > > > > > > DM (HeapDumpOnOutOfMemoryError not respected). Has anyone
> found a
> > > way
> > > > > to get
> > > > > >
> > > > > > >
> > > > > > > around this problem?
> > > > > > >
> > > > > > > Thanks
> > > > > > > Ramayan
> > > > > > >
> > > > > > > On Thu, Apr 20, 2017 at 9:08 AM, Keith W <keith.wall@gmail.com
> >
> > > wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > Hi Ramayan
> > > > > > > >
> > > > > > > > We have been discussing your problem here and have a couple
> of
> > > > > questions.
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > I have been experimenting with use-cases based on your
> > > descriptions
> > > > > > > > above, but so far, have been unsuccessful in reproducing a
> > > > > > > > "java.lang.OutOfMemoryError: Direct buffer memory"
> condition.
> > > The
> > > > > > > > direct memory usage reflects the expected model: it levels
> off
> > > when
> > > > > > > > the flow to disk threshold is reached and direct memory is
> > > release as
> > > > > > > > messages are consumed until the minimum size for caching of
> > > direct is
> > > > > > > > reached.
> > > > > > > >
> > > > > > > > 1] For clarity let me check: we believe when you say "patch
> to
> > > use
> > > > > > > > MultiQueueConsumer" you are referring to the patch attached
> to
> > > > > > > > QPID-7462 "Add experimental "pull" consumers to the broker"
> > and
> > > you
> > > > > > > > are using a combination of this "x-pull-only"  with the
> > standard
> > > > > > > > "x-multiqueue" feature.  Is this correct?
> > > > > > > >
> > > > > > > > 2] One idea we had here relates to the size of the
> virtualhost
> > IO
> > > > > > > > pool.   As you know from the documentation, the Broker
> > > caches/reuses
> > > > > > > > direct memory internally but the documentation fails to
> > mentions
> > > that
> > > > > > > > each pooled virtualhost IO thread also grabs a chunk (256K)
> of
> > > direct
> > > > > > > > memory from this cache.  By default the virtual host IO pool
> is
> > > sized
> > > > > > > > Math.max(Runtime.getRuntime().availableProcessors() * 2,
> 64),
> > > so if
> > > > > > > > you have a machine with a very large number of cores, you may
> > > have a
> > > > > > > > surprising large amount of direct memory assigned to
> > virtualhost
> > > IO
> > > > > > > > threads.   Check the value of connectionThreadPoolSize on the
> > > > > > > > virtualhost
> > > > > > > > (http://<server>:<port>/api/latest/virtualhost/<
> > > virtualhostnodename>/<;
> > > > > virtualhostname>)
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > to see what value is in force.  What is it?  It is possible
> to
> > > tune
> > > > > > > > the pool size using context variable
> > > > > > > > virtualhost.connectionThreadPool.size.
> > > > > > > >
> > > > > > > > 3] Tell me if you are tuning the Broker in way beyond the
> > > direct/heap
> > > > > > > > memory settings you have told us about already.  For instance
> > > you are
> > > > > > > > changing any of the direct memory pooling settings
> > > > > > > > broker.directByteBufferPoolSize, default network buffer size
> > > > > > > > qpid.broker.networkBufferSize or applying any other
> > non-standard
> > > > > > > > settings?
> > > > > > > >
> > > > > > > > 4] How many virtual hosts do you have on the Broker?
> > > > > > > >
> > > > > > > > 5] What is the consumption pattern of the messages?  Do
> consume
> > > in a
> > > > > > > > strictly FIFO fashion or are you making use of message
> > selectors
> > > > > > > > or/and any of the out-of-order queue types (LVQs, priority
> > queue
> > > or
> > > > > > > > sorted queues)?
> > > > > > > >
> > > > > > > > 6] Is it just the 0.16 client involved in the application?
> >  Can
> > > I
> > > > > > > > check that you are not using any of the AMQP 1.0 clients
> > > > > > > > (org,apache.qpid:qpid-jms-client or
> > > > > > > > org.apache.qpid:qpid-amqp-1-0-client) in the software stack
> > (as
> > > either
> > > > > > > > consumers or producers)
> > > > > > > >
> > > > > > > > Hopefully the answers to these questions will get us closer
> to
> > a
> > > > > > > > reproduction.   If you are able to reliable reproduce it,
> > please
> > > share
> > > > > > > > the steps with us.
> > > > > > > >
> > > > > > > > Kind regards, Keith.
> > > > > > > >
> > > > > > > >
> > > > > > > > On 20 April 2017 at 10:21, Ramayan Tiwari <
> > > ramayan.tiwari@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > After a lot of log mining, we might have a way to explain
> the
> > > > > sustained
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > increased in DirectMemory allocation, the correlation seems
> > to
> > > be
> > > > > with
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > the
> > > > > > > > > growth in the size of a Queue that is getting consumed but
> at
> > > a much
> > > > > > > > > slower
> > > > > > > > > rate than producers putting messages on this queue.
> > > > > > > > >
> > > > > > > > > The pattern we see is that in each instance of broker
> crash,
> > > there is
> > > > > > > > > at
> > > > > > > > > least one queue (usually 1 queue) whose size kept growing
> > > steadily.
> > > > > > > > > It’d be
> > > > > > > > > of significant size but not the largest queue -- usually
> > there
> > > are
> > > > > > > > > multiple
> > > > > > > > > larger queues -- but it was different from other queues in
> > > that its
> > > > > > > > > size
> > > > > > > > > was growing steadily. The queue would also be moving, but
> its
> > > > > > > > > processing
> > > > > > > > > rate was not keeping up with the enqueue rate.
> > > > > > > > >
> > > > > > > > > Our theory that might be totally wrong: If a queue is
> moving
> > > the
> > > > > entire
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > time, maybe then the broker would keep reusing the same
> > buffer
> > > in
> > > > > > > > > direct
> > > > > > > > > memory for the queue, and keep on adding onto it at the end
> > to
> > > > > > > > > accommodate
> > > > > > > > > new messages. But because it’s active all the time and
> we’re
> > > pointing
> > > > > > > > > to
> > > > > > > > > the same buffer, space allocated for messages at the head
> of
> > > the
> > > > > > > > > queue/buffer doesn’t get reclaimed, even long after those
> > > messages
> > > > > have
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > been processed. Just a theory.
> > > > > > > > >
> > > > > > > > > We are also trying to reproduce this using some perf tests
> to
> > > enqueue
> > > > > > > > > with
> > > > > > > > > same pattern, will update with the findings.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Ramayan
> > > > > > > > >
> > > > > > > > > On Wed, Apr 19, 2017 at 6:52 PM, Ramayan Tiwari
> > > > > > > > > <ra...@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Another issue that we noticed is when broker goes OOM due
> > to
> > > direct
> > > > > > > > > > memory, it doesn't create heap dump (specified by "-XX:+
> > > > > > > > > > HeapDumpOnOutOfMemoryError"), even when the OOM error is
> > > same as
> > > > > what
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > is
> > > > > > > > > > mentioned in the oracle JVM docs
> > > ("java.lang.OutOfMemoryError").
> > > > > > > > > >
> > > > > > > > > > Has anyone been able to find a way to get to heap dump
> for
> > > DM OOM?
> > > > > > > > > >
> > > > > > > > > > - Ramayan
> > > > > > > > > >
> > > > > > > > > > On Wed, Apr 19, 2017 at 11:21 AM, Ramayan Tiwari
> > > > > > > > > > <ramayan.tiwari@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > Alex,
> > > > > > > > > > >
> > > > > > > > > > > Below are the flow to disk logs from broker having
> > > 3million+
> > > > > messages
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > at
> > > > > > > > > > > this time. We only have one virtual host. Time is in
> GMT.
> > > Looks
> > > > > like
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > flow
> > > > > > > > > > > to disk is active on the whole virtual host and not a
> > > queue level.
> > > > > > > > > > >
> > > > > > > > > > > When the same broker went OOM yesterday, I did not see
> > any
> > > flow to
> > > > > > > > > > > disk
> > > > > > > > > > > logs from when it was started until it crashed (crashed
> > > twice
> > > > > within
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > 4hrs).
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > 4/19/17 4:17:43.509 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> memory
> > > use
> > > > > > > > > > > 3356539KB
> > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > 4/19/17 2:31:13.502 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> memory
> > > use
> > > > > > > > > > > 3354866KB
> > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > 4/19/17 2:28:43.511 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> memory
> > > use
> > > > > > > > > > > 3358509KB
> > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > 4/19/17 2:20:13.500 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> memory
> > > use
> > > > > > > > > > > 3353501KB
> > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > 4/19/17 2:18:13.500 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> memory
> > > use
> > > > > > > > > > > 3357544KB
> > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > 4/19/17 2:08:43.501 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> memory
> > > use
> > > > > > > > > > > 3353236KB
> > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > 4/19/17 2:08:13.501 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> memory
> > > use
> > > > > > > > > > > 3356704KB
> > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > 4/19/17 2:00:43.500 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> memory
> > > use
> > > > > > > > > > > 3353511KB
> > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > 4/19/17 2:00:13.504 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> memory
> > > use
> > > > > > > > > > > 3357948KB
> > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > 4/19/17 1:50:43.501 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> memory
> > > use
> > > > > > > > > > > 3355310KB
> > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > 4/19/17 1:47:43.501 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> memory
> > > use
> > > > > > > > > > > 3365624KB
> > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > 4/19/17 1:43:43.501 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> memory
> > > use
> > > > > > > > > > > 3355136KB
> > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > 4/19/17 1:31:43.509 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> memory
> > > use
> > > > > > > > > > > 3358683KB
> > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > After production release (2days back), we have seen 4
> > > crashes in 3
> > > > > > > > > > > different brokers, this is the most pressing concern
> for
> > > us in
> > > > > > > > > > > decision if
> > > > > > > > > > > we should roll back to 0.32. Any help is greatly
> > > appreciated.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > > Ramayan
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Apr 19, 2017 at 9:36 AM, Oleksandr Rudyy <
> > > orudyy@gmail.com
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Ramayan,
> > > > > > > > > > > > Thanks for the details. I would like to clarify
> whether
> > > flow to
> > > > > disk
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > was
> > > > > > > > > > > > triggered today for 3 million messages?
> > > > > > > > > > > >
> > > > > > > > > > > > The following logs are issued for flow to disk:
> > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > memory
> > > use
> > > > > > > > > > > > {0,number,#}KB
> > > > > > > > > > > > exceeds threshold {1,number,#.##}KB
> > > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> > > memory use
> > > > > > > > > > > > {0,number,#}KB within threshold {1,number,#.##}KB
> > > > > > > > > > > >
> > > > > > > > > > > > Kind Regards,
> > > > > > > > > > > > Alex
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > On 19 April 2017 at 17:10, Ramayan Tiwari <
> > > > > ramayan.tiwari@gmail.com>
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Alex,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks for your response, here are the details:
> > > > > > > > > > > > >
> > > > > > > > > > > > > We use "direct" exchange, without persistence (we
> > > specify
> > > > > > > > > > > > NON_PERSISTENT
> > > > > > > > > > > > >
> > > > > > > > > > > > > that while sending from client) and use BDB store.
> We
> > > use JSON
> > > > > > > > > > > > > virtual
> > > > > > > > > > > > host
> > > > > > > > > > > > >
> > > > > > > > > > > > > type. We are not using SSL.
> > > > > > > > > > > > >
> > > > > > > > > > > > > When the broker went OOM, we had around 1.3 million
> > > messages
> > > > > with
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 100
> > > > > > > > > > > > bytes
> > > > > > > > > > > > >
> > > > > > > > > > > > > average message size. Direct memory allocation
> (value
> > > read from
> > > > > > > > > > > > > MBean)
> > > > > > > > > > > > kept
> > > > > > > > > > > > >
> > > > > > > > > > > > > going up, even though it wouldn't need more DM to
> > > store these
> > > > > many
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > messages. DM allocated persisted at 99% for about 3
> > > and half
> > > > > hours
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > before
> > > > > > > > > > > > >
> > > > > > > > > > > > > crashing.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Today, on the same broker we have 3 million
> messages
> > > (same
> > > > > message
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > size)
> > > > > > > > > > > > >
> > > > > > > > > > > > > and DM allocated is only at 8%. This seems like
> there
> > > is some
> > > > > > > > > > > > > issue
> > > > > > > > > > > > with
> > > > > > > > > > > > >
> > > > > > > > > > > > > de-allocation or a leak.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I have uploaded the memory utilization graph here:
> > > > > > > > > > > > > https://drive.google.com/file/d/
> > > 0Bwi0MEV3srPRVHFEbDlIYUpLaUE/
> > > > > > > > > > > > > view?usp=sharing
> > > > > > > > > > > > > Blue line is DM allocated, Yellow is DM Used (sum
> of
> > > queue
> > > > > > > > > > > > > payload)
> > > > > > > > > > > > and Red
> > > > > > > > > > > > >
> > > > > > > > > > > > > is heap usage.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > Ramayan
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Apr 19, 2017 at 4:10 AM, Oleksandr Rudyy
> > > > > > > > > > > > > <or...@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Ramayan,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Could please share with us the details of
> messaging
> > > use
> > > > > case(s)
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > which
> > > > > > > > > > > > > ended
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > up in OOM on broker side?
> > > > > > > > > > > > > > I would like to reproduce the issue on my local
> > > broker in
> > > > > order
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > to
> > > > > > > > > > > > fix
> > > > > > > > > > > > >
> > > > > > > > > > > > > it.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I would appreciate if you could provide as much
> > > details as
> > > > > > > > > > > > > > possible,
> > > > > > > > > > > > > > including, messaging topology, message
> persistence
> > > type,
> > > > > message
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > sizes,volumes, etc.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Qpid Broker 6.0.x uses direct memory for keeping
> > > message
> > > > > content
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > receiving/sending data. Each plain connection
> > > utilizes 512K of
> > > > > > > > > > > > > > direct
> > > > > > > > > > > > > > memory. Each SSL connection uses 1M of direct
> > > memory. Your
> > > > > > > > > > > > > > memory
> > > > > > > > > > > > > settings
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > look Ok to me.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Kind Regards,
> > > > > > > > > > > > > > Alex
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On 18 April 2017 at 23:39, Ramayan Tiwari
> > > > > > > > > > > > > > <ra...@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We are using Java broker 6.0.5, with patch to
> use
> > > > > > > > > > > > MultiQueueConsumer
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > feature. We just finished deploying to
> production
> > > and saw
> > > > > > > > > > > > > > > couple of
> > > > > > > > > > > > > > > instances of broker OOM due to running out of
> > > DirectMemory
> > > > > > > > > > > > > > > buffer
> > > > > > > > > > > > > > > (exceptions at the end of this email).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Here is our setup:
> > > > > > > > > > > > > > > 1. Max heap 12g, max direct memory 4g (this is
> > > opposite of
> > > > > > > > > > > > > > > what the
> > > > > > > > > > > > > > > recommendation is, however, for our use cause
> > > message
> > > > > payload
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > is
> > > > > > > > > > > > really
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > small ~400bytes and is way less than the per
> > > message
> > > > > overhead
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > 1KB).
> > > > > > > > > > > > >
> > > > > > > > > > > > > In
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > perf testing, we were able to put 2 million
> > > messages without
> > > > > > > > > > > > > > > any
> > > > > > > > > > > > > issues.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 2. ~400 connections to broker.
> > > > > > > > > > > > > > > 3. Each connection has 20 sessions and there is
> > > one multi
> > > > > > > > > > > > > > > queue
> > > > > > > > > > > > > consumer
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > attached to each session, listening to around
> > 1000
> > > queues.
> > > > > > > > > > > > > > > 4. We are still using 0.16 client (I know).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > With the above setup, the baseline utilization
> > > (without any
> > > > > > > > > > > > messages)
> > > > > > > > > > > > >
> > > > > > > > > > > > > for
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > direct memory was around 230mb (with 410
> > > connection each
> > > > > > > > > > > > > > > taking
> > > > > > > > > > > > 500KB).
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Based on our understanding of broker memory
> > > allocation,
> > > > > > > > > > > > > > > message
> > > > > > > > > > > > payload
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > should be the only thing adding to direct
> memory
> > > utilization
> > > > > > > > > > > > > > > (on
> > > > > > > > > > > > top of
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > baseline), however, we are experiencing
> something
> > > completely
> > > > > > > > > > > > different.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > our last broker crash, we see that broker is
> > > constantly
> > > > > > > > > > > > > > > running
> > > > > > > > > > > > with
> > > > > > > > > > > > >
> > > > > > > > > > > > > 90%+
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > direct memory allocated, even when message
> > payload
> > > sum from
> > > > > > > > > > > > > > > all the
> > > > > > > > > > > > > > queues
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > is only 6-8% (these % are against available DM
> of
> > > 4gb).
> > > > > During
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > these
> > > > > > > > > > > > >
> > > > > > > > > > > > > high
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > DM usage period, heap usage was around 60% (of
> > > 12gb).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We would like some help in understanding what
> > > could be the
> > > > > > > > > > > > > > > reason
> > > > > > > > > > > > of
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > these
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > high DM allocations. Are there things other
> than
> > > message
> > > > > > > > > > > > > > > payload
> > > > > > > > > > > > and
> > > > > > > > > > > > >
> > > > > > > > > > > > > AMQP
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > connection, which use DM and could be
> > contributing
> > > to these
> > > > > > > > > > > > > > > high
> > > > > > > > > > > > usage?
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Another thing where we are puzzled is the
> > > de-allocation of
> > > > > DM
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > byte
> > > > > > > > > > > > > > buffers.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > From log mining of heap and DM utilization,
> > > de-allocation of
> > > > > > > > > > > > > > > DM
> > > > > > > > > > > > doesn't
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > correlate with heap GC. If anyone has seen any
> > > documentation
> > > > > > > > > > > > related to
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > this, it would be very helpful if you could
> share
> > > that.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > Ramayan
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > *Exceptions*
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct buffer
> memory
> > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(Bits.java:658)
> > > > > ~[na:1.8.0_40]
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<
> > > init>(DirectByteBuffer.java:
> > > > > 123)
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > at java.nio.ByteBuffer.
> > allocateDirect(ByteBuffer.
> > > java:311)
> > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > org.apache.qpid.bytebuffer.
> > > QpidByteBuffer.allocateDirect(
> > > > > > > > > > > > > > > QpidByteBuffer.java:474)
> > > > > > > > > > > > > > > ~[qpid-common-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > NonBlockingConnectionPlainD
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > elegate.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > restoreApplicationBufferForWrite(
> > > > > NonBlockingConnectionPlainDele
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > gate.java:93)
> > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > NonBlockingConnectionPlainDele
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > gate.processData(
> NonBlockingConnectionPlainDele
> > > > > gate.java:60)
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > NonBlockingConnection.doRead(
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > NonBlockingConnection.java:506)
> > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > NonBlockingConnection.doWork(
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > NonBlockingConnection.java:285)
> > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > NetworkConnectionScheduler.
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > processConnection(NetworkConnectionScheduler.
> > > java:124)
> > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > org.apache.qpid.server.
> transport.SelectorThread$
> > > > > ConnectionPr
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > ocessor.
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > processConnection(SelectorThread.java:504)
> > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > org.apache.qpid.server.
> transport.SelectorThread$
> > > > > > > > > > > > > > > SelectionTask.performSelect(
> > > SelectorThread.java:337)
> > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > org.apache.qpid.server.
> transport.SelectorThread$
> > > > > SelectionTask.run(
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > SelectorThread.java:87)
> > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > org.apache.qpid.server.
> > > transport.SelectorThread.run(
> > > > > > > > > > > > > > > SelectorThread.java:462)
> > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > java.util.concurrent.
> > ThreadPoolExecutor.runWorker(
> > > > > > > > > > > > > > > ThreadPoolExecutor.java:1142)
> > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > java.util.concurrent.
> > > ThreadPoolExecutor$Worker.run(
> > > > > > > > > > > > > > > ThreadPoolExecutor.java:617)
> > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > at java.lang.Thread.run(Thread.java:745)
> > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > *Second exception*
> > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct buffer
> memory
> > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(Bits.java:658)
> > > > > ~[na:1.8.0_40]
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<
> > > init>(DirectByteBuffer.java:
> > > > > 123)
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > at java.nio.ByteBuffer.
> > allocateDirect(ByteBuffer.
> > > java:311)
> > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > org.apache.qpid.bytebuffer.
> > > QpidByteBuffer.allocateDirect(
> > > > > > > > > > > > > > > QpidByteBuffer.java:474)
> > > > > > > > > > > > > > > ~[qpid-common-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > NonBlockingConnectionPlainDele
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > gate.<init>(NonBlockingConnectionPlainDele
> > > gate.java:45)
> > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > NonBlockingConnection.
> > > > > > > > > > > > > > > setTransportEncryption(
> > NonBlockingConnection.java:
> > > 625)
> > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > NonBlockingConnection.<init>(
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > NonBlockingConnection.java:117)
> > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > NonBlockingNetworkTransport.
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > acceptSocketChannel(
> NonBlockingNetworkTransport.
> > > java:158)
> > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > org.apache.qpid.server.
> transport.SelectorThread$
> > > > > SelectionTas
> > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > k$1.run(
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > SelectorThread.java:191)
> > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > org.apache.qpid.server.
> > > transport.SelectorThread.run(
> > > > > > > > > > > > > > > SelectorThread.java:462)
> > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > java.util.concurrent.
> > ThreadPoolExecutor.runWorker(
> > > > > > > > > > > > > > > ThreadPoolExecutor.java:1142)
> > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > java.util.concurrent.
> > > ThreadPoolExecutor$Worker.run(
> > > > > > > > > > > > > > > ThreadPoolExecutor.java:617)
> > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > at java.lang.Thread.run(Thread.java:745)
> > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > ------------------------------------------------------------
> > > ---------
> > > > > > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > > > > > > For additional commands, e-mail: users-help@qpid.apache.org
> > > > > > > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > For additional commands, e-mail: users-help@qpid.apache.org
> > >
> >
>

Re: Java broker OOM due to DirectMemory

Posted by Keith W <ke...@gmail.com>.
On 25 May 2017 at 00:02, Ramayan Tiwari <ra...@gmail.com> wrote:
> Hi Keith,
>
> Thanks so much for the update and putting together the new version with
> enhancements to flowToDisk.

No problem at all.

Just to confirm, in order for me to get the
> broker with these fixes and also use MultiQueueConsumer, I will get 6.0.7
> version and apply QPID-7462 on it?

Yes, that's right.

> For the longer terms fix (periodically coalesce byte buffers) QPID-7753, is
> it possible to also backport that to 6.0.x branch?

We have considered that, but have decided no.   Firstly, the
coalescing work is a new approach and we need more time for testing
before we are sure it is ready for release.  Secondly, the code on
master has moved on substantially since the 6.0 branch was take (it is
now over two years).  Much of the patch would require rework.

> We are working of
> migrating all of our monitoring to use REST API, however, if its possible to
> put this fix in 6.0.x, that would be very helpful.
>
> Thanks
> Ramayan
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org


Re: Java broker OOM due to DirectMemory

Posted by Ramayan Tiwari <ra...@gmail.com>.
Hi Keith,

Thanks so much for the update and putting together the new version with
enhancements to flowToDisk. Just to confirm, in order for me to get the
broker with these fixes and also use MultiQueueConsumer, I will get 6.0.7
version and apply QPID-7462 on it?

For the longer terms fix (periodically coalesce byte buffers) QPID-7753, is
it possible to also backport that to 6.0.x branch? We are working of
migrating all of our monitoring to use REST API, however, if its possible
to put this fix in 6.0.x, that would be very helpful.

Thanks
Ramayan



On Wed, May 24, 2017 at 5:53 AM, Keith W <ke...@gmail.com> wrote:

> Hi Ramayan
>
> Further changes have been made on the 6.0.x branch that prevent the
> noisy flow to disk messages in the log.   The flow to disk on/off
> messages we had before didn't really fit and this resulted in the logs
> being spammed when either a queue was considered full or memory was
> under pressure.  Now you'll see a VHT-1008 periodically written to the
> logs when flow to disk has been working.  It reports the number of
> bytes that have been evacuated from memory.
>
> The actual mechanism used to prevent the OOM is the same as code you
> tested on May 5th: the flow to disk mechanism is triggered when the
> allocated direct memory size exceeds the flow to disk threshold.
> This evacuates messages from memory, flowing them to disk if
> necessary, until the allocated direct memory falls below the threshold
> again.  In doing so, some buffers that were previously sparse will be
> released.  Expect to see the direct memory graph level off, but it
> won't necessarily fall. This is expected.   In other words, the direct
> graph you shared on 5th does not surprise me.
>
> We plan to put out a 6.0.7/6.1.3 RCs out in the next few days which
> will include these changes.    The patch attached to QPID-7462 has
> been updated it will apply.
>
> In the longer term, for 7.0.0 we have already improved the Broker so
> that it actively manage the sparsity of buffers without falling back
> on flow to disk, Memory requirements for use cases such as yours
> should be much more reasonable.
>
> I know you currently have a dependency on the old JMX management
> interface.   I'd suggest you look at eliminating the dependency soon,
> so you are free to upgrade when the time comes.
>
> Kind regards, Keith Wall.
>
> On 24 May 2017 at 13:52, Keith W <ke...@gmail.com> wrote:
> > Hi Ramayan
> >
> > Further changes have been made on the 6.0.x branch that prevent the
> > noisy flow to disk messages in the log.   The flow to disk on/off
> > messages we had before didn't really fit and this resulted in the logs
> > being spammed when either a queue was considered full or memory was
> > under pressure.  Now you'll see a VHT-1008 periodically written to the
> > logs when flow to disk has been working.  It reports the number of
> > bytes that have been evacuated from memory.
> >
> > The actual mechanism used to prevent the OOM is the same as code you
> > tested on May 5th: the flow to disk mechanism is triggered when the
> > allocated direct memory size exceeds the flow to disk threshold.
> > This evacuates messages from memory, flowing them to disk if
> > necessary, until the allocated direct memory falls below the threshold
> > again.  In doing so, some buffers that were previously sparse will be
> > released.  Expect to see the direct memory graph level off, but it
> > won't necessarily fall. This is expected.   In other words, the direct
> > graph you shared on 5th does not surprise me.
> >
> > We plan to put out a 6.0.7/6.1.3 RCs out in the next few days which
> > will include these changes.    The patch attached to QPID-7462 has
> > been updated it will apply.
> >
> > In the longer term, for 7.0.0 we have already improved the Broker so
> > that it actively manage the sparsity of buffers without falling back
> > on flow to disk, Memory requirements for use cases such as yours
> > should be much more reasonable.
> >
> > I know you currently have a dependency on the old JMX management
> > interface.   I'd suggest you look at eliminating the dependency soon,
> > so you are free to upgrade when the time comes.
> >
> > Kind regards, Keith Wall.
> >
> >
> > On 16 May 2017 at 19:58, Ramayan Tiwari <ra...@gmail.com>
> wrote:
> >> Thanks Keith for the update.
> >>
> >> - Ramayan
> >>
> >> On Mon, May 15, 2017 at 2:35 AM, Keith W <ke...@gmail.com> wrote:
> >>>
> >>> Hi Ramayan
> >>>
> >>> We are still looking at our approach to the Broker's flow to disk
> >>> feature in light of the defect you highlighted.  We have some work in
> >>> flight this week investigating alternative approaches which I am
> >>> hoping will conclude by the end of week.  I should be able to update
> >>> you then.
> >>>
> >>> Thanks Keith
> >>>
> >>> On 12 May 2017 at 20:58, Ramayan Tiwari <ra...@gmail.com>
> wrote:
> >>> > Hi Alex,
> >>> >
> >>> > Any update on the fix for this?
> >>> > QPID-7753 is assigned a fix version for 7.0.0, I am hoping that the
> fix
> >>> > will also be back ported to 6.0.x.
> >>> >
> >>> > Thanks
> >>> > Ramayan
> >>> >
> >>> > On Mon, May 8, 2017 at 2:14 AM, Oleksandr Rudyy <or...@gmail.com>
> >>> > wrote:
> >>> >
> >>> >> Hi Ramayan,
> >>> >>
> >>> >> Thanks for testing the patch and providing a feedback.
> >>> >>
> >>> >> Regarding direct memory utilization, the Qpid Broker caches up to
> 256MB
> >>> >> of
> >>> >> direct memory internally in QpidByteBuffers. Thus, when testing the
> >>> >> Broker
> >>> >> with only 256MB of direct memory, the entire direct memory could be
> >>> >> cached
> >>> >> and it would look as if direct memory is never released.
> Potentially,
> >>> >> you
> >>> >> can reduce the number of buffers cached on broker by changing
> context
> >>> >> variable 'broker.directByteBufferPoolSize'. By default, it is set
> to
> >>> >> 1000.
> >>> >> With buffer size of 256K, it would give ~256M of cache.
> >>> >>
> >>> >> Regarding introducing lower and upper thresholds for 'flow to
> disk'. It
> >>> >> seems like a good idea and we will try to implement it early this
> week
> >>> >> on
> >>> >> trunk first.
> >>> >>
> >>> >> Kind Regards,
> >>> >> Alex
> >>> >>
> >>> >>
> >>> >> On 5 May 2017 at 23:49, Ramayan Tiwari <ra...@gmail.com>
> >>> >> wrote:
> >>> >>
> >>> >> > Hi Alex,
> >>> >> >
> >>> >> > Thanks for providing the patch. I verified the fix with same perf
> >>> >> > test,
> >>> >> and
> >>> >> > it does prevent broker from going OOM, however. DM utilization
> >>> >> > doesn't
> >>> >> get
> >>> >> > any better after hitting the threshold (where flow to disk is
> >>> >> > activated
> >>> >> > based on total used % across broker - graph in the link below).
> >>> >> >
> >>> >> > After hitting the final threshold, flow to disk activates and
> >>> >> > deactivates
> >>> >> > pretty frequently across all the queues. The reason seems to be
> >>> >> > because
> >>> >> > there is only one threshold currently to trigger flow to disk.
> Would
> >>> >> > it
> >>> >> > make sense to break this down to high and low threshold - so that
> >>> >> > once
> >>> >> flow
> >>> >> > to disk is active after hitting high threshold, it will be active
> >>> >> > until
> >>> >> the
> >>> >> > queue utilization (or broker DM allocation) reaches the low
> >>> >> > threshold.
> >>> >> >
> >>> >> > Graph and flow to disk logs are here:
> >>> >> > https://docs.google.com/document/d/1Wc1e-id-
> >>> >> WlpI7FGU1Lx8XcKaV8sauRp82T5XZV
> >>> >> > U-RiM/edit#heading=h.6400pltvjhy7
> >>> >> >
> >>> >> > Thanks
> >>> >> > Ramayan
> >>> >> >
> >>> >> > On Thu, May 4, 2017 at 2:44 AM, Oleksandr Rudyy <orudyy@gmail.com
> >
> >>> >> wrote:
> >>> >> >
> >>> >> > > Hi Ramayan,
> >>> >> > >
> >>> >> > > We attached to the QPID-7753 a patch with a work around for
> 6.0.x
> >>> >> branch.
> >>> >> > > It triggers flow to disk based on direct memory consumption
> rather
> >>> >> > > than
> >>> >> > > estimation of the space occupied by the message content. The
> flow
> >>> >> > > to
> >>> >> disk
> >>> >> > > should evacuate message content preventing running out of direct
> >>> >> memory.
> >>> >> > We
> >>> >> > > already committed the changes into 6.0.x and 6.1.x branches. It
> >>> >> > > will be
> >>> >> > > included into upcoming 6.0.7 and 6.1.3 releases.
> >>> >> > >
> >>> >> > > Please try and test the patch in your environment.
> >>> >> > >
> >>> >> > > We are still working at finishing of the fix for trunk.
> >>> >> > >
> >>> >> > > Kind Regards,
> >>> >> > > Alex
> >>> >> > >
> >>> >> > > On 30 April 2017 at 15:45, Lorenz Quack <quack.lorenz@gmail.com
> >
> >>> >> wrote:
> >>> >> > >
> >>> >> > > > Hi Ramayan,
> >>> >> > > >
> >>> >> > > > The high-level plan is currently as follows:
> >>> >> > > >  1) Periodically try to compact sparse direct memory buffers.
> >>> >> > > >  2) Increase accuracy of messages' direct memory usage
> estimation
> >>> >> > > > to
> >>> >> > more
> >>> >> > > > reliably trigger flow to disk.
> >>> >> > > >  3) Add an additional flow to disk trigger based on the
> amount of
> >>> >> > > allocated
> >>> >> > > > direct memory.
> >>> >> > > >
> >>> >> > > > A little bit more details:
> >>> >> > > >  1) We plan on periodically checking the amount of direct
> memory
> >>> >> usage
> >>> >> > > and
> >>> >> > > > if it is above a
> >>> >> > > >     threshold (50%) we compare the sum of all queue sizes with
> >>> >> > > > the
> >>> >> > amount
> >>> >> > > > of allocated direct memory.
> >>> >> > > >     If the ratio falls below a certain threshold we trigger a
> >>> >> > compaction
> >>> >> > > > task which goes through all queues
> >>> >> > > >     and copy's a certain amount of old message buffers into
> new
> >>> >> > > > ones
> >>> >> > > > thereby freeing the old buffers so
> >>> >> > > >     that they can be returned to the buffer pool and be
> reused.
> >>> >> > > >
> >>> >> > > >  2) Currently we trigger flow to disk based on an estimate of
> how
> >>> >> much
> >>> >> > > > memory the messages on the
> >>> >> > > >     queues consume. We had to use estimates because we did not
> >>> >> > > > have
> >>> >> > > > accurate size numbers for
> >>> >> > > >     message headers. By having accurate size information for
> >>> >> > > > message
> >>> >> > > > headers we can more reliably
> >>> >> > > >     enforce queue memory limits.
> >>> >> > > >
> >>> >> > > >  3) The flow to disk trigger based on message size had another
> >>> >> problem
> >>> >> > > > which is more pertinent to the
> >>> >> > > >     current issue. We only considered the size of the messages
> >>> >> > > > and
> >>> >> not
> >>> >> > > how
> >>> >> > > > much memory we allocate
> >>> >> > > >     to store those messages. In the FIFO use case those
> numbers
> >>> >> > > > will
> >>> >> be
> >>> >> > > > very close to each other but in
> >>> >> > > >     use cases like yours we can end up with sparse buffers and
> >>> >> > > > the
> >>> >> > > numbers
> >>> >> > > > will diverge. Because of this
> >>> >> > > >     divergence we do not trigger flow to disk in time and the
> >>> >> > > > broker
> >>> >> > can
> >>> >> > > go
> >>> >> > > > OOM.
> >>> >> > > >     To fix the issue we want to add an additional flow to disk
> >>> >> trigger
> >>> >> > > > based on the amount of allocated direct
> >>> >> > > >     memory. This should prevent the broker from going OOM
> even if
> >>> >> > > > the
> >>> >> > > > compaction strategy outlined above
> >>> >> > > >     should fail for some reason (e.g., the compaction task
> cannot
> >>> >> keep
> >>> >> > up
> >>> >> > > > with the arrival of new messages).
> >>> >> > > >
> >>> >> > > > Currently, there are patches for the above points but they
> suffer
> >>> >> from
> >>> >> > > some
> >>> >> > > > thread-safety issues that need to be addressed.
> >>> >> > > >
> >>> >> > > > I hope this description helps. Any feedback is, as always,
> >>> >> > > > welcome.
> >>> >> > > >
> >>> >> > > > Kind regards,
> >>> >> > > > Lorenz
> >>> >> > > >
> >>> >> > > >
> >>> >> > > >
> >>> >> > > > On Sat, Apr 29, 2017 at 12:00 AM, Ramayan Tiwari <
> >>> >> > > ramayan.tiwari@gmail.com
> >>> >> > > > >
> >>> >> > > > wrote:
> >>> >> > > >
> >>> >> > > > > Hi Lorenz,
> >>> >> > > > >
> >>> >> > > > > Thanks so much for the patch. We have a perf test now to
> >>> >> > > > > reproduce
> >>> >> > this
> >>> >> > > > > issue, so we did test with 256KB, 64KB and 4KB network byte
> >>> >> > > > > buffer.
> >>> >> > > None
> >>> >> > > > of
> >>> >> > > > > these configurations help with the issue (or give any more
> >>> >> breathing
> >>> >> > > > room)
> >>> >> > > > > for our use case. We would like to share the perf analysis
> with
> >>> >> > > > > the
> >>> >> > > > > community:
> >>> >> > > > >
> >>> >> > > > > https://docs.google.com/document/d/1Wc1e-id-
> >>> >> > > > WlpI7FGU1Lx8XcKaV8sauRp82T5XZV
> >>> >> > > > > U-RiM/edit?usp=sharing
> >>> >> > > > >
> >>> >> > > > > Feel free to comment on the doc if certain details are
> >>> >> > > > > incorrect or
> >>> >> > if
> >>> >> > > > > there are questions.
> >>> >> > > > >
> >>> >> > > > > Since the short term solution doesn't help us, we are very
> >>> >> interested
> >>> >> > > in
> >>> >> > > > > getting some details on how the community plans to address
> >>> >> > > > > this, a
> >>> >> > high
> >>> >> > > > > level description of the approach will be very helpful for
> us
> >>> >> > > > > in
> >>> >> > order
> >>> >> > > to
> >>> >> > > > > brainstorm our use cases along with this solution.
> >>> >> > > > >
> >>> >> > > > > - Ramayan
> >>> >> > > > >
> >>> >> > > > > On Fri, Apr 28, 2017 at 9:34 AM, Lorenz Quack <
> >>> >> > quack.lorenz@gmail.com>
> >>> >> > > > > wrote:
> >>> >> > > > >
> >>> >> > > > > > Hello Ramayan,
> >>> >> > > > > >
> >>> >> > > > > > We are still working on a fix for this issue.
> >>> >> > > > > > In the mean time we had an idea to potentially workaround
> the
> >>> >> issue
> >>> >> > > > until
> >>> >> > > > > > a proper fix is released.
> >>> >> > > > > >
> >>> >> > > > > > The idea is to decrease the qpid network buffer size the
> >>> >> > > > > > broker
> >>> >> > uses.
> >>> >> > > > > > While this still allows for sparsely populated buffers it
> >>> >> > > > > > would
> >>> >> > > improve
> >>> >> > > > > > the overall occupancy ratio.
> >>> >> > > > > >
> >>> >> > > > > > Here are the steps to follow:
> >>> >> > > > > >  * ensure you are not using TLS
> >>> >> > > > > >  * apply the attached patch
> >>> >> > > > > >  * figure out the size of the largest messages you are
> >>> >> > > > > > sending
> >>> >> > > > (including
> >>> >> > > > > > header and some overhead)
> >>> >> > > > > >  * set the context variable "qpid.broker.networkBufferSize
> "
> >>> >> > > > > > to
> >>> >> > that
> >>> >> > > > > value
> >>> >> > > > > > but not smaller than 4096
> >>> >> > > > > >  * test
> >>> >> > > > > >
> >>> >> > > > > > Decreasing the qpid network buffer size automatically
> limits
> >>> >> > > > > > the
> >>> >> > > > maximum
> >>> >> > > > > > AMQP frame size.
> >>> >> > > > > > Since you are using a very old client we are not sure how
> >>> >> > > > > > well it
> >>> >> > > copes
> >>> >> > > > > > with small frame sizes where it has to split a message
> across
> >>> >> > > multiple
> >>> >> > > > > > frames.
> >>> >> > > > > > Therefore, to play it safe you should not set it smaller
> than
> >>> >> > > > > > the
> >>> >> > > > largest
> >>> >> > > > > > messages (+ header + overhead) you are sending.
> >>> >> > > > > > I do not know what message sizes you are sending but AMQP
> >>> >> > > > > > imposes
> >>> >> > the
> >>> >> > > > > > restriction that the framesize cannot be smaller than 4096
> >>> >> > > > > > bytes.
> >>> >> > > > > > In the qpid broker the default currently is 256 kB.
> >>> >> > > > > >
> >>> >> > > > > > In the current state the broker does not allow setting the
> >>> >> network
> >>> >> > > > buffer
> >>> >> > > > > > to values smaller than 64 kB to allow TLS frames to fit
> into
> >>> >> > > > > > one
> >>> >> > > > network
> >>> >> > > > > > buffer.
> >>> >> > > > > > I attached a patch to this mail that lowers that
> restriction
> >>> >> > > > > > to
> >>> >> the
> >>> >> > > > limit
> >>> >> > > > > > imposed by AMQP (4096 Bytes).
> >>> >> > > > > > Obviously, you should not use this when using TLS.
> >>> >> > > > > >
> >>> >> > > > > >
> >>> >> > > > > > I hope this reduces the problems you are currently facing
> >>> >> > > > > > until
> >>> >> we
> >>> >> > > can
> >>> >> > > > > > complete the proper fix.
> >>> >> > > > > >
> >>> >> > > > > > Kind regards,
> >>> >> > > > > > Lorenz
> >>> >> > > > > >
> >>> >> > > > > >
> >>> >> > > > > > On Fri, 2017-04-21 at 09:17 -0700, Ramayan Tiwari wrote:
> >>> >> > > > > > > Thanks so much Keith and the team for finding the root
> >>> >> > > > > > > cause.
> >>> >> We
> >>> >> > > are
> >>> >> > > > so
> >>> >> > > > > > > relieved that we fix the root cause shortly.
> >>> >> > > > > > >
> >>> >> > > > > > > Couple of things that I forgot to mention on the
> mitigation
> >>> >> steps
> >>> >> > > we
> >>> >> > > > > took
> >>> >> > > > > > > in the last incident:
> >>> >> > > > > > > 1) We triggered GC from JMX bean multiple times, it did
> not
> >>> >> help
> >>> >> > in
> >>> >> > > > > > > reducing DM allocated.
> >>> >> > > > > > > 2) We also killed all the AMQP connections to the broker
> >>> >> > > > > > > when
> >>> >> DM
> >>> >> > > was
> >>> >> > > > at
> >>> >> > > > > > > 80%. This did not help either. The way we killed
> >>> >> > > > > > > connections -
> >>> >> > > using
> >>> >> > > > > JMX
> >>> >> > > > > > > got list of all the open AMQP connections and called
> close
> >>> >> > > > > > > from
> >>> >> > JMX
> >>> >> > > > > > mbean.
> >>> >> > > > > > >
> >>> >> > > > > > > I am hoping the above two are not related to root cause,
> >>> >> > > > > > > but
> >>> >> > wanted
> >>> >> > > > to
> >>> >> > > > > > > bring it up in case this is relevant.
> >>> >> > > > > > >
> >>> >> > > > > > > Thanks
> >>> >> > > > > > > Ramayan
> >>> >> > > > > > >
> >>> >> > > > > > > On Fri, Apr 21, 2017 at 8:29 AM, Keith W
> >>> >> > > > > > > <keith.wall@gmail.com
> >>> >> >
> >>> >> > > > wrote:
> >>> >> > > > > > >
> >>> >> > > > > > > >
> >>> >> > > > > > > > Hello Ramayan
> >>> >> > > > > > > >
> >>> >> > > > > > > > I believe I understand the root cause of the
> problem.  We
> >>> >> have
> >>> >> > > > > > > > identified a flaw in the direct memory buffer
> management
> >>> >> > employed
> >>> >> > > > by
> >>> >> > > > > > > > Qpid Broker J which for some messaging use-cases can
> lead
> >>> >> > > > > > > > to
> >>> >> > the
> >>> >> > > > OOM
> >>> >> > > > > > > > direct you describe.   For the issue to manifest the
> >>> >> producing
> >>> >> > > > > > > > application needs to use a single connection for the
> >>> >> production
> >>> >> > > of
> >>> >> > > > > > > > messages some of which are short-lived (i.e. are
> consumed
> >>> >> > > quickly)
> >>> >> > > > > > > > whilst others remain on the queue for some time.
> >>> >> > > > > > > > Priority
> >>> >> > > queues,
> >>> >> > > > > > > > sorted queues and consumers utilising selectors that
> >>> >> > > > > > > > result
> >>> >> in
> >>> >> > > some
> >>> >> > > > > > > > messages being left of the queue could all produce
> this
> >>> >> patten.
> >>> >> > > > The
> >>> >> > > > > > > > pattern leads to a sparsely occupied 256K net buffers
> >>> >> > > > > > > > which
> >>> >> > > cannot
> >>> >> > > > be
> >>> >> > > > > > > > released or reused until every message that reference
> a
> >>> >> 'chunk'
> >>> >> > > of
> >>> >> > > > it
> >>> >> > > > > > > > is either consumed or flown to disk.   The problem was
> >>> >> > introduced
> >>> >> > > > > with
> >>> >> > > > > > > > Qpid v6.0 and exists in v6.1 and trunk too.
> >>> >> > > > > > > >
> >>> >> > > > > > > > The flow to disk feature is not helping us here
> because
> >>> >> > > > > > > > its
> >>> >> > > > algorithm
> >>> >> > > > > > > > considers only the size of live messages on the
> queues.
> >>> >> > > > > > > > If
> >>> >> the
> >>> >> > > > > > > > accumulative live size does not exceed the threshold,
> the
> >>> >> > > messages
> >>> >> > > > > > > > aren't flown to disk. I speculate that when you
> observed
> >>> >> > > > > > > > that
> >>> >> > > > moving
> >>> >> > > > > > > > messages cause direct message usage to drop earlier
> >>> >> > > > > > > > today,
> >>> >> your
> >>> >> > > > > > > > message movement cause a queue to go over threshold,
> >>> >> > > > > > > > cause
> >>> >> > > message
> >>> >> > > > to
> >>> >> > > > > > > > be flown to disk and their direct memory references
> >>> >> > > > > > > > released.
> >>> >> > > The
> >>> >> > > > > > > > logs will confirm this is so.
> >>> >> > > > > > > >
> >>> >> > > > > > > > I have not identified an easy workaround at the
> moment.
> >>> >> > > >  Decreasing
> >>> >> > > > > > > > the flow to disk threshold and/or increasing available
> >>> >> > > > > > > > direct
> >>> >> > > > memory
> >>> >> > > > > > > > should alleviate and may be an acceptable short term
> >>> >> > workaround.
> >>> >> > > > If
> >>> >> > > > > > > > it were possible for publishing application to publish
> >>> >> > > > > > > > short
> >>> >> > > lived
> >>> >> > > > > and
> >>> >> > > > > > > > long lived messages on two separate JMS connections
> this
> >>> >> would
> >>> >> > > > avoid
> >>> >> > > > > > > > this defect.
> >>> >> > > > > > > >
> >>> >> > > > > > > > QPID-7753 tracks this issue and QPID-7754 is a related
> >>> >> > > > > > > > this
> >>> >> > > > problem.
> >>> >> > > > > > > > We intend to be working on these early next week and
> will
> >>> >> > > > > > > > be
> >>> >> > > aiming
> >>> >> > > > > > > > for a fix that is back-portable to 6.0.
> >>> >> > > > > > > >
> >>> >> > > > > > > > Apologies that you have run into this defect and
> thanks
> >>> >> > > > > > > > for
> >>> >> > > > > reporting.
> >>> >> > > > > > > >
> >>> >> > > > > > > > Thanks, Keith
> >>> >> > > > > > > >
> >>> >> > > > > > > >
> >>> >> > > > > > > >
> >>> >> > > > > > > >
> >>> >> > > > > > > >
> >>> >> > > > > > > >
> >>> >> > > > > > > >
> >>> >> > > > > > > > On 21 April 2017 at 10:21, Ramayan Tiwari <
> >>> >> > > > ramayan.tiwari@gmail.com>
> >>> >> > > > > > > > wrote:
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > Hi All,
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > We have been monitoring the brokers everyday and
> today
> >>> >> > > > > > > > > we
> >>> >> > found
> >>> >> > > > one
> >>> >> > > > > > > > instance
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > where broker’s DM was constantly going up and was
> about
> >>> >> > > > > > > > > to
> >>> >> > > crash,
> >>> >> > > > > so
> >>> >> > > > > > we
> >>> >> > > > > > > > > experimented some mitigations, one of which caused
> the
> >>> >> > > > > > > > > DM
> >>> >> to
> >>> >> > > come
> >>> >> > > > > > down.
> >>> >> > > > > > > > > Following are the details, which might help us
> >>> >> understanding
> >>> >> > > the
> >>> >> > > > > > issue:
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > Traffic scenario:
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > DM allocation had been constantly going up and was
> at
> >>> >> > > > > > > > > 90%.
> >>> >> > > There
> >>> >> > > > > > were two
> >>> >> > > > > > > > > queues which seemed to align with the theories that
> we
> >>> >> > > > > > > > > had.
> >>> >> > > Q1’s
> >>> >> > > > > > size had
> >>> >> > > > > > > > > been large right after the broker start and had slow
> >>> >> > > consumption
> >>> >> > > > of
> >>> >> > > > > > > > > messages, queue size only reduced from 76MB to 75MB
> >>> >> > > > > > > > > over a
> >>> >> > > period
> >>> >> > > > > of
> >>> >> > > > > > > > 6hrs.
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > Q2 on the other hand, started small and was
> gradually
> >>> >> > growing,
> >>> >> > > > > queue
> >>> >> > > > > > size
> >>> >> > > > > > > > > went from 7MB to 10MB in 6hrs. There were other
> queues
> >>> >> > > > > > > > > with
> >>> >> > > > traffic
> >>> >> > > > > > > > during
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > this time.
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > Action taken:
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > Moved all the messages from Q2 (since this was our
> >>> >> > > > > > > > > original
> >>> >> > > > theory)
> >>> >> > > > > > to Q3
> >>> >> > > > > > > > > (already created but no messages in it). This did
> not
> >>> >> > > > > > > > > help
> >>> >> > with
> >>> >> > > > the
> >>> >> > > > > > DM
> >>> >> > > > > > > > > growing up.
> >>> >> > > > > > > > > Moved all the messages from Q1 to Q4 (already
> created
> >>> >> > > > > > > > > but
> >>> >> no
> >>> >> > > > > > messages in
> >>> >> > > > > > > > > it). This reduced DM allocation from 93% to 31%.
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > We have the heap dump and thread dump from when
> broker
> >>> >> > > > > > > > > was
> >>> >> > 90%
> >>> >> > > in
> >>> >> > > > > DM
> >>> >> > > > > > > > > allocation. We are going to analyze that to see if
> we
> >>> >> > > > > > > > > can
> >>> >> get
> >>> >> > > > some
> >>> >> > > > > > clue.
> >>> >> > > > > > > > We
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > wanted to share this new information which might
> help
> >>> >> > > > > > > > > in
> >>> >> > > > reasoning
> >>> >> > > > > > about
> >>> >> > > > > > > > the
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > memory issue.
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > - Ramayan
> >>> >> > > > > > > > >
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > On Thu, Apr 20, 2017 at 11:20 AM, Ramayan Tiwari <
> >>> >> > > > > > > > ramayan.tiwari@gmail.com>
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > wrote:
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > Hi Keith,
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > Thanks so much for your response and digging into
> the
> >>> >> > issue.
> >>> >> > > > > Below
> >>> >> > > > > > are
> >>> >> > > > > > > > the
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > answer to your questions:
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > 1) Yeah we are using QPID-7462 with 6.0.5. We
> >>> >> > > > > > > > > > couldn't
> >>> >> use
> >>> >> > > 6.1
> >>> >> > > > > > where it
> >>> >> > > > > > > > > > was released because we need JMX support. Here is
> the
> >>> >> > > > destination
> >>> >> > > > > > > > format:
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > ""%s ; {node : { type : queue }, link : {
> >>> >> > > > > > > > > > x-subscribes :
> >>> >> {
> >>> >> > > > > > arguments : {
> >>> >> > > > > > > > > > x-multiqueue : [%s], x-pull-only : true }}}}";"
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > 2) Our machines have 40 cores, which will make the
> >>> >> > > > > > > > > > number
> >>> >> > of
> >>> >> > > > > > threads to
> >>> >> > > > > > > > > > 80. This might not be an issue, because this will
> >>> >> > > > > > > > > > show up
> >>> >> > in
> >>> >> > > > the
> >>> >> > > > > > > > baseline DM
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > allocated, which is only 6% (of 4GB) when we just
> >>> >> > > > > > > > > > bring
> >>> >> up
> >>> >> > > the
> >>> >> > > > > > broker.
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > 3) The only setting that we tuned WRT to DM is
> >>> >> > > > > flowToDiskThreshold,
> >>> >> > > > > > > > which
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > is set at 80% now.
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > 4) Only one virtual host in the broker.
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > 5) Most of our queues (99%) are priority, we also
> >>> >> > > > > > > > > > have
> >>> >> 8-10
> >>> >> > > > > sorted
> >>> >> > > > > > > > queues.
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > 6) Yeah we are using the standard 0.16 client and
> not
> >>> >> AMQP
> >>> >> > > 1.0
> >>> >> > > > > > clients.
> >>> >> > > > > > > > > > The connection log line looks like:
> >>> >> > > > > > > > > > CON-1001 : Open : Destination : AMQP(IP:5672) :
> >>> >> > > > > > > > > > Protocol
> >>> >> > > > Version
> >>> >> > > > > :
> >>> >> > > > > > 0-10
> >>> >> > > > > > > > :
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > Client ID : test : Client Version : 0.16 : Client
> >>> >> Product :
> >>> >> > > > qpid
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > We had another broker crashed about an hour back,
> we
> >>> >> > > > > > > > > > do
> >>> >> see
> >>> >> > > the
> >>> >> > > > > > same
> >>> >> > > > > > > > > > patterns:
> >>> >> > > > > > > > > > 1) There is a queue which is constantly growing,
> >>> >> > > > > > > > > > enqueue
> >>> >> is
> >>> >> > > > > faster
> >>> >> > > > > > than
> >>> >> > > > > > > > > > dequeue on that queue for a long period of time.
> >>> >> > > > > > > > > > 2) Flow to disk didn't kick in at all.
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > This graph shows memory growth (red line - heap,
> blue
> >>> >> > > > > > > > > > -
> >>> >> DM
> >>> >> > > > > > allocated,
> >>> >> > > > > > > > > > yellow - DM used)
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > https://drive.google.com/file/d/
> >>> >> > > 0Bwi0MEV3srPRdVhXdTBncHJLY2c/
> >>> >> > > > > > > > view?usp=sharing
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > The below graph shows growth on a single queue
> (there
> >>> >> > > > > > > > > > are
> >>> >> > > 10-12
> >>> >> > > > > > other
> >>> >> > > > > > > > > > queues with traffic as well, something large size
> >>> >> > > > > > > > > > than
> >>> >> this
> >>> >> > > > > queue):
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > https://drive.google.com/file/d/
> >>> >> > > 0Bwi0MEV3srPRWmNGbDNGUkJhQ0U/
> >>> >> > > > > > > > view?usp=sharing
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > Couple of questions:
> >>> >> > > > > > > > > > 1) Is there any developer level doc/design spec on
> >>> >> > > > > > > > > > how
> >>> >> Qpid
> >>> >> > > > uses
> >>> >> > > > > > DM?
> >>> >> > > > > > > > > > 2) We are not getting heap dumps automatically
> when
> >>> >> broker
> >>> >> > > > > crashes
> >>> >> > > > > > due
> >>> >> > > > > > > > to
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > DM (HeapDumpOnOutOfMemoryError not respected). Has
> >>> >> > > > > > > > > > anyone
> >>> >> > > > found a
> >>> >> > > > > > way
> >>> >> > > > > > > > to get
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > around this problem?
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > Thanks
> >>> >> > > > > > > > > > Ramayan
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > On Thu, Apr 20, 2017 at 9:08 AM, Keith W <
> >>> >> > > keith.wall@gmail.com
> >>> >> > > > >
> >>> >> > > > > > wrote:
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > Hi Ramayan
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > We have been discussing your problem here and
> have
> >>> >> > > > > > > > > > > a
> >>> >> > couple
> >>> >> > > > of
> >>> >> > > > > > > > questions.
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > I have been experimenting with use-cases based
> on
> >>> >> > > > > > > > > > > your
> >>> >> > > > > > descriptions
> >>> >> > > > > > > > > > > above, but so far, have been unsuccessful in
> >>> >> reproducing
> >>> >> > a
> >>> >> > > > > > > > > > > "java.lang.OutOfMemoryError: Direct buffer
> memory"
> >>> >> > > > condition.
> >>> >> > > > > > The
> >>> >> > > > > > > > > > > direct memory usage reflects the expected
> model: it
> >>> >> > levels
> >>> >> > > > off
> >>> >> > > > > > when
> >>> >> > > > > > > > > > > the flow to disk threshold is reached and direct
> >>> >> > > > > > > > > > > memory
> >>> >> > is
> >>> >> > > > > > release as
> >>> >> > > > > > > > > > > messages are consumed until the minimum size for
> >>> >> caching
> >>> >> > of
> >>> >> > > > > > direct is
> >>> >> > > > > > > > > > > reached.
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > 1] For clarity let me check: we believe when you
> >>> >> > > > > > > > > > > say
> >>> >> > "patch
> >>> >> > > > to
> >>> >> > > > > > use
> >>> >> > > > > > > > > > > MultiQueueConsumer" you are referring to the
> patch
> >>> >> > attached
> >>> >> > > > to
> >>> >> > > > > > > > > > > QPID-7462 "Add experimental "pull" consumers to
> the
> >>> >> > broker"
> >>> >> > > > > and
> >>> >> > > > > > you
> >>> >> > > > > > > > > > > are using a combination of this "x-pull-only"
> with
> >>> >> > > > > > > > > > > the
> >>> >> > > > > standard
> >>> >> > > > > > > > > > > "x-multiqueue" feature.  Is this correct?
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > 2] One idea we had here relates to the size of
> the
> >>> >> > > > virtualhost
> >>> >> > > > > IO
> >>> >> > > > > > > > > > > pool.   As you know from the documentation, the
> >>> >> > > > > > > > > > > Broker
> >>> >> > > > > > caches/reuses
> >>> >> > > > > > > > > > > direct memory internally but the documentation
> >>> >> > > > > > > > > > > fails to
> >>> >> > > > > mentions
> >>> >> > > > > > that
> >>> >> > > > > > > > > > > each pooled virtualhost IO thread also grabs a
> >>> >> > > > > > > > > > > chunk
> >>> >> > (256K)
> >>> >> > > > of
> >>> >> > > > > > direct
> >>> >> > > > > > > > > > > memory from this cache.  By default the virtual
> >>> >> > > > > > > > > > > host IO
> >>> >> > > pool
> >>> >> > > > is
> >>> >> > > > > > sized
> >>> >> > > > > > > > > > > Math.max(Runtime.getRuntime().
> availableProcessors()
> >>> >> > > > > > > > > > > *
> >>> >> 2,
> >>> >> > > > 64),
> >>> >> > > > > > so if
> >>> >> > > > > > > > > > > you have a machine with a very large number of
> >>> >> > > > > > > > > > > cores,
> >>> >> you
> >>> >> > > may
> >>> >> > > > > > have a
> >>> >> > > > > > > > > > > surprising large amount of direct memory
> assigned
> >>> >> > > > > > > > > > > to
> >>> >> > > > > virtualhost
> >>> >> > > > > > IO
> >>> >> > > > > > > > > > > threads.   Check the value of
> >>> >> > > > > > > > > > > connectionThreadPoolSize
> >>> >> on
> >>> >> > > the
> >>> >> > > > > > > > > > > virtualhost
> >>> >> > > > > > > > > > > (http://<server>:<port>/api/la
> test/virtualhost/<
> >>> >> > > > > > virtualhostnodename>/<;
> >>> >> > > > > > > > virtualhostname>)
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > to see what value is in force.  What is it?  It
> is
> >>> >> > possible
> >>> >> > > > to
> >>> >> > > > > > tune
> >>> >> > > > > > > > > > > the pool size using context variable
> >>> >> > > > > > > > > > > virtualhost.connectionThreadPool.size.
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > 3] Tell me if you are tuning the Broker in way
> >>> >> > > > > > > > > > > beyond
> >>> >> the
> >>> >> > > > > > direct/heap
> >>> >> > > > > > > > > > > memory settings you have told us about already.
> >>> >> > > > > > > > > > > For
> >>> >> > > instance
> >>> >> > > > > > you are
> >>> >> > > > > > > > > > > changing any of the direct memory pooling
> settings
> >>> >> > > > > > > > > > > broker.directByteBufferPoolSize, default
> network
> >>> >> buffer
> >>> >> > > size
> >>> >> > > > > > > > > > > qpid.broker.networkBufferSize or applying any
> other
> >>> >> > > > > non-standard
> >>> >> > > > > > > > > > > settings?
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > 4] How many virtual hosts do you have on the
> >>> >> > > > > > > > > > > Broker?
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > 5] What is the consumption pattern of the
> messages?
> >>> >> > > > > > > > > > > Do
> >>> >> > > > consume
> >>> >> > > > > > in a
> >>> >> > > > > > > > > > > strictly FIFO fashion or are you making use of
> >>> >> > > > > > > > > > > message
> >>> >> > > > > selectors
> >>> >> > > > > > > > > > > or/and any of the out-of-order queue types
> (LVQs,
> >>> >> > priority
> >>> >> > > > > queue
> >>> >> > > > > > or
> >>> >> > > > > > > > > > > sorted queues)?
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > 6] Is it just the 0.16 client involved in the
> >>> >> > application?
> >>> >> > > > >  Can
> >>> >> > > > > > I
> >>> >> > > > > > > > > > > check that you are not using any of the AMQP 1.0
> >>> >> clients
> >>> >> > > > > > > > > > > (org,apache.qpid:qpid-jms-client or
> >>> >> > > > > > > > > > > org.apache.qpid:qpid-amqp-1-0-client) in the
> >>> >> > > > > > > > > > > software
> >>> >> > > stack
> >>> >> > > > > (as
> >>> >> > > > > > either
> >>> >> > > > > > > > > > > consumers or producers)
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > Hopefully the answers to these questions will
> get
> >>> >> > > > > > > > > > > us
> >>> >> > closer
> >>> >> > > > to
> >>> >> > > > > a
> >>> >> > > > > > > > > > > reproduction.   If you are able to reliable
> >>> >> > > > > > > > > > > reproduce
> >>> >> it,
> >>> >> > > > > please
> >>> >> > > > > > share
> >>> >> > > > > > > > > > > the steps with us.
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > Kind regards, Keith.
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > On 20 April 2017 at 10:21, Ramayan Tiwari <
> >>> >> > > > > > ramayan.tiwari@gmail.com>
> >>> >> > > > > > > > > > > wrote:
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > After a lot of log mining, we might have a
> way to
> >>> >> > explain
> >>> >> > > > the
> >>> >> > > > > > > > sustained
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > increased in DirectMemory allocation, the
> >>> >> > > > > > > > > > > > correlation
> >>> >> > > seems
> >>> >> > > > > to
> >>> >> > > > > > be
> >>> >> > > > > > > > with
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > the
> >>> >> > > > > > > > > > > > growth in the size of a Queue that is getting
> >>> >> consumed
> >>> >> > > but
> >>> >> > > > at
> >>> >> > > > > > a much
> >>> >> > > > > > > > > > > > slower
> >>> >> > > > > > > > > > > > rate than producers putting messages on this
> >>> >> > > > > > > > > > > > queue.
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > The pattern we see is that in each instance of
> >>> >> > > > > > > > > > > > broker
> >>> >> > > > crash,
> >>> >> > > > > > there is
> >>> >> > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > least one queue (usually 1 queue) whose size
> kept
> >>> >> > growing
> >>> >> > > > > > steadily.
> >>> >> > > > > > > > > > > > It’d be
> >>> >> > > > > > > > > > > > of significant size but not the largest queue
> --
> >>> >> > usually
> >>> >> > > > > there
> >>> >> > > > > > are
> >>> >> > > > > > > > > > > > multiple
> >>> >> > > > > > > > > > > > larger queues -- but it was different from
> other
> >>> >> queues
> >>> >> > > in
> >>> >> > > > > > that its
> >>> >> > > > > > > > > > > > size
> >>> >> > > > > > > > > > > > was growing steadily. The queue would also be
> >>> >> > > > > > > > > > > > moving,
> >>> >> > but
> >>> >> > > > its
> >>> >> > > > > > > > > > > > processing
> >>> >> > > > > > > > > > > > rate was not keeping up with the enqueue rate.
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > Our theory that might be totally wrong: If a
> >>> >> > > > > > > > > > > > queue is
> >>> >> > > > moving
> >>> >> > > > > > the
> >>> >> > > > > > > > entire
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > time, maybe then the broker would keep reusing
> >>> >> > > > > > > > > > > > the
> >>> >> same
> >>> >> > > > > buffer
> >>> >> > > > > > in
> >>> >> > > > > > > > > > > > direct
> >>> >> > > > > > > > > > > > memory for the queue, and keep on adding onto
> it
> >>> >> > > > > > > > > > > > at
> >>> >> the
> >>> >> > > end
> >>> >> > > > > to
> >>> >> > > > > > > > > > > > accommodate
> >>> >> > > > > > > > > > > > new messages. But because it’s active all the
> >>> >> > > > > > > > > > > > time
> >>> >> and
> >>> >> > > > we’re
> >>> >> > > > > > pointing
> >>> >> > > > > > > > > > > > to
> >>> >> > > > > > > > > > > > the same buffer, space allocated for messages
> at
> >>> >> > > > > > > > > > > > the
> >>> >> > head
> >>> >> > > > of
> >>> >> > > > > > the
> >>> >> > > > > > > > > > > > queue/buffer doesn’t get reclaimed, even long
> >>> >> > > > > > > > > > > > after
> >>> >> > those
> >>> >> > > > > > messages
> >>> >> > > > > > > > have
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > been processed. Just a theory.
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > We are also trying to reproduce this using
> some
> >>> >> > > > > > > > > > > > perf
> >>> >> > > tests
> >>> >> > > > to
> >>> >> > > > > > enqueue
> >>> >> > > > > > > > > > > > with
> >>> >> > > > > > > > > > > > same pattern, will update with the findings.
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > Thanks
> >>> >> > > > > > > > > > > > Ramayan
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > On Wed, Apr 19, 2017 at 6:52 PM, Ramayan
> Tiwari
> >>> >> > > > > > > > > > > > <ra...@gmail.com>
> >>> >> > > > > > > > > > > > wrote:
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > Another issue that we noticed is when broker
> >>> >> > > > > > > > > > > > > goes
> >>> >> OOM
> >>> >> > > due
> >>> >> > > > > to
> >>> >> > > > > > direct
> >>> >> > > > > > > > > > > > > memory, it doesn't create heap dump
> (specified
> >>> >> > > > > > > > > > > > > by
> >>> >> > > "-XX:+
> >>> >> > > > > > > > > > > > > HeapDumpOnOutOfMemoryError"), even when the
> OOM
> >>> >> error
> >>> >> > > is
> >>> >> > > > > > same as
> >>> >> > > > > > > > what
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > is
> >>> >> > > > > > > > > > > > > mentioned in the oracle JVM docs
> >>> >> > > > > > ("java.lang.OutOfMemoryError").
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > Has anyone been able to find a way to get to
> >>> >> > > > > > > > > > > > > heap
> >>> >> > dump
> >>> >> > > > for
> >>> >> > > > > > DM OOM?
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > - Ramayan
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > On Wed, Apr 19, 2017 at 11:21 AM, Ramayan
> >>> >> > > > > > > > > > > > > Tiwari
> >>> >> > > > > > > > > > > > > <ramayan.tiwari@gmail.com
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > wrote:
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > Alex,
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > Below are the flow to disk logs from
> broker
> >>> >> having
> >>> >> > > > > > 3million+
> >>> >> > > > > > > > messages
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > this time. We only have one virtual host.
> >>> >> > > > > > > > > > > > > > Time is
> >>> >> > in
> >>> >> > > > GMT.
> >>> >> > > > > > Looks
> >>> >> > > > > > > > like
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > flow
> >>> >> > > > > > > > > > > > > > to disk is active on the whole virtual
> host
> >>> >> > > > > > > > > > > > > > and
> >>> >> > not a
> >>> >> > > > > > queue level.
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > When the same broker went OOM yesterday, I
> >>> >> > > > > > > > > > > > > > did
> >>> >> not
> >>> >> > > see
> >>> >> > > > > any
> >>> >> > > > > > flow to
> >>> >> > > > > > > > > > > > > > disk
> >>> >> > > > > > > > > > > > > > logs from when it was started until it
> >>> >> > > > > > > > > > > > > > crashed
> >>> >> > > (crashed
> >>> >> > > > > > twice
> >>> >> > > > > > > > within
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > 4hrs).
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > 4/19/17 4:17:43.509 AM INFO
> >>> >> [Housekeeping[test]] -
> >>> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >>> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
> >>> >> > > > > > > > > > > > > > Message
> >>> >> > > > memory
> >>> >> > > > > > use
> >>> >> > > > > > > > > > > > > > 3356539KB
> >>> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
> >>> >> > > > > > > > > > > > > > 4/19/17 2:31:13.502 AM INFO
> >>> >> [Housekeeping[test]] -
> >>> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >>> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> >>> >> Message
> >>> >> > > > memory
> >>> >> > > > > > use
> >>> >> > > > > > > > > > > > > > 3354866KB
> >>> >> > > > > > > > > > > > > > within threshold 3355443KB
> >>> >> > > > > > > > > > > > > > 4/19/17 2:28:43.511 AM INFO
> >>> >> [Housekeeping[test]] -
> >>> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >>> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
> >>> >> > > > > > > > > > > > > > Message
> >>> >> > > > memory
> >>> >> > > > > > use
> >>> >> > > > > > > > > > > > > > 3358509KB
> >>> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
> >>> >> > > > > > > > > > > > > > 4/19/17 2:20:13.500 AM INFO
> >>> >> [Housekeeping[test]] -
> >>> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >>> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> >>> >> Message
> >>> >> > > > memory
> >>> >> > > > > > use
> >>> >> > > > > > > > > > > > > > 3353501KB
> >>> >> > > > > > > > > > > > > > within threshold 3355443KB
> >>> >> > > > > > > > > > > > > > 4/19/17 2:18:13.500 AM INFO
> >>> >> [Housekeeping[test]] -
> >>> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >>> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
> >>> >> > > > > > > > > > > > > > Message
> >>> >> > > > memory
> >>> >> > > > > > use
> >>> >> > > > > > > > > > > > > > 3357544KB
> >>> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
> >>> >> > > > > > > > > > > > > > 4/19/17 2:08:43.501 AM INFO
> >>> >> [Housekeeping[test]] -
> >>> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >>> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> >>> >> Message
> >>> >> > > > memory
> >>> >> > > > > > use
> >>> >> > > > > > > > > > > > > > 3353236KB
> >>> >> > > > > > > > > > > > > > within threshold 3355443KB
> >>> >> > > > > > > > > > > > > > 4/19/17 2:08:13.501 AM INFO
> >>> >> [Housekeeping[test]] -
> >>> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >>> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
> >>> >> > > > > > > > > > > > > > Message
> >>> >> > > > memory
> >>> >> > > > > > use
> >>> >> > > > > > > > > > > > > > 3356704KB
> >>> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
> >>> >> > > > > > > > > > > > > > 4/19/17 2:00:43.500 AM INFO
> >>> >> [Housekeeping[test]] -
> >>> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >>> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> >>> >> Message
> >>> >> > > > memory
> >>> >> > > > > > use
> >>> >> > > > > > > > > > > > > > 3353511KB
> >>> >> > > > > > > > > > > > > > within threshold 3355443KB
> >>> >> > > > > > > > > > > > > > 4/19/17 2:00:13.504 AM INFO
> >>> >> [Housekeeping[test]] -
> >>> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >>> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
> >>> >> > > > > > > > > > > > > > Message
> >>> >> > > > memory
> >>> >> > > > > > use
> >>> >> > > > > > > > > > > > > > 3357948KB
> >>> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
> >>> >> > > > > > > > > > > > > > 4/19/17 1:50:43.501 AM INFO
> >>> >> [Housekeeping[test]] -
> >>> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >>> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> >>> >> Message
> >>> >> > > > memory
> >>> >> > > > > > use
> >>> >> > > > > > > > > > > > > > 3355310KB
> >>> >> > > > > > > > > > > > > > within threshold 3355443KB
> >>> >> > > > > > > > > > > > > > 4/19/17 1:47:43.501 AM INFO
> >>> >> [Housekeeping[test]] -
> >>> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >>> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
> >>> >> > > > > > > > > > > > > > Message
> >>> >> > > > memory
> >>> >> > > > > > use
> >>> >> > > > > > > > > > > > > > 3365624KB
> >>> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
> >>> >> > > > > > > > > > > > > > 4/19/17 1:43:43.501 AM INFO
> >>> >> [Housekeeping[test]] -
> >>> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >>> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> >>> >> Message
> >>> >> > > > memory
> >>> >> > > > > > use
> >>> >> > > > > > > > > > > > > > 3355136KB
> >>> >> > > > > > > > > > > > > > within threshold 3355443KB
> >>> >> > > > > > > > > > > > > > 4/19/17 1:31:43.509 AM INFO
> >>> >> [Housekeeping[test]] -
> >>> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >>> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
> >>> >> > > > > > > > > > > > > > Message
> >>> >> > > > memory
> >>> >> > > > > > use
> >>> >> > > > > > > > > > > > > > 3358683KB
> >>> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > After production release (2days back), we
> >>> >> > > > > > > > > > > > > > have
> >>> >> > seen 4
> >>> >> > > > > > crashes in 3
> >>> >> > > > > > > > > > > > > > different brokers, this is the most
> pressing
> >>> >> > concern
> >>> >> > > > for
> >>> >> > > > > > us in
> >>> >> > > > > > > > > > > > > > decision if
> >>> >> > > > > > > > > > > > > > we should roll back to 0.32. Any help is
> >>> >> > > > > > > > > > > > > > greatly
> >>> >> > > > > > appreciated.
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > Thanks
> >>> >> > > > > > > > > > > > > > Ramayan
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 9:36 AM, Oleksandr
> >>> >> > > > > > > > > > > > > > Rudyy
> >>> >> <
> >>> >> > > > > > orudyy@gmail.com
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > wrote:
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > Ramayan,
> >>> >> > > > > > > > > > > > > > > Thanks for the details. I would like to
> >>> >> > > > > > > > > > > > > > > clarify
> >>> >> > > > whether
> >>> >> > > > > > flow to
> >>> >> > > > > > > > disk
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > was
> >>> >> > > > > > > > > > > > > > > triggered today for 3 million messages?
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > The following logs are issued for flow
> to
> >>> >> > > > > > > > > > > > > > > disk:
> >>> >> > > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
> >>> >> Message
> >>> >> > > > > memory
> >>> >> > > > > > use
> >>> >> > > > > > > > > > > > > > > {0,number,#}KB
> >>> >> > > > > > > > > > > > > > > exceeds threshold {1,number,#.##}KB
> >>> >> > > > > > > > > > > > > > > BRK-1015 : Message flow to disk
> inactive :
> >>> >> > Message
> >>> >> > > > > > memory use
> >>> >> > > > > > > > > > > > > > > {0,number,#}KB within threshold
> >>> >> {1,number,#.##}KB
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > Kind Regards,
> >>> >> > > > > > > > > > > > > > > Alex
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > On 19 April 2017 at 17:10, Ramayan
> Tiwari <
> >>> >> > > > > > > > ramayan.tiwari@gmail.com>
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > wrote:
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > Hi Alex,
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > Thanks for your response, here are the
> >>> >> details:
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > We use "direct" exchange, without
> >>> >> > > > > > > > > > > > > > > > persistence
> >>> >> > (we
> >>> >> > > > > > specify
> >>> >> > > > > > > > > > > > > > > NON_PERSISTENT
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > that while sending from client) and
> use
> >>> >> > > > > > > > > > > > > > > > BDB
> >>> >> > > store.
> >>> >> > > > We
> >>> >> > > > > > use JSON
> >>> >> > > > > > > > > > > > > > > > virtual
> >>> >> > > > > > > > > > > > > > > host
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > type. We are not using SSL.
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > When the broker went OOM, we had
> around
> >>> >> > > > > > > > > > > > > > > > 1.3
> >>> >> > > million
> >>> >> > > > > > messages
> >>> >> > > > > > > > with
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > 100
> >>> >> > > > > > > > > > > > > > > bytes
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > average message size. Direct memory
> >>> >> allocation
> >>> >> > > > (value
> >>> >> > > > > > read from
> >>> >> > > > > > > > > > > > > > > > MBean)
> >>> >> > > > > > > > > > > > > > > kept
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > going up, even though it wouldn't need
> >>> >> > > > > > > > > > > > > > > > more
> >>> >> DM
> >>> >> > to
> >>> >> > > > > > store these
> >>> >> > > > > > > > many
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > messages. DM allocated persisted at
> 99%
> >>> >> > > > > > > > > > > > > > > > for
> >>> >> > > about 3
> >>> >> > > > > > and half
> >>> >> > > > > > > > hours
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > before
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > crashing.
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > Today, on the same broker we have 3
> >>> >> > > > > > > > > > > > > > > > million
> >>> >> > > > messages
> >>> >> > > > > > (same
> >>> >> > > > > > > > message
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > size)
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > and DM allocated is only at 8%. This
> >>> >> > > > > > > > > > > > > > > > seems
> >>> >> like
> >>> >> > > > there
> >>> >> > > > > > is some
> >>> >> > > > > > > > > > > > > > > > issue
> >>> >> > > > > > > > > > > > > > > with
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > de-allocation or a leak.
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > I have uploaded the memory utilization
> >>> >> > > > > > > > > > > > > > > > graph
> >>> >> > > here:
> >>> >> > > > > > > > > > > > > > > > https://drive.google.com/file/d/
> >>> >> > > > > > 0Bwi0MEV3srPRVHFEbDlIYUpLaUE/
> >>> >> > > > > > > > > > > > > > > > view?usp=sharing
> >>> >> > > > > > > > > > > > > > > > Blue line is DM allocated, Yellow is
> DM
> >>> >> > > > > > > > > > > > > > > > Used
> >>> >> > (sum
> >>> >> > > > of
> >>> >> > > > > > queue
> >>> >> > > > > > > > > > > > > > > > payload)
> >>> >> > > > > > > > > > > > > > > and Red
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > is heap usage.
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > Thanks
> >>> >> > > > > > > > > > > > > > > > Ramayan
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 4:10 AM,
> >>> >> > > > > > > > > > > > > > > > Oleksandr
> >>> >> > Rudyy
> >>> >> > > > > > > > > > > > > > > > <or...@gmail.com>
> >>> >> > > > > > > > > > > > > > > wrote:
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > Hi Ramayan,
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > Could please share with us the
> details
> >>> >> > > > > > > > > > > > > > > > > of
> >>> >> > > > messaging
> >>> >> > > > > > use
> >>> >> > > > > > > > case(s)
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > which
> >>> >> > > > > > > > > > > > > > > > ended
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > up in OOM on broker side?
> >>> >> > > > > > > > > > > > > > > > > I would like to reproduce the issue
> on
> >>> >> > > > > > > > > > > > > > > > > my
> >>> >> > local
> >>> >> > > > > > broker in
> >>> >> > > > > > > > order
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > to
> >>> >> > > > > > > > > > > > > > > fix
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > it.
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > I would appreciate if you could
> provide
> >>> >> > > > > > > > > > > > > > > > > as
> >>> >> > much
> >>> >> > > > > > details as
> >>> >> > > > > > > > > > > > > > > > > possible,
> >>> >> > > > > > > > > > > > > > > > > including, messaging topology,
> message
> >>> >> > > > persistence
> >>> >> > > > > > type,
> >>> >> > > > > > > > message
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > sizes,volumes, etc.
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > Qpid Broker 6.0.x uses direct memory
> >>> >> > > > > > > > > > > > > > > > > for
> >>> >> > > keeping
> >>> >> > > > > > message
> >>> >> > > > > > > > content
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > and
> >>> >> > > > > > > > > > > > > > > > > receiving/sending data. Each plain
> >>> >> connection
> >>> >> > > > > > utilizes 512K of
> >>> >> > > > > > > > > > > > > > > > > direct
> >>> >> > > > > > > > > > > > > > > > > memory. Each SSL connection uses 1M
> of
> >>> >> direct
> >>> >> > > > > > memory. Your
> >>> >> > > > > > > > > > > > > > > > > memory
> >>> >> > > > > > > > > > > > > > > > settings
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > look Ok to me.
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > Kind Regards,
> >>> >> > > > > > > > > > > > > > > > > Alex
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > On 18 April 2017 at 23:39, Ramayan
> >>> >> > > > > > > > > > > > > > > > > Tiwari
> >>> >> > > > > > > > > > > > > > > > > <ra...@gmail.com>
> >>> >> > > > > > > > > > > > > > > > > wrote:
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > Hi All,
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > We are using Java broker 6.0.5,
> with
> >>> >> patch
> >>> >> > to
> >>> >> > > > use
> >>> >> > > > > > > > > > > > > > > MultiQueueConsumer
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > feature. We just finished
> deploying
> >>> >> > > > > > > > > > > > > > > > > > to
> >>> >> > > > production
> >>> >> > > > > > and saw
> >>> >> > > > > > > > > > > > > > > > > > couple of
> >>> >> > > > > > > > > > > > > > > > > > instances of broker OOM due to
> >>> >> > > > > > > > > > > > > > > > > > running
> >>> >> out
> >>> >> > of
> >>> >> > > > > > DirectMemory
> >>> >> > > > > > > > > > > > > > > > > > buffer
> >>> >> > > > > > > > > > > > > > > > > > (exceptions at the end of this
> >>> >> > > > > > > > > > > > > > > > > > email).
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > Here is our setup:
> >>> >> > > > > > > > > > > > > > > > > > 1. Max heap 12g, max direct
> memory 4g
> >>> >> (this
> >>> >> > > is
> >>> >> > > > > > opposite of
> >>> >> > > > > > > > > > > > > > > > > > what the
> >>> >> > > > > > > > > > > > > > > > > > recommendation is, however, for
> our
> >>> >> > > > > > > > > > > > > > > > > > use
> >>> >> > cause
> >>> >> > > > > > message
> >>> >> > > > > > > > payload
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > is
> >>> >> > > > > > > > > > > > > > > really
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > small ~400bytes and is way less
> than
> >>> >> > > > > > > > > > > > > > > > > > the
> >>> >> > per
> >>> >> > > > > > message
> >>> >> > > > > > > > overhead
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > of
> >>> >> > > > > > > > > > > > > > > 1KB).
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > In
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > perf testing, we were able to put
> 2
> >>> >> million
> >>> >> > > > > > messages without
> >>> >> > > > > > > > > > > > > > > > > > any
> >>> >> > > > > > > > > > > > > > > > issues.
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > 2. ~400 connections to broker.
> >>> >> > > > > > > > > > > > > > > > > > 3. Each connection has 20 sessions
> >>> >> > > > > > > > > > > > > > > > > > and
> >>> >> > there
> >>> >> > > is
> >>> >> > > > > > one multi
> >>> >> > > > > > > > > > > > > > > > > > queue
> >>> >> > > > > > > > > > > > > > > > consumer
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > attached to each session,
> listening
> >>> >> > > > > > > > > > > > > > > > > > to
> >>> >> > around
> >>> >> > > > > 1000
> >>> >> > > > > > queues.
> >>> >> > > > > > > > > > > > > > > > > > 4. We are still using 0.16 client
> (I
> >>> >> know).
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > With the above setup, the baseline
> >>> >> > > utilization
> >>> >> > > > > > (without any
> >>> >> > > > > > > > > > > > > > > messages)
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > for
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > direct memory was around 230mb
> (with
> >>> >> > > > > > > > > > > > > > > > > > 410
> >>> >> > > > > > connection each
> >>> >> > > > > > > > > > > > > > > > > > taking
> >>> >> > > > > > > > > > > > > > > 500KB).
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > Based on our understanding of
> broker
> >>> >> memory
> >>> >> > > > > > allocation,
> >>> >> > > > > > > > > > > > > > > > > > message
> >>> >> > > > > > > > > > > > > > > payload
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > should be the only thing adding to
> >>> >> > > > > > > > > > > > > > > > > > direct
> >>> >> > > > memory
> >>> >> > > > > > utilization
> >>> >> > > > > > > > > > > > > > > > > > (on
> >>> >> > > > > > > > > > > > > > > top of
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > baseline), however, we are
> >>> >> > > > > > > > > > > > > > > > > > experiencing
> >>> >> > > > something
> >>> >> > > > > > completely
> >>> >> > > > > > > > > > > > > > > different.
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > In
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > our last broker crash, we see that
> >>> >> > > > > > > > > > > > > > > > > > broker
> >>> >> > is
> >>> >> > > > > > constantly
> >>> >> > > > > > > > > > > > > > > > > > running
> >>> >> > > > > > > > > > > > > > > with
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > 90%+
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > direct memory allocated, even when
> >>> >> message
> >>> >> > > > > payload
> >>> >> > > > > > sum from
> >>> >> > > > > > > > > > > > > > > > > > all the
> >>> >> > > > > > > > > > > > > > > > > queues
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > is only 6-8% (these % are against
> >>> >> available
> >>> >> > > DM
> >>> >> > > > of
> >>> >> > > > > > 4gb).
> >>> >> > > > > > > > During
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > these
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > high
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > DM usage period, heap usage was
> >>> >> > > > > > > > > > > > > > > > > > around
> >>> >> 60%
> >>> >> > > (of
> >>> >> > > > > > 12gb).
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > We would like some help in
> >>> >> > > > > > > > > > > > > > > > > > understanding
> >>> >> > what
> >>> >> > > > > > could be the
> >>> >> > > > > > > > > > > > > > > > > > reason
> >>> >> > > > > > > > > > > > > > > of
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > these
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > high DM allocations. Are there
> things
> >>> >> other
> >>> >> > > > than
> >>> >> > > > > > message
> >>> >> > > > > > > > > > > > > > > > > > payload
> >>> >> > > > > > > > > > > > > > > and
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > AMQP
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > connection, which use DM and
> could be
> >>> >> > > > > contributing
> >>> >> > > > > > to these
> >>> >> > > > > > > > > > > > > > > > > > high
> >>> >> > > > > > > > > > > > > > > usage?
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > Another thing where we are
> puzzled is
> >>> >> > > > > > > > > > > > > > > > > > the
> >>> >> > > > > > de-allocation of
> >>> >> > > > > > > > DM
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > byte
> >>> >> > > > > > > > > > > > > > > > > buffers.
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > From log mining of heap and DM
> >>> >> utilization,
> >>> >> > > > > > de-allocation of
> >>> >> > > > > > > > > > > > > > > > > > DM
> >>> >> > > > > > > > > > > > > > > doesn't
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > correlate with heap GC. If anyone
> has
> >>> >> seen
> >>> >> > > any
> >>> >> > > > > > documentation
> >>> >> > > > > > > > > > > > > > > related to
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > this, it would be very helpful if
> you
> >>> >> could
> >>> >> > > > share
> >>> >> > > > > > that.
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > Thanks
> >>> >> > > > > > > > > > > > > > > > > > Ramayan
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > *Exceptions*
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct
> >>> >> > > > > > > > > > > > > > > > > > buffer
> >>> >> > > > memory
> >>> >> > > > > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(
> >>> >> > > Bits.java:658)
> >>> >> > > > > > > > ~[na:1.8.0_40]
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<
> >>> >> > > > > > init>(DirectByteBuffer.java:
> >>> >> > > > > > > > 123)
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> >>> >> > > > > > > > > > > > > > > > > > at java.nio.ByteBuffer.
> >>> >> > > > > allocateDirect(ByteBuffer.
> >>> >> > > > > > java:311)
> >>> >> > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.bytebuffer.
> >>> >> > > > > > QpidByteBuffer.allocateDirect(
> >>> >> > > > > > > > > > > > > > > > > > QpidByteBuffer.java:474)
> >>> >> > > > > > > > > > > > > > > > > > ~[qpid-common-6.0.5.jar:6.0.5]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >>> >> > > > > > > > NonBlockingConnectionPlainD
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > elegate.
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > restoreApplicationBufferForWrite(
> >>> >> > > > > > > > NonBlockingConnectionPlainDele
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > gate.java:93)
> >>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6
> .0.5]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >>> >> > > > > > > > NonBlockingConnectionPlainDele
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > gate.processData(
> >>> >> > > > NonBlockingConnectionPlainDele
> >>> >> > > > > > > > gate.java:60)
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6
> .0.5]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >>> >> > > > > > > > NonBlockingConnection.doRead(
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > NonBlockingConnection.java:506)
> >>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6
> .0.5]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >>> >> > > > > > > > NonBlockingConnection.doWork(
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > NonBlockingConnection.java:285)
> >>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6
> .0.5]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >>> >> > > > > > > > NetworkConnectionScheduler.
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > processConnection(
> >>> >> > > NetworkConnectionScheduler.
> >>> >> > > > > > java:124)
> >>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6
> .0.5]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> >>> >> > > > transport.SelectorThread$
> >>> >> > > > > > > > ConnectionPr
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > ocessor.
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > processConnection(
> >>> >> SelectorThread.java:504)
> >>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6
> .0.5]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> >>> >> > > > transport.SelectorThread$
> >>> >> > > > > > > > > > > > > > > > > > SelectionTask.performSelect(
> >>> >> > > > > > SelectorThread.java:337)
> >>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6
> .0.5]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> >>> >> > > > transport.SelectorThread$
> >>> >> > > > > > > > SelectionTask.run(
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > SelectorThread.java:87)
> >>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6
> .0.5]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> >>> >> > > > > > transport.SelectorThread.run(
> >>> >> > > > > > > > > > > > > > > > > > SelectorThread.java:462)
> >>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6
> .0.5]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > java.util.concurrent.
> >>> >> > > > > ThreadPoolExecutor.runWorker(
> >>> >> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:1142)
> >>> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > java.util.concurrent.
> >>> >> > > > > > ThreadPoolExecutor$Worker.run(
> >>> >> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:617)
> >>> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > java.lang.Thread.run(Thread.ja
> va:745)
> >>> >> > > > > > ~[na:1.8.0_40]
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > *Second exception*
> >>> >> > > > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct
> >>> >> > > > > > > > > > > > > > > > > > buffer
> >>> >> > > > memory
> >>> >> > > > > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(
> >>> >> > > Bits.java:658)
> >>> >> > > > > > > > ~[na:1.8.0_40]
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<
> >>> >> > > > > > init>(DirectByteBuffer.java:
> >>> >> > > > > > > > 123)
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> >>> >> > > > > > > > > > > > > > > > > > at java.nio.ByteBuffer.
> >>> >> > > > > allocateDirect(ByteBuffer.
> >>> >> > > > > > java:311)
> >>> >> > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.bytebuffer.
> >>> >> > > > > > QpidByteBuffer.allocateDirect(
> >>> >> > > > > > > > > > > > > > > > > > QpidByteBuffer.java:474)
> >>> >> > > > > > > > > > > > > > > > > > ~[qpid-common-6.0.5.jar:6.0.5]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >>> >> > > > > > > > NonBlockingConnectionPlainDele
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > gate.<init>(
> >>> >> NonBlockingConnectionPlainDele
> >>> >> > > > > > gate.java:45)
> >>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6
> .0.5]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >>> >> > > > > > NonBlockingConnection.
> >>> >> > > > > > > > > > > > > > > > > > setTransportEncryption(
> >>> >> > > > > NonBlockingConnection.java:
> >>> >> > > > > > 625)
> >>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6
> .0.5]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >>> >> > > > > > > > NonBlockingConnection.<init>(
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > NonBlockingConnection.java:117)
> >>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6
> .0.5]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >>> >> > > > > > > > NonBlockingNetworkTransport.
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > acceptSocketChannel(
> >>> >> > > > NonBlockingNetworkTransport.
> >>> >> > > > > > java:158)
> >>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6
> .0.5]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> >>> >> > > > transport.SelectorThread$
> >>> >> > > > > > > > SelectionTas
> >>> >> > > > > > > > >
> >>> >> > > > > > > > > >
> >>> >> > > > > > > > > > >
> >>> >> > > > > > > > > > > >
> >>> >> > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > k$1.run(
> >>> >> > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > > > > > > SelectorThread.java:191)
> >>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6
> .0.5]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> >>> >> > > > > > transport.SelectorThread.run(
> >>> >> > > > > > > > > > > > > > > > > > SelectorThread.java:462)
> >>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6
> .0.5]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > java.util.concurrent.
> >>> >> > > > > ThreadPoolExecutor.runWorker(
> >>> >> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:1142)
> >>> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > java.util.concurrent.
> >>> >> > > > > > ThreadPoolExecutor$Worker.run(
> >>> >> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:617)
> >>> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> >>> >> > > > > > > > > > > > > > > > > > at
> >>> >> > > > > > > > > > > > > > > > > > java.lang.Thread.run(Thread.ja
> va:745)
> >>> >> > > > > > ~[na:1.8.0_40]
> >>> >> > > > > > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > > > >
> >>> >> > > > > > > > > > > ------------------------------
> >>> >> > > ------------------------------
> >>> >> > > > > > ---------
> >>> >> > > > > > > > > > > To unsubscribe, e-mail:
> >>> >> > > > > > > > > > > users-unsubscribe@qpid.apache.
> >>> >> > org
> >>> >> > > > > > > > > > > For additional commands, e-mail:
> >>> >> > > users-help@qpid.apache.org
> >>> >> > > > > > > > > > >
> >>> >> > > > > >
> >>> >> > > > > >
> >>> >> > > > > > ------------------------------
> ------------------------------
> >>> >> > > ---------
> >>> >> > > > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> >>> >> > > > > > For additional commands, e-mail:
> users-help@qpid.apache.org
> >>> >> > > > > >
> >>> >> > > > >
> >>> >> > > >
> >>> >> > >
> >>> >> >
> >>> >>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> >>> For additional commands, e-mail: users-help@qpid.apache.org
> >>>
> >>
>

Re: Java broker OOM due to DirectMemory

Posted by Keith W <ke...@gmail.com>.
Hi Ramayan

Further changes have been made on the 6.0.x branch that prevent the
noisy flow to disk messages in the log.   The flow to disk on/off
messages we had before didn't really fit and this resulted in the logs
being spammed when either a queue was considered full or memory was
under pressure.  Now you'll see a VHT-1008 periodically written to the
logs when flow to disk has been working.  It reports the number of
bytes that have been evacuated from memory.

The actual mechanism used to prevent the OOM is the same as code you
tested on May 5th: the flow to disk mechanism is triggered when the
allocated direct memory size exceeds the flow to disk threshold.
This evacuates messages from memory, flowing them to disk if
necessary, until the allocated direct memory falls below the threshold
again.  In doing so, some buffers that were previously sparse will be
released.  Expect to see the direct memory graph level off, but it
won't necessarily fall. This is expected.   In other words, the direct
graph you shared on 5th does not surprise me.

We plan to put out a 6.0.7/6.1.3 RCs out in the next few days which
will include these changes.    The patch attached to QPID-7462 has
been updated it will apply.

In the longer term, for 7.0.0 we have already improved the Broker so
that it actively manage the sparsity of buffers without falling back
on flow to disk, Memory requirements for use cases such as yours
should be much more reasonable.

I know you currently have a dependency on the old JMX management
interface.   I'd suggest you look at eliminating the dependency soon,
so you are free to upgrade when the time comes.

Kind regards, Keith Wall.

On 24 May 2017 at 13:52, Keith W <ke...@gmail.com> wrote:
> Hi Ramayan
>
> Further changes have been made on the 6.0.x branch that prevent the
> noisy flow to disk messages in the log.   The flow to disk on/off
> messages we had before didn't really fit and this resulted in the logs
> being spammed when either a queue was considered full or memory was
> under pressure.  Now you'll see a VHT-1008 periodically written to the
> logs when flow to disk has been working.  It reports the number of
> bytes that have been evacuated from memory.
>
> The actual mechanism used to prevent the OOM is the same as code you
> tested on May 5th: the flow to disk mechanism is triggered when the
> allocated direct memory size exceeds the flow to disk threshold.
> This evacuates messages from memory, flowing them to disk if
> necessary, until the allocated direct memory falls below the threshold
> again.  In doing so, some buffers that were previously sparse will be
> released.  Expect to see the direct memory graph level off, but it
> won't necessarily fall. This is expected.   In other words, the direct
> graph you shared on 5th does not surprise me.
>
> We plan to put out a 6.0.7/6.1.3 RCs out in the next few days which
> will include these changes.    The patch attached to QPID-7462 has
> been updated it will apply.
>
> In the longer term, for 7.0.0 we have already improved the Broker so
> that it actively manage the sparsity of buffers without falling back
> on flow to disk, Memory requirements for use cases such as yours
> should be much more reasonable.
>
> I know you currently have a dependency on the old JMX management
> interface.   I'd suggest you look at eliminating the dependency soon,
> so you are free to upgrade when the time comes.
>
> Kind regards, Keith Wall.
>
>
> On 16 May 2017 at 19:58, Ramayan Tiwari <ra...@gmail.com> wrote:
>> Thanks Keith for the update.
>>
>> - Ramayan
>>
>> On Mon, May 15, 2017 at 2:35 AM, Keith W <ke...@gmail.com> wrote:
>>>
>>> Hi Ramayan
>>>
>>> We are still looking at our approach to the Broker's flow to disk
>>> feature in light of the defect you highlighted.  We have some work in
>>> flight this week investigating alternative approaches which I am
>>> hoping will conclude by the end of week.  I should be able to update
>>> you then.
>>>
>>> Thanks Keith
>>>
>>> On 12 May 2017 at 20:58, Ramayan Tiwari <ra...@gmail.com> wrote:
>>> > Hi Alex,
>>> >
>>> > Any update on the fix for this?
>>> > QPID-7753 is assigned a fix version for 7.0.0, I am hoping that the fix
>>> > will also be back ported to 6.0.x.
>>> >
>>> > Thanks
>>> > Ramayan
>>> >
>>> > On Mon, May 8, 2017 at 2:14 AM, Oleksandr Rudyy <or...@gmail.com>
>>> > wrote:
>>> >
>>> >> Hi Ramayan,
>>> >>
>>> >> Thanks for testing the patch and providing a feedback.
>>> >>
>>> >> Regarding direct memory utilization, the Qpid Broker caches up to 256MB
>>> >> of
>>> >> direct memory internally in QpidByteBuffers. Thus, when testing the
>>> >> Broker
>>> >> with only 256MB of direct memory, the entire direct memory could be
>>> >> cached
>>> >> and it would look as if direct memory is never released. Potentially,
>>> >> you
>>> >> can reduce the number of buffers cached on broker by changing context
>>> >> variable 'broker.directByteBufferPoolSize'. By default, it is set to
>>> >> 1000.
>>> >> With buffer size of 256K, it would give ~256M of cache.
>>> >>
>>> >> Regarding introducing lower and upper thresholds for 'flow to disk'. It
>>> >> seems like a good idea and we will try to implement it early this week
>>> >> on
>>> >> trunk first.
>>> >>
>>> >> Kind Regards,
>>> >> Alex
>>> >>
>>> >>
>>> >> On 5 May 2017 at 23:49, Ramayan Tiwari <ra...@gmail.com>
>>> >> wrote:
>>> >>
>>> >> > Hi Alex,
>>> >> >
>>> >> > Thanks for providing the patch. I verified the fix with same perf
>>> >> > test,
>>> >> and
>>> >> > it does prevent broker from going OOM, however. DM utilization
>>> >> > doesn't
>>> >> get
>>> >> > any better after hitting the threshold (where flow to disk is
>>> >> > activated
>>> >> > based on total used % across broker - graph in the link below).
>>> >> >
>>> >> > After hitting the final threshold, flow to disk activates and
>>> >> > deactivates
>>> >> > pretty frequently across all the queues. The reason seems to be
>>> >> > because
>>> >> > there is only one threshold currently to trigger flow to disk. Would
>>> >> > it
>>> >> > make sense to break this down to high and low threshold - so that
>>> >> > once
>>> >> flow
>>> >> > to disk is active after hitting high threshold, it will be active
>>> >> > until
>>> >> the
>>> >> > queue utilization (or broker DM allocation) reaches the low
>>> >> > threshold.
>>> >> >
>>> >> > Graph and flow to disk logs are here:
>>> >> > https://docs.google.com/document/d/1Wc1e-id-
>>> >> WlpI7FGU1Lx8XcKaV8sauRp82T5XZV
>>> >> > U-RiM/edit#heading=h.6400pltvjhy7
>>> >> >
>>> >> > Thanks
>>> >> > Ramayan
>>> >> >
>>> >> > On Thu, May 4, 2017 at 2:44 AM, Oleksandr Rudyy <or...@gmail.com>
>>> >> wrote:
>>> >> >
>>> >> > > Hi Ramayan,
>>> >> > >
>>> >> > > We attached to the QPID-7753 a patch with a work around for 6.0.x
>>> >> branch.
>>> >> > > It triggers flow to disk based on direct memory consumption rather
>>> >> > > than
>>> >> > > estimation of the space occupied by the message content. The flow
>>> >> > > to
>>> >> disk
>>> >> > > should evacuate message content preventing running out of direct
>>> >> memory.
>>> >> > We
>>> >> > > already committed the changes into 6.0.x and 6.1.x branches. It
>>> >> > > will be
>>> >> > > included into upcoming 6.0.7 and 6.1.3 releases.
>>> >> > >
>>> >> > > Please try and test the patch in your environment.
>>> >> > >
>>> >> > > We are still working at finishing of the fix for trunk.
>>> >> > >
>>> >> > > Kind Regards,
>>> >> > > Alex
>>> >> > >
>>> >> > > On 30 April 2017 at 15:45, Lorenz Quack <qu...@gmail.com>
>>> >> wrote:
>>> >> > >
>>> >> > > > Hi Ramayan,
>>> >> > > >
>>> >> > > > The high-level plan is currently as follows:
>>> >> > > >  1) Periodically try to compact sparse direct memory buffers.
>>> >> > > >  2) Increase accuracy of messages' direct memory usage estimation
>>> >> > > > to
>>> >> > more
>>> >> > > > reliably trigger flow to disk.
>>> >> > > >  3) Add an additional flow to disk trigger based on the amount of
>>> >> > > allocated
>>> >> > > > direct memory.
>>> >> > > >
>>> >> > > > A little bit more details:
>>> >> > > >  1) We plan on periodically checking the amount of direct memory
>>> >> usage
>>> >> > > and
>>> >> > > > if it is above a
>>> >> > > >     threshold (50%) we compare the sum of all queue sizes with
>>> >> > > > the
>>> >> > amount
>>> >> > > > of allocated direct memory.
>>> >> > > >     If the ratio falls below a certain threshold we trigger a
>>> >> > compaction
>>> >> > > > task which goes through all queues
>>> >> > > >     and copy's a certain amount of old message buffers into new
>>> >> > > > ones
>>> >> > > > thereby freeing the old buffers so
>>> >> > > >     that they can be returned to the buffer pool and be reused.
>>> >> > > >
>>> >> > > >  2) Currently we trigger flow to disk based on an estimate of how
>>> >> much
>>> >> > > > memory the messages on the
>>> >> > > >     queues consume. We had to use estimates because we did not
>>> >> > > > have
>>> >> > > > accurate size numbers for
>>> >> > > >     message headers. By having accurate size information for
>>> >> > > > message
>>> >> > > > headers we can more reliably
>>> >> > > >     enforce queue memory limits.
>>> >> > > >
>>> >> > > >  3) The flow to disk trigger based on message size had another
>>> >> problem
>>> >> > > > which is more pertinent to the
>>> >> > > >     current issue. We only considered the size of the messages
>>> >> > > > and
>>> >> not
>>> >> > > how
>>> >> > > > much memory we allocate
>>> >> > > >     to store those messages. In the FIFO use case those numbers
>>> >> > > > will
>>> >> be
>>> >> > > > very close to each other but in
>>> >> > > >     use cases like yours we can end up with sparse buffers and
>>> >> > > > the
>>> >> > > numbers
>>> >> > > > will diverge. Because of this
>>> >> > > >     divergence we do not trigger flow to disk in time and the
>>> >> > > > broker
>>> >> > can
>>> >> > > go
>>> >> > > > OOM.
>>> >> > > >     To fix the issue we want to add an additional flow to disk
>>> >> trigger
>>> >> > > > based on the amount of allocated direct
>>> >> > > >     memory. This should prevent the broker from going OOM even if
>>> >> > > > the
>>> >> > > > compaction strategy outlined above
>>> >> > > >     should fail for some reason (e.g., the compaction task cannot
>>> >> keep
>>> >> > up
>>> >> > > > with the arrival of new messages).
>>> >> > > >
>>> >> > > > Currently, there are patches for the above points but they suffer
>>> >> from
>>> >> > > some
>>> >> > > > thread-safety issues that need to be addressed.
>>> >> > > >
>>> >> > > > I hope this description helps. Any feedback is, as always,
>>> >> > > > welcome.
>>> >> > > >
>>> >> > > > Kind regards,
>>> >> > > > Lorenz
>>> >> > > >
>>> >> > > >
>>> >> > > >
>>> >> > > > On Sat, Apr 29, 2017 at 12:00 AM, Ramayan Tiwari <
>>> >> > > ramayan.tiwari@gmail.com
>>> >> > > > >
>>> >> > > > wrote:
>>> >> > > >
>>> >> > > > > Hi Lorenz,
>>> >> > > > >
>>> >> > > > > Thanks so much for the patch. We have a perf test now to
>>> >> > > > > reproduce
>>> >> > this
>>> >> > > > > issue, so we did test with 256KB, 64KB and 4KB network byte
>>> >> > > > > buffer.
>>> >> > > None
>>> >> > > > of
>>> >> > > > > these configurations help with the issue (or give any more
>>> >> breathing
>>> >> > > > room)
>>> >> > > > > for our use case. We would like to share the perf analysis with
>>> >> > > > > the
>>> >> > > > > community:
>>> >> > > > >
>>> >> > > > > https://docs.google.com/document/d/1Wc1e-id-
>>> >> > > > WlpI7FGU1Lx8XcKaV8sauRp82T5XZV
>>> >> > > > > U-RiM/edit?usp=sharing
>>> >> > > > >
>>> >> > > > > Feel free to comment on the doc if certain details are
>>> >> > > > > incorrect or
>>> >> > if
>>> >> > > > > there are questions.
>>> >> > > > >
>>> >> > > > > Since the short term solution doesn't help us, we are very
>>> >> interested
>>> >> > > in
>>> >> > > > > getting some details on how the community plans to address
>>> >> > > > > this, a
>>> >> > high
>>> >> > > > > level description of the approach will be very helpful for us
>>> >> > > > > in
>>> >> > order
>>> >> > > to
>>> >> > > > > brainstorm our use cases along with this solution.
>>> >> > > > >
>>> >> > > > > - Ramayan
>>> >> > > > >
>>> >> > > > > On Fri, Apr 28, 2017 at 9:34 AM, Lorenz Quack <
>>> >> > quack.lorenz@gmail.com>
>>> >> > > > > wrote:
>>> >> > > > >
>>> >> > > > > > Hello Ramayan,
>>> >> > > > > >
>>> >> > > > > > We are still working on a fix for this issue.
>>> >> > > > > > In the mean time we had an idea to potentially workaround the
>>> >> issue
>>> >> > > > until
>>> >> > > > > > a proper fix is released.
>>> >> > > > > >
>>> >> > > > > > The idea is to decrease the qpid network buffer size the
>>> >> > > > > > broker
>>> >> > uses.
>>> >> > > > > > While this still allows for sparsely populated buffers it
>>> >> > > > > > would
>>> >> > > improve
>>> >> > > > > > the overall occupancy ratio.
>>> >> > > > > >
>>> >> > > > > > Here are the steps to follow:
>>> >> > > > > >  * ensure you are not using TLS
>>> >> > > > > >  * apply the attached patch
>>> >> > > > > >  * figure out the size of the largest messages you are
>>> >> > > > > > sending
>>> >> > > > (including
>>> >> > > > > > header and some overhead)
>>> >> > > > > >  * set the context variable "qpid.broker.networkBufferSize"
>>> >> > > > > > to
>>> >> > that
>>> >> > > > > value
>>> >> > > > > > but not smaller than 4096
>>> >> > > > > >  * test
>>> >> > > > > >
>>> >> > > > > > Decreasing the qpid network buffer size automatically limits
>>> >> > > > > > the
>>> >> > > > maximum
>>> >> > > > > > AMQP frame size.
>>> >> > > > > > Since you are using a very old client we are not sure how
>>> >> > > > > > well it
>>> >> > > copes
>>> >> > > > > > with small frame sizes where it has to split a message across
>>> >> > > multiple
>>> >> > > > > > frames.
>>> >> > > > > > Therefore, to play it safe you should not set it smaller than
>>> >> > > > > > the
>>> >> > > > largest
>>> >> > > > > > messages (+ header + overhead) you are sending.
>>> >> > > > > > I do not know what message sizes you are sending but AMQP
>>> >> > > > > > imposes
>>> >> > the
>>> >> > > > > > restriction that the framesize cannot be smaller than 4096
>>> >> > > > > > bytes.
>>> >> > > > > > In the qpid broker the default currently is 256 kB.
>>> >> > > > > >
>>> >> > > > > > In the current state the broker does not allow setting the
>>> >> network
>>> >> > > > buffer
>>> >> > > > > > to values smaller than 64 kB to allow TLS frames to fit into
>>> >> > > > > > one
>>> >> > > > network
>>> >> > > > > > buffer.
>>> >> > > > > > I attached a patch to this mail that lowers that restriction
>>> >> > > > > > to
>>> >> the
>>> >> > > > limit
>>> >> > > > > > imposed by AMQP (4096 Bytes).
>>> >> > > > > > Obviously, you should not use this when using TLS.
>>> >> > > > > >
>>> >> > > > > >
>>> >> > > > > > I hope this reduces the problems you are currently facing
>>> >> > > > > > until
>>> >> we
>>> >> > > can
>>> >> > > > > > complete the proper fix.
>>> >> > > > > >
>>> >> > > > > > Kind regards,
>>> >> > > > > > Lorenz
>>> >> > > > > >
>>> >> > > > > >
>>> >> > > > > > On Fri, 2017-04-21 at 09:17 -0700, Ramayan Tiwari wrote:
>>> >> > > > > > > Thanks so much Keith and the team for finding the root
>>> >> > > > > > > cause.
>>> >> We
>>> >> > > are
>>> >> > > > so
>>> >> > > > > > > relieved that we fix the root cause shortly.
>>> >> > > > > > >
>>> >> > > > > > > Couple of things that I forgot to mention on the mitigation
>>> >> steps
>>> >> > > we
>>> >> > > > > took
>>> >> > > > > > > in the last incident:
>>> >> > > > > > > 1) We triggered GC from JMX bean multiple times, it did not
>>> >> help
>>> >> > in
>>> >> > > > > > > reducing DM allocated.
>>> >> > > > > > > 2) We also killed all the AMQP connections to the broker
>>> >> > > > > > > when
>>> >> DM
>>> >> > > was
>>> >> > > > at
>>> >> > > > > > > 80%. This did not help either. The way we killed
>>> >> > > > > > > connections -
>>> >> > > using
>>> >> > > > > JMX
>>> >> > > > > > > got list of all the open AMQP connections and called close
>>> >> > > > > > > from
>>> >> > JMX
>>> >> > > > > > mbean.
>>> >> > > > > > >
>>> >> > > > > > > I am hoping the above two are not related to root cause,
>>> >> > > > > > > but
>>> >> > wanted
>>> >> > > > to
>>> >> > > > > > > bring it up in case this is relevant.
>>> >> > > > > > >
>>> >> > > > > > > Thanks
>>> >> > > > > > > Ramayan
>>> >> > > > > > >
>>> >> > > > > > > On Fri, Apr 21, 2017 at 8:29 AM, Keith W
>>> >> > > > > > > <keith.wall@gmail.com
>>> >> >
>>> >> > > > wrote:
>>> >> > > > > > >
>>> >> > > > > > > >
>>> >> > > > > > > > Hello Ramayan
>>> >> > > > > > > >
>>> >> > > > > > > > I believe I understand the root cause of the problem.  We
>>> >> have
>>> >> > > > > > > > identified a flaw in the direct memory buffer management
>>> >> > employed
>>> >> > > > by
>>> >> > > > > > > > Qpid Broker J which for some messaging use-cases can lead
>>> >> > > > > > > > to
>>> >> > the
>>> >> > > > OOM
>>> >> > > > > > > > direct you describe.   For the issue to manifest the
>>> >> producing
>>> >> > > > > > > > application needs to use a single connection for the
>>> >> production
>>> >> > > of
>>> >> > > > > > > > messages some of which are short-lived (i.e. are consumed
>>> >> > > quickly)
>>> >> > > > > > > > whilst others remain on the queue for some time.
>>> >> > > > > > > > Priority
>>> >> > > queues,
>>> >> > > > > > > > sorted queues and consumers utilising selectors that
>>> >> > > > > > > > result
>>> >> in
>>> >> > > some
>>> >> > > > > > > > messages being left of the queue could all produce this
>>> >> patten.
>>> >> > > > The
>>> >> > > > > > > > pattern leads to a sparsely occupied 256K net buffers
>>> >> > > > > > > > which
>>> >> > > cannot
>>> >> > > > be
>>> >> > > > > > > > released or reused until every message that reference a
>>> >> 'chunk'
>>> >> > > of
>>> >> > > > it
>>> >> > > > > > > > is either consumed or flown to disk.   The problem was
>>> >> > introduced
>>> >> > > > > with
>>> >> > > > > > > > Qpid v6.0 and exists in v6.1 and trunk too.
>>> >> > > > > > > >
>>> >> > > > > > > > The flow to disk feature is not helping us here because
>>> >> > > > > > > > its
>>> >> > > > algorithm
>>> >> > > > > > > > considers only the size of live messages on the queues.
>>> >> > > > > > > > If
>>> >> the
>>> >> > > > > > > > accumulative live size does not exceed the threshold, the
>>> >> > > messages
>>> >> > > > > > > > aren't flown to disk. I speculate that when you observed
>>> >> > > > > > > > that
>>> >> > > > moving
>>> >> > > > > > > > messages cause direct message usage to drop earlier
>>> >> > > > > > > > today,
>>> >> your
>>> >> > > > > > > > message movement cause a queue to go over threshold,
>>> >> > > > > > > > cause
>>> >> > > message
>>> >> > > > to
>>> >> > > > > > > > be flown to disk and their direct memory references
>>> >> > > > > > > > released.
>>> >> > > The
>>> >> > > > > > > > logs will confirm this is so.
>>> >> > > > > > > >
>>> >> > > > > > > > I have not identified an easy workaround at the moment.
>>> >> > > >  Decreasing
>>> >> > > > > > > > the flow to disk threshold and/or increasing available
>>> >> > > > > > > > direct
>>> >> > > > memory
>>> >> > > > > > > > should alleviate and may be an acceptable short term
>>> >> > workaround.
>>> >> > > > If
>>> >> > > > > > > > it were possible for publishing application to publish
>>> >> > > > > > > > short
>>> >> > > lived
>>> >> > > > > and
>>> >> > > > > > > > long lived messages on two separate JMS connections this
>>> >> would
>>> >> > > > avoid
>>> >> > > > > > > > this defect.
>>> >> > > > > > > >
>>> >> > > > > > > > QPID-7753 tracks this issue and QPID-7754 is a related
>>> >> > > > > > > > this
>>> >> > > > problem.
>>> >> > > > > > > > We intend to be working on these early next week and will
>>> >> > > > > > > > be
>>> >> > > aiming
>>> >> > > > > > > > for a fix that is back-portable to 6.0.
>>> >> > > > > > > >
>>> >> > > > > > > > Apologies that you have run into this defect and thanks
>>> >> > > > > > > > for
>>> >> > > > > reporting.
>>> >> > > > > > > >
>>> >> > > > > > > > Thanks, Keith
>>> >> > > > > > > >
>>> >> > > > > > > >
>>> >> > > > > > > >
>>> >> > > > > > > >
>>> >> > > > > > > >
>>> >> > > > > > > >
>>> >> > > > > > > >
>>> >> > > > > > > > On 21 April 2017 at 10:21, Ramayan Tiwari <
>>> >> > > > ramayan.tiwari@gmail.com>
>>> >> > > > > > > > wrote:
>>> >> > > > > > > > >
>>> >> > > > > > > > > Hi All,
>>> >> > > > > > > > >
>>> >> > > > > > > > > We have been monitoring the brokers everyday and today
>>> >> > > > > > > > > we
>>> >> > found
>>> >> > > > one
>>> >> > > > > > > > instance
>>> >> > > > > > > > >
>>> >> > > > > > > > > where broker’s DM was constantly going up and was about
>>> >> > > > > > > > > to
>>> >> > > crash,
>>> >> > > > > so
>>> >> > > > > > we
>>> >> > > > > > > > > experimented some mitigations, one of which caused the
>>> >> > > > > > > > > DM
>>> >> to
>>> >> > > come
>>> >> > > > > > down.
>>> >> > > > > > > > > Following are the details, which might help us
>>> >> understanding
>>> >> > > the
>>> >> > > > > > issue:
>>> >> > > > > > > > >
>>> >> > > > > > > > > Traffic scenario:
>>> >> > > > > > > > >
>>> >> > > > > > > > > DM allocation had been constantly going up and was at
>>> >> > > > > > > > > 90%.
>>> >> > > There
>>> >> > > > > > were two
>>> >> > > > > > > > > queues which seemed to align with the theories that we
>>> >> > > > > > > > > had.
>>> >> > > Q1’s
>>> >> > > > > > size had
>>> >> > > > > > > > > been large right after the broker start and had slow
>>> >> > > consumption
>>> >> > > > of
>>> >> > > > > > > > > messages, queue size only reduced from 76MB to 75MB
>>> >> > > > > > > > > over a
>>> >> > > period
>>> >> > > > > of
>>> >> > > > > > > > 6hrs.
>>> >> > > > > > > > >
>>> >> > > > > > > > > Q2 on the other hand, started small and was gradually
>>> >> > growing,
>>> >> > > > > queue
>>> >> > > > > > size
>>> >> > > > > > > > > went from 7MB to 10MB in 6hrs. There were other queues
>>> >> > > > > > > > > with
>>> >> > > > traffic
>>> >> > > > > > > > during
>>> >> > > > > > > > >
>>> >> > > > > > > > > this time.
>>> >> > > > > > > > >
>>> >> > > > > > > > > Action taken:
>>> >> > > > > > > > >
>>> >> > > > > > > > > Moved all the messages from Q2 (since this was our
>>> >> > > > > > > > > original
>>> >> > > > theory)
>>> >> > > > > > to Q3
>>> >> > > > > > > > > (already created but no messages in it). This did not
>>> >> > > > > > > > > help
>>> >> > with
>>> >> > > > the
>>> >> > > > > > DM
>>> >> > > > > > > > > growing up.
>>> >> > > > > > > > > Moved all the messages from Q1 to Q4 (already created
>>> >> > > > > > > > > but
>>> >> no
>>> >> > > > > > messages in
>>> >> > > > > > > > > it). This reduced DM allocation from 93% to 31%.
>>> >> > > > > > > > >
>>> >> > > > > > > > > We have the heap dump and thread dump from when broker
>>> >> > > > > > > > > was
>>> >> > 90%
>>> >> > > in
>>> >> > > > > DM
>>> >> > > > > > > > > allocation. We are going to analyze that to see if we
>>> >> > > > > > > > > can
>>> >> get
>>> >> > > > some
>>> >> > > > > > clue.
>>> >> > > > > > > > We
>>> >> > > > > > > > >
>>> >> > > > > > > > > wanted to share this new information which might help
>>> >> > > > > > > > > in
>>> >> > > > reasoning
>>> >> > > > > > about
>>> >> > > > > > > > the
>>> >> > > > > > > > >
>>> >> > > > > > > > > memory issue.
>>> >> > > > > > > > >
>>> >> > > > > > > > > - Ramayan
>>> >> > > > > > > > >
>>> >> > > > > > > > >
>>> >> > > > > > > > > On Thu, Apr 20, 2017 at 11:20 AM, Ramayan Tiwari <
>>> >> > > > > > > > ramayan.tiwari@gmail.com>
>>> >> > > > > > > > >
>>> >> > > > > > > > > wrote:
>>> >> > > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > Hi Keith,
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > Thanks so much for your response and digging into the
>>> >> > issue.
>>> >> > > > > Below
>>> >> > > > > > are
>>> >> > > > > > > > the
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > answer to your questions:
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > 1) Yeah we are using QPID-7462 with 6.0.5. We
>>> >> > > > > > > > > > couldn't
>>> >> use
>>> >> > > 6.1
>>> >> > > > > > where it
>>> >> > > > > > > > > > was released because we need JMX support. Here is the
>>> >> > > > destination
>>> >> > > > > > > > format:
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > ""%s ; {node : { type : queue }, link : {
>>> >> > > > > > > > > > x-subscribes :
>>> >> {
>>> >> > > > > > arguments : {
>>> >> > > > > > > > > > x-multiqueue : [%s], x-pull-only : true }}}}";"
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > 2) Our machines have 40 cores, which will make the
>>> >> > > > > > > > > > number
>>> >> > of
>>> >> > > > > > threads to
>>> >> > > > > > > > > > 80. This might not be an issue, because this will
>>> >> > > > > > > > > > show up
>>> >> > in
>>> >> > > > the
>>> >> > > > > > > > baseline DM
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > allocated, which is only 6% (of 4GB) when we just
>>> >> > > > > > > > > > bring
>>> >> up
>>> >> > > the
>>> >> > > > > > broker.
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > 3) The only setting that we tuned WRT to DM is
>>> >> > > > > flowToDiskThreshold,
>>> >> > > > > > > > which
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > is set at 80% now.
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > 4) Only one virtual host in the broker.
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > 5) Most of our queues (99%) are priority, we also
>>> >> > > > > > > > > > have
>>> >> 8-10
>>> >> > > > > sorted
>>> >> > > > > > > > queues.
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > 6) Yeah we are using the standard 0.16 client and not
>>> >> AMQP
>>> >> > > 1.0
>>> >> > > > > > clients.
>>> >> > > > > > > > > > The connection log line looks like:
>>> >> > > > > > > > > > CON-1001 : Open : Destination : AMQP(IP:5672) :
>>> >> > > > > > > > > > Protocol
>>> >> > > > Version
>>> >> > > > > :
>>> >> > > > > > 0-10
>>> >> > > > > > > > :
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > Client ID : test : Client Version : 0.16 : Client
>>> >> Product :
>>> >> > > > qpid
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > We had another broker crashed about an hour back, we
>>> >> > > > > > > > > > do
>>> >> see
>>> >> > > the
>>> >> > > > > > same
>>> >> > > > > > > > > > patterns:
>>> >> > > > > > > > > > 1) There is a queue which is constantly growing,
>>> >> > > > > > > > > > enqueue
>>> >> is
>>> >> > > > > faster
>>> >> > > > > > than
>>> >> > > > > > > > > > dequeue on that queue for a long period of time.
>>> >> > > > > > > > > > 2) Flow to disk didn't kick in at all.
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > This graph shows memory growth (red line - heap, blue
>>> >> > > > > > > > > > -
>>> >> DM
>>> >> > > > > > allocated,
>>> >> > > > > > > > > > yellow - DM used)
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > https://drive.google.com/file/d/
>>> >> > > 0Bwi0MEV3srPRdVhXdTBncHJLY2c/
>>> >> > > > > > > > view?usp=sharing
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > The below graph shows growth on a single queue (there
>>> >> > > > > > > > > > are
>>> >> > > 10-12
>>> >> > > > > > other
>>> >> > > > > > > > > > queues with traffic as well, something large size
>>> >> > > > > > > > > > than
>>> >> this
>>> >> > > > > queue):
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > https://drive.google.com/file/d/
>>> >> > > 0Bwi0MEV3srPRWmNGbDNGUkJhQ0U/
>>> >> > > > > > > > view?usp=sharing
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > Couple of questions:
>>> >> > > > > > > > > > 1) Is there any developer level doc/design spec on
>>> >> > > > > > > > > > how
>>> >> Qpid
>>> >> > > > uses
>>> >> > > > > > DM?
>>> >> > > > > > > > > > 2) We are not getting heap dumps automatically when
>>> >> broker
>>> >> > > > > crashes
>>> >> > > > > > due
>>> >> > > > > > > > to
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > DM (HeapDumpOnOutOfMemoryError not respected). Has
>>> >> > > > > > > > > > anyone
>>> >> > > > found a
>>> >> > > > > > way
>>> >> > > > > > > > to get
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > around this problem?
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > Thanks
>>> >> > > > > > > > > > Ramayan
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > On Thu, Apr 20, 2017 at 9:08 AM, Keith W <
>>> >> > > keith.wall@gmail.com
>>> >> > > > >
>>> >> > > > > > wrote:
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > Hi Ramayan
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > We have been discussing your problem here and have
>>> >> > > > > > > > > > > a
>>> >> > couple
>>> >> > > > of
>>> >> > > > > > > > questions.
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > I have been experimenting with use-cases based on
>>> >> > > > > > > > > > > your
>>> >> > > > > > descriptions
>>> >> > > > > > > > > > > above, but so far, have been unsuccessful in
>>> >> reproducing
>>> >> > a
>>> >> > > > > > > > > > > "java.lang.OutOfMemoryError: Direct buffer memory"
>>> >> > > > condition.
>>> >> > > > > > The
>>> >> > > > > > > > > > > direct memory usage reflects the expected model: it
>>> >> > levels
>>> >> > > > off
>>> >> > > > > > when
>>> >> > > > > > > > > > > the flow to disk threshold is reached and direct
>>> >> > > > > > > > > > > memory
>>> >> > is
>>> >> > > > > > release as
>>> >> > > > > > > > > > > messages are consumed until the minimum size for
>>> >> caching
>>> >> > of
>>> >> > > > > > direct is
>>> >> > > > > > > > > > > reached.
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > 1] For clarity let me check: we believe when you
>>> >> > > > > > > > > > > say
>>> >> > "patch
>>> >> > > > to
>>> >> > > > > > use
>>> >> > > > > > > > > > > MultiQueueConsumer" you are referring to the patch
>>> >> > attached
>>> >> > > > to
>>> >> > > > > > > > > > > QPID-7462 "Add experimental "pull" consumers to the
>>> >> > broker"
>>> >> > > > > and
>>> >> > > > > > you
>>> >> > > > > > > > > > > are using a combination of this "x-pull-only"  with
>>> >> > > > > > > > > > > the
>>> >> > > > > standard
>>> >> > > > > > > > > > > "x-multiqueue" feature.  Is this correct?
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > 2] One idea we had here relates to the size of the
>>> >> > > > virtualhost
>>> >> > > > > IO
>>> >> > > > > > > > > > > pool.   As you know from the documentation, the
>>> >> > > > > > > > > > > Broker
>>> >> > > > > > caches/reuses
>>> >> > > > > > > > > > > direct memory internally but the documentation
>>> >> > > > > > > > > > > fails to
>>> >> > > > > mentions
>>> >> > > > > > that
>>> >> > > > > > > > > > > each pooled virtualhost IO thread also grabs a
>>> >> > > > > > > > > > > chunk
>>> >> > (256K)
>>> >> > > > of
>>> >> > > > > > direct
>>> >> > > > > > > > > > > memory from this cache.  By default the virtual
>>> >> > > > > > > > > > > host IO
>>> >> > > pool
>>> >> > > > is
>>> >> > > > > > sized
>>> >> > > > > > > > > > > Math.max(Runtime.getRuntime().availableProcessors()
>>> >> > > > > > > > > > > *
>>> >> 2,
>>> >> > > > 64),
>>> >> > > > > > so if
>>> >> > > > > > > > > > > you have a machine with a very large number of
>>> >> > > > > > > > > > > cores,
>>> >> you
>>> >> > > may
>>> >> > > > > > have a
>>> >> > > > > > > > > > > surprising large amount of direct memory assigned
>>> >> > > > > > > > > > > to
>>> >> > > > > virtualhost
>>> >> > > > > > IO
>>> >> > > > > > > > > > > threads.   Check the value of
>>> >> > > > > > > > > > > connectionThreadPoolSize
>>> >> on
>>> >> > > the
>>> >> > > > > > > > > > > virtualhost
>>> >> > > > > > > > > > > (http://<server>:<port>/api/latest/virtualhost/<
>>> >> > > > > > virtualhostnodename>/<;
>>> >> > > > > > > > virtualhostname>)
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > to see what value is in force.  What is it?  It is
>>> >> > possible
>>> >> > > > to
>>> >> > > > > > tune
>>> >> > > > > > > > > > > the pool size using context variable
>>> >> > > > > > > > > > > virtualhost.connectionThreadPool.size.
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > 3] Tell me if you are tuning the Broker in way
>>> >> > > > > > > > > > > beyond
>>> >> the
>>> >> > > > > > direct/heap
>>> >> > > > > > > > > > > memory settings you have told us about already.
>>> >> > > > > > > > > > > For
>>> >> > > instance
>>> >> > > > > > you are
>>> >> > > > > > > > > > > changing any of the direct memory pooling settings
>>> >> > > > > > > > > > > broker.directByteBufferPoolSize, default network
>>> >> buffer
>>> >> > > size
>>> >> > > > > > > > > > > qpid.broker.networkBufferSize or applying any other
>>> >> > > > > non-standard
>>> >> > > > > > > > > > > settings?
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > 4] How many virtual hosts do you have on the
>>> >> > > > > > > > > > > Broker?
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > 5] What is the consumption pattern of the messages?
>>> >> > > > > > > > > > > Do
>>> >> > > > consume
>>> >> > > > > > in a
>>> >> > > > > > > > > > > strictly FIFO fashion or are you making use of
>>> >> > > > > > > > > > > message
>>> >> > > > > selectors
>>> >> > > > > > > > > > > or/and any of the out-of-order queue types (LVQs,
>>> >> > priority
>>> >> > > > > queue
>>> >> > > > > > or
>>> >> > > > > > > > > > > sorted queues)?
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > 6] Is it just the 0.16 client involved in the
>>> >> > application?
>>> >> > > > >  Can
>>> >> > > > > > I
>>> >> > > > > > > > > > > check that you are not using any of the AMQP 1.0
>>> >> clients
>>> >> > > > > > > > > > > (org,apache.qpid:qpid-jms-client or
>>> >> > > > > > > > > > > org.apache.qpid:qpid-amqp-1-0-client) in the
>>> >> > > > > > > > > > > software
>>> >> > > stack
>>> >> > > > > (as
>>> >> > > > > > either
>>> >> > > > > > > > > > > consumers or producers)
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > Hopefully the answers to these questions will get
>>> >> > > > > > > > > > > us
>>> >> > closer
>>> >> > > > to
>>> >> > > > > a
>>> >> > > > > > > > > > > reproduction.   If you are able to reliable
>>> >> > > > > > > > > > > reproduce
>>> >> it,
>>> >> > > > > please
>>> >> > > > > > share
>>> >> > > > > > > > > > > the steps with us.
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > Kind regards, Keith.
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > On 20 April 2017 at 10:21, Ramayan Tiwari <
>>> >> > > > > > ramayan.tiwari@gmail.com>
>>> >> > > > > > > > > > > wrote:
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > After a lot of log mining, we might have a way to
>>> >> > explain
>>> >> > > > the
>>> >> > > > > > > > sustained
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > increased in DirectMemory allocation, the
>>> >> > > > > > > > > > > > correlation
>>> >> > > seems
>>> >> > > > > to
>>> >> > > > > > be
>>> >> > > > > > > > with
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > the
>>> >> > > > > > > > > > > > growth in the size of a Queue that is getting
>>> >> consumed
>>> >> > > but
>>> >> > > > at
>>> >> > > > > > a much
>>> >> > > > > > > > > > > > slower
>>> >> > > > > > > > > > > > rate than producers putting messages on this
>>> >> > > > > > > > > > > > queue.
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > The pattern we see is that in each instance of
>>> >> > > > > > > > > > > > broker
>>> >> > > > crash,
>>> >> > > > > > there is
>>> >> > > > > > > > > > > > at
>>> >> > > > > > > > > > > > least one queue (usually 1 queue) whose size kept
>>> >> > growing
>>> >> > > > > > steadily.
>>> >> > > > > > > > > > > > It’d be
>>> >> > > > > > > > > > > > of significant size but not the largest queue --
>>> >> > usually
>>> >> > > > > there
>>> >> > > > > > are
>>> >> > > > > > > > > > > > multiple
>>> >> > > > > > > > > > > > larger queues -- but it was different from other
>>> >> queues
>>> >> > > in
>>> >> > > > > > that its
>>> >> > > > > > > > > > > > size
>>> >> > > > > > > > > > > > was growing steadily. The queue would also be
>>> >> > > > > > > > > > > > moving,
>>> >> > but
>>> >> > > > its
>>> >> > > > > > > > > > > > processing
>>> >> > > > > > > > > > > > rate was not keeping up with the enqueue rate.
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > Our theory that might be totally wrong: If a
>>> >> > > > > > > > > > > > queue is
>>> >> > > > moving
>>> >> > > > > > the
>>> >> > > > > > > > entire
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > time, maybe then the broker would keep reusing
>>> >> > > > > > > > > > > > the
>>> >> same
>>> >> > > > > buffer
>>> >> > > > > > in
>>> >> > > > > > > > > > > > direct
>>> >> > > > > > > > > > > > memory for the queue, and keep on adding onto it
>>> >> > > > > > > > > > > > at
>>> >> the
>>> >> > > end
>>> >> > > > > to
>>> >> > > > > > > > > > > > accommodate
>>> >> > > > > > > > > > > > new messages. But because it’s active all the
>>> >> > > > > > > > > > > > time
>>> >> and
>>> >> > > > we’re
>>> >> > > > > > pointing
>>> >> > > > > > > > > > > > to
>>> >> > > > > > > > > > > > the same buffer, space allocated for messages at
>>> >> > > > > > > > > > > > the
>>> >> > head
>>> >> > > > of
>>> >> > > > > > the
>>> >> > > > > > > > > > > > queue/buffer doesn’t get reclaimed, even long
>>> >> > > > > > > > > > > > after
>>> >> > those
>>> >> > > > > > messages
>>> >> > > > > > > > have
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > been processed. Just a theory.
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > We are also trying to reproduce this using some
>>> >> > > > > > > > > > > > perf
>>> >> > > tests
>>> >> > > > to
>>> >> > > > > > enqueue
>>> >> > > > > > > > > > > > with
>>> >> > > > > > > > > > > > same pattern, will update with the findings.
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > Thanks
>>> >> > > > > > > > > > > > Ramayan
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > On Wed, Apr 19, 2017 at 6:52 PM, Ramayan Tiwari
>>> >> > > > > > > > > > > > <ra...@gmail.com>
>>> >> > > > > > > > > > > > wrote:
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > Another issue that we noticed is when broker
>>> >> > > > > > > > > > > > > goes
>>> >> OOM
>>> >> > > due
>>> >> > > > > to
>>> >> > > > > > direct
>>> >> > > > > > > > > > > > > memory, it doesn't create heap dump (specified
>>> >> > > > > > > > > > > > > by
>>> >> > > "-XX:+
>>> >> > > > > > > > > > > > > HeapDumpOnOutOfMemoryError"), even when the OOM
>>> >> error
>>> >> > > is
>>> >> > > > > > same as
>>> >> > > > > > > > what
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > is
>>> >> > > > > > > > > > > > > mentioned in the oracle JVM docs
>>> >> > > > > > ("java.lang.OutOfMemoryError").
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > Has anyone been able to find a way to get to
>>> >> > > > > > > > > > > > > heap
>>> >> > dump
>>> >> > > > for
>>> >> > > > > > DM OOM?
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > - Ramayan
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > On Wed, Apr 19, 2017 at 11:21 AM, Ramayan
>>> >> > > > > > > > > > > > > Tiwari
>>> >> > > > > > > > > > > > > <ramayan.tiwari@gmail.com
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > wrote:
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > Alex,
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > Below are the flow to disk logs from broker
>>> >> having
>>> >> > > > > > 3million+
>>> >> > > > > > > > messages
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > this time. We only have one virtual host.
>>> >> > > > > > > > > > > > > > Time is
>>> >> > in
>>> >> > > > GMT.
>>> >> > > > > > Looks
>>> >> > > > > > > > like
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > flow
>>> >> > > > > > > > > > > > > > to disk is active on the whole virtual host
>>> >> > > > > > > > > > > > > > and
>>> >> > not a
>>> >> > > > > > queue level.
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > When the same broker went OOM yesterday, I
>>> >> > > > > > > > > > > > > > did
>>> >> not
>>> >> > > see
>>> >> > > > > any
>>> >> > > > > > flow to
>>> >> > > > > > > > > > > > > > disk
>>> >> > > > > > > > > > > > > > logs from when it was started until it
>>> >> > > > > > > > > > > > > > crashed
>>> >> > > (crashed
>>> >> > > > > > twice
>>> >> > > > > > > > within
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > 4hrs).
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > 4/19/17 4:17:43.509 AM INFO
>>> >> [Housekeeping[test]] -
>>> >> > > > > > > > > > > > > > [Housekeeping[test]]
>>> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
>>> >> > > > > > > > > > > > > > Message
>>> >> > > > memory
>>> >> > > > > > use
>>> >> > > > > > > > > > > > > > 3356539KB
>>> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
>>> >> > > > > > > > > > > > > > 4/19/17 2:31:13.502 AM INFO
>>> >> [Housekeeping[test]] -
>>> >> > > > > > > > > > > > > > [Housekeeping[test]]
>>> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
>>> >> Message
>>> >> > > > memory
>>> >> > > > > > use
>>> >> > > > > > > > > > > > > > 3354866KB
>>> >> > > > > > > > > > > > > > within threshold 3355443KB
>>> >> > > > > > > > > > > > > > 4/19/17 2:28:43.511 AM INFO
>>> >> [Housekeeping[test]] -
>>> >> > > > > > > > > > > > > > [Housekeeping[test]]
>>> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
>>> >> > > > > > > > > > > > > > Message
>>> >> > > > memory
>>> >> > > > > > use
>>> >> > > > > > > > > > > > > > 3358509KB
>>> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
>>> >> > > > > > > > > > > > > > 4/19/17 2:20:13.500 AM INFO
>>> >> [Housekeeping[test]] -
>>> >> > > > > > > > > > > > > > [Housekeeping[test]]
>>> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
>>> >> Message
>>> >> > > > memory
>>> >> > > > > > use
>>> >> > > > > > > > > > > > > > 3353501KB
>>> >> > > > > > > > > > > > > > within threshold 3355443KB
>>> >> > > > > > > > > > > > > > 4/19/17 2:18:13.500 AM INFO
>>> >> [Housekeeping[test]] -
>>> >> > > > > > > > > > > > > > [Housekeeping[test]]
>>> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
>>> >> > > > > > > > > > > > > > Message
>>> >> > > > memory
>>> >> > > > > > use
>>> >> > > > > > > > > > > > > > 3357544KB
>>> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
>>> >> > > > > > > > > > > > > > 4/19/17 2:08:43.501 AM INFO
>>> >> [Housekeeping[test]] -
>>> >> > > > > > > > > > > > > > [Housekeeping[test]]
>>> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
>>> >> Message
>>> >> > > > memory
>>> >> > > > > > use
>>> >> > > > > > > > > > > > > > 3353236KB
>>> >> > > > > > > > > > > > > > within threshold 3355443KB
>>> >> > > > > > > > > > > > > > 4/19/17 2:08:13.501 AM INFO
>>> >> [Housekeeping[test]] -
>>> >> > > > > > > > > > > > > > [Housekeeping[test]]
>>> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
>>> >> > > > > > > > > > > > > > Message
>>> >> > > > memory
>>> >> > > > > > use
>>> >> > > > > > > > > > > > > > 3356704KB
>>> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
>>> >> > > > > > > > > > > > > > 4/19/17 2:00:43.500 AM INFO
>>> >> [Housekeeping[test]] -
>>> >> > > > > > > > > > > > > > [Housekeeping[test]]
>>> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
>>> >> Message
>>> >> > > > memory
>>> >> > > > > > use
>>> >> > > > > > > > > > > > > > 3353511KB
>>> >> > > > > > > > > > > > > > within threshold 3355443KB
>>> >> > > > > > > > > > > > > > 4/19/17 2:00:13.504 AM INFO
>>> >> [Housekeeping[test]] -
>>> >> > > > > > > > > > > > > > [Housekeeping[test]]
>>> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
>>> >> > > > > > > > > > > > > > Message
>>> >> > > > memory
>>> >> > > > > > use
>>> >> > > > > > > > > > > > > > 3357948KB
>>> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
>>> >> > > > > > > > > > > > > > 4/19/17 1:50:43.501 AM INFO
>>> >> [Housekeeping[test]] -
>>> >> > > > > > > > > > > > > > [Housekeeping[test]]
>>> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
>>> >> Message
>>> >> > > > memory
>>> >> > > > > > use
>>> >> > > > > > > > > > > > > > 3355310KB
>>> >> > > > > > > > > > > > > > within threshold 3355443KB
>>> >> > > > > > > > > > > > > > 4/19/17 1:47:43.501 AM INFO
>>> >> [Housekeeping[test]] -
>>> >> > > > > > > > > > > > > > [Housekeeping[test]]
>>> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
>>> >> > > > > > > > > > > > > > Message
>>> >> > > > memory
>>> >> > > > > > use
>>> >> > > > > > > > > > > > > > 3365624KB
>>> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
>>> >> > > > > > > > > > > > > > 4/19/17 1:43:43.501 AM INFO
>>> >> [Housekeeping[test]] -
>>> >> > > > > > > > > > > > > > [Housekeeping[test]]
>>> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
>>> >> Message
>>> >> > > > memory
>>> >> > > > > > use
>>> >> > > > > > > > > > > > > > 3355136KB
>>> >> > > > > > > > > > > > > > within threshold 3355443KB
>>> >> > > > > > > > > > > > > > 4/19/17 1:31:43.509 AM INFO
>>> >> [Housekeeping[test]] -
>>> >> > > > > > > > > > > > > > [Housekeeping[test]]
>>> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
>>> >> > > > > > > > > > > > > > Message
>>> >> > > > memory
>>> >> > > > > > use
>>> >> > > > > > > > > > > > > > 3358683KB
>>> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > After production release (2days back), we
>>> >> > > > > > > > > > > > > > have
>>> >> > seen 4
>>> >> > > > > > crashes in 3
>>> >> > > > > > > > > > > > > > different brokers, this is the most pressing
>>> >> > concern
>>> >> > > > for
>>> >> > > > > > us in
>>> >> > > > > > > > > > > > > > decision if
>>> >> > > > > > > > > > > > > > we should roll back to 0.32. Any help is
>>> >> > > > > > > > > > > > > > greatly
>>> >> > > > > > appreciated.
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > Thanks
>>> >> > > > > > > > > > > > > > Ramayan
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 9:36 AM, Oleksandr
>>> >> > > > > > > > > > > > > > Rudyy
>>> >> <
>>> >> > > > > > orudyy@gmail.com
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > wrote:
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > Ramayan,
>>> >> > > > > > > > > > > > > > > Thanks for the details. I would like to
>>> >> > > > > > > > > > > > > > > clarify
>>> >> > > > whether
>>> >> > > > > > flow to
>>> >> > > > > > > > disk
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > was
>>> >> > > > > > > > > > > > > > > triggered today for 3 million messages?
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > The following logs are issued for flow to
>>> >> > > > > > > > > > > > > > > disk:
>>> >> > > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
>>> >> Message
>>> >> > > > > memory
>>> >> > > > > > use
>>> >> > > > > > > > > > > > > > > {0,number,#}KB
>>> >> > > > > > > > > > > > > > > exceeds threshold {1,number,#.##}KB
>>> >> > > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
>>> >> > Message
>>> >> > > > > > memory use
>>> >> > > > > > > > > > > > > > > {0,number,#}KB within threshold
>>> >> {1,number,#.##}KB
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > Kind Regards,
>>> >> > > > > > > > > > > > > > > Alex
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > On 19 April 2017 at 17:10, Ramayan Tiwari <
>>> >> > > > > > > > ramayan.tiwari@gmail.com>
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > wrote:
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > Hi Alex,
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > Thanks for your response, here are the
>>> >> details:
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > We use "direct" exchange, without
>>> >> > > > > > > > > > > > > > > > persistence
>>> >> > (we
>>> >> > > > > > specify
>>> >> > > > > > > > > > > > > > > NON_PERSISTENT
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > that while sending from client) and use
>>> >> > > > > > > > > > > > > > > > BDB
>>> >> > > store.
>>> >> > > > We
>>> >> > > > > > use JSON
>>> >> > > > > > > > > > > > > > > > virtual
>>> >> > > > > > > > > > > > > > > host
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > type. We are not using SSL.
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > When the broker went OOM, we had around
>>> >> > > > > > > > > > > > > > > > 1.3
>>> >> > > million
>>> >> > > > > > messages
>>> >> > > > > > > > with
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > 100
>>> >> > > > > > > > > > > > > > > bytes
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > average message size. Direct memory
>>> >> allocation
>>> >> > > > (value
>>> >> > > > > > read from
>>> >> > > > > > > > > > > > > > > > MBean)
>>> >> > > > > > > > > > > > > > > kept
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > going up, even though it wouldn't need
>>> >> > > > > > > > > > > > > > > > more
>>> >> DM
>>> >> > to
>>> >> > > > > > store these
>>> >> > > > > > > > many
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > messages. DM allocated persisted at 99%
>>> >> > > > > > > > > > > > > > > > for
>>> >> > > about 3
>>> >> > > > > > and half
>>> >> > > > > > > > hours
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > before
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > crashing.
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > Today, on the same broker we have 3
>>> >> > > > > > > > > > > > > > > > million
>>> >> > > > messages
>>> >> > > > > > (same
>>> >> > > > > > > > message
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > size)
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > and DM allocated is only at 8%. This
>>> >> > > > > > > > > > > > > > > > seems
>>> >> like
>>> >> > > > there
>>> >> > > > > > is some
>>> >> > > > > > > > > > > > > > > > issue
>>> >> > > > > > > > > > > > > > > with
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > de-allocation or a leak.
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > I have uploaded the memory utilization
>>> >> > > > > > > > > > > > > > > > graph
>>> >> > > here:
>>> >> > > > > > > > > > > > > > > > https://drive.google.com/file/d/
>>> >> > > > > > 0Bwi0MEV3srPRVHFEbDlIYUpLaUE/
>>> >> > > > > > > > > > > > > > > > view?usp=sharing
>>> >> > > > > > > > > > > > > > > > Blue line is DM allocated, Yellow is DM
>>> >> > > > > > > > > > > > > > > > Used
>>> >> > (sum
>>> >> > > > of
>>> >> > > > > > queue
>>> >> > > > > > > > > > > > > > > > payload)
>>> >> > > > > > > > > > > > > > > and Red
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > is heap usage.
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > Thanks
>>> >> > > > > > > > > > > > > > > > Ramayan
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 4:10 AM,
>>> >> > > > > > > > > > > > > > > > Oleksandr
>>> >> > Rudyy
>>> >> > > > > > > > > > > > > > > > <or...@gmail.com>
>>> >> > > > > > > > > > > > > > > wrote:
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > Hi Ramayan,
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > Could please share with us the details
>>> >> > > > > > > > > > > > > > > > > of
>>> >> > > > messaging
>>> >> > > > > > use
>>> >> > > > > > > > case(s)
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > which
>>> >> > > > > > > > > > > > > > > > ended
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > up in OOM on broker side?
>>> >> > > > > > > > > > > > > > > > > I would like to reproduce the issue on
>>> >> > > > > > > > > > > > > > > > > my
>>> >> > local
>>> >> > > > > > broker in
>>> >> > > > > > > > order
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > to
>>> >> > > > > > > > > > > > > > > fix
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > it.
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > I would appreciate if you could provide
>>> >> > > > > > > > > > > > > > > > > as
>>> >> > much
>>> >> > > > > > details as
>>> >> > > > > > > > > > > > > > > > > possible,
>>> >> > > > > > > > > > > > > > > > > including, messaging topology, message
>>> >> > > > persistence
>>> >> > > > > > type,
>>> >> > > > > > > > message
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > sizes,volumes, etc.
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > Qpid Broker 6.0.x uses direct memory
>>> >> > > > > > > > > > > > > > > > > for
>>> >> > > keeping
>>> >> > > > > > message
>>> >> > > > > > > > content
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > and
>>> >> > > > > > > > > > > > > > > > > receiving/sending data. Each plain
>>> >> connection
>>> >> > > > > > utilizes 512K of
>>> >> > > > > > > > > > > > > > > > > direct
>>> >> > > > > > > > > > > > > > > > > memory. Each SSL connection uses 1M of
>>> >> direct
>>> >> > > > > > memory. Your
>>> >> > > > > > > > > > > > > > > > > memory
>>> >> > > > > > > > > > > > > > > > settings
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > look Ok to me.
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > Kind Regards,
>>> >> > > > > > > > > > > > > > > > > Alex
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > On 18 April 2017 at 23:39, Ramayan
>>> >> > > > > > > > > > > > > > > > > Tiwari
>>> >> > > > > > > > > > > > > > > > > <ra...@gmail.com>
>>> >> > > > > > > > > > > > > > > > > wrote:
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > Hi All,
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > We are using Java broker 6.0.5, with
>>> >> patch
>>> >> > to
>>> >> > > > use
>>> >> > > > > > > > > > > > > > > MultiQueueConsumer
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > feature. We just finished deploying
>>> >> > > > > > > > > > > > > > > > > > to
>>> >> > > > production
>>> >> > > > > > and saw
>>> >> > > > > > > > > > > > > > > > > > couple of
>>> >> > > > > > > > > > > > > > > > > > instances of broker OOM due to
>>> >> > > > > > > > > > > > > > > > > > running
>>> >> out
>>> >> > of
>>> >> > > > > > DirectMemory
>>> >> > > > > > > > > > > > > > > > > > buffer
>>> >> > > > > > > > > > > > > > > > > > (exceptions at the end of this
>>> >> > > > > > > > > > > > > > > > > > email).
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > Here is our setup:
>>> >> > > > > > > > > > > > > > > > > > 1. Max heap 12g, max direct memory 4g
>>> >> (this
>>> >> > > is
>>> >> > > > > > opposite of
>>> >> > > > > > > > > > > > > > > > > > what the
>>> >> > > > > > > > > > > > > > > > > > recommendation is, however, for our
>>> >> > > > > > > > > > > > > > > > > > use
>>> >> > cause
>>> >> > > > > > message
>>> >> > > > > > > > payload
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > is
>>> >> > > > > > > > > > > > > > > really
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > small ~400bytes and is way less than
>>> >> > > > > > > > > > > > > > > > > > the
>>> >> > per
>>> >> > > > > > message
>>> >> > > > > > > > overhead
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > of
>>> >> > > > > > > > > > > > > > > 1KB).
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > In
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > perf testing, we were able to put 2
>>> >> million
>>> >> > > > > > messages without
>>> >> > > > > > > > > > > > > > > > > > any
>>> >> > > > > > > > > > > > > > > > issues.
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > 2. ~400 connections to broker.
>>> >> > > > > > > > > > > > > > > > > > 3. Each connection has 20 sessions
>>> >> > > > > > > > > > > > > > > > > > and
>>> >> > there
>>> >> > > is
>>> >> > > > > > one multi
>>> >> > > > > > > > > > > > > > > > > > queue
>>> >> > > > > > > > > > > > > > > > consumer
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > attached to each session, listening
>>> >> > > > > > > > > > > > > > > > > > to
>>> >> > around
>>> >> > > > > 1000
>>> >> > > > > > queues.
>>> >> > > > > > > > > > > > > > > > > > 4. We are still using 0.16 client (I
>>> >> know).
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > With the above setup, the baseline
>>> >> > > utilization
>>> >> > > > > > (without any
>>> >> > > > > > > > > > > > > > > messages)
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > for
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > direct memory was around 230mb (with
>>> >> > > > > > > > > > > > > > > > > > 410
>>> >> > > > > > connection each
>>> >> > > > > > > > > > > > > > > > > > taking
>>> >> > > > > > > > > > > > > > > 500KB).
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > Based on our understanding of broker
>>> >> memory
>>> >> > > > > > allocation,
>>> >> > > > > > > > > > > > > > > > > > message
>>> >> > > > > > > > > > > > > > > payload
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > should be the only thing adding to
>>> >> > > > > > > > > > > > > > > > > > direct
>>> >> > > > memory
>>> >> > > > > > utilization
>>> >> > > > > > > > > > > > > > > > > > (on
>>> >> > > > > > > > > > > > > > > top of
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > baseline), however, we are
>>> >> > > > > > > > > > > > > > > > > > experiencing
>>> >> > > > something
>>> >> > > > > > completely
>>> >> > > > > > > > > > > > > > > different.
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > In
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > our last broker crash, we see that
>>> >> > > > > > > > > > > > > > > > > > broker
>>> >> > is
>>> >> > > > > > constantly
>>> >> > > > > > > > > > > > > > > > > > running
>>> >> > > > > > > > > > > > > > > with
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > 90%+
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > direct memory allocated, even when
>>> >> message
>>> >> > > > > payload
>>> >> > > > > > sum from
>>> >> > > > > > > > > > > > > > > > > > all the
>>> >> > > > > > > > > > > > > > > > > queues
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > is only 6-8% (these % are against
>>> >> available
>>> >> > > DM
>>> >> > > > of
>>> >> > > > > > 4gb).
>>> >> > > > > > > > During
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > these
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > high
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > DM usage period, heap usage was
>>> >> > > > > > > > > > > > > > > > > > around
>>> >> 60%
>>> >> > > (of
>>> >> > > > > > 12gb).
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > We would like some help in
>>> >> > > > > > > > > > > > > > > > > > understanding
>>> >> > what
>>> >> > > > > > could be the
>>> >> > > > > > > > > > > > > > > > > > reason
>>> >> > > > > > > > > > > > > > > of
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > these
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > high DM allocations. Are there things
>>> >> other
>>> >> > > > than
>>> >> > > > > > message
>>> >> > > > > > > > > > > > > > > > > > payload
>>> >> > > > > > > > > > > > > > > and
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > AMQP
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > connection, which use DM and could be
>>> >> > > > > contributing
>>> >> > > > > > to these
>>> >> > > > > > > > > > > > > > > > > > high
>>> >> > > > > > > > > > > > > > > usage?
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > Another thing where we are puzzled is
>>> >> > > > > > > > > > > > > > > > > > the
>>> >> > > > > > de-allocation of
>>> >> > > > > > > > DM
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > byte
>>> >> > > > > > > > > > > > > > > > > buffers.
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > From log mining of heap and DM
>>> >> utilization,
>>> >> > > > > > de-allocation of
>>> >> > > > > > > > > > > > > > > > > > DM
>>> >> > > > > > > > > > > > > > > doesn't
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > correlate with heap GC. If anyone has
>>> >> seen
>>> >> > > any
>>> >> > > > > > documentation
>>> >> > > > > > > > > > > > > > > related to
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > this, it would be very helpful if you
>>> >> could
>>> >> > > > share
>>> >> > > > > > that.
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > Thanks
>>> >> > > > > > > > > > > > > > > > > > Ramayan
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > *Exceptions*
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct
>>> >> > > > > > > > > > > > > > > > > > buffer
>>> >> > > > memory
>>> >> > > > > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(
>>> >> > > Bits.java:658)
>>> >> > > > > > > > ~[na:1.8.0_40]
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<
>>> >> > > > > > init>(DirectByteBuffer.java:
>>> >> > > > > > > > 123)
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
>>> >> > > > > > > > > > > > > > > > > > at java.nio.ByteBuffer.
>>> >> > > > > allocateDirect(ByteBuffer.
>>> >> > > > > > java:311)
>>> >> > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.bytebuffer.
>>> >> > > > > > QpidByteBuffer.allocateDirect(
>>> >> > > > > > > > > > > > > > > > > > QpidByteBuffer.java:474)
>>> >> > > > > > > > > > > > > > > > > > ~[qpid-common-6.0.5.jar:6.0.5]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>>> >> > > > > > > > NonBlockingConnectionPlainD
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > elegate.
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > restoreApplicationBufferForWrite(
>>> >> > > > > > > > NonBlockingConnectionPlainDele
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > gate.java:93)
>>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>>> >> > > > > > > > NonBlockingConnectionPlainDele
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > gate.processData(
>>> >> > > > NonBlockingConnectionPlainDele
>>> >> > > > > > > > gate.java:60)
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>>> >> > > > > > > > NonBlockingConnection.doRead(
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > NonBlockingConnection.java:506)
>>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>>> >> > > > > > > > NonBlockingConnection.doWork(
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > NonBlockingConnection.java:285)
>>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>>> >> > > > > > > > NetworkConnectionScheduler.
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > processConnection(
>>> >> > > NetworkConnectionScheduler.
>>> >> > > > > > java:124)
>>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
>>> >> > > > transport.SelectorThread$
>>> >> > > > > > > > ConnectionPr
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > ocessor.
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > processConnection(
>>> >> SelectorThread.java:504)
>>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
>>> >> > > > transport.SelectorThread$
>>> >> > > > > > > > > > > > > > > > > > SelectionTask.performSelect(
>>> >> > > > > > SelectorThread.java:337)
>>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
>>> >> > > > transport.SelectorThread$
>>> >> > > > > > > > SelectionTask.run(
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > SelectorThread.java:87)
>>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
>>> >> > > > > > transport.SelectorThread.run(
>>> >> > > > > > > > > > > > > > > > > > SelectorThread.java:462)
>>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > java.util.concurrent.
>>> >> > > > > ThreadPoolExecutor.runWorker(
>>> >> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:1142)
>>> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > java.util.concurrent.
>>> >> > > > > > ThreadPoolExecutor$Worker.run(
>>> >> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:617)
>>> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > java.lang.Thread.run(Thread.java:745)
>>> >> > > > > > ~[na:1.8.0_40]
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > *Second exception*
>>> >> > > > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct
>>> >> > > > > > > > > > > > > > > > > > buffer
>>> >> > > > memory
>>> >> > > > > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(
>>> >> > > Bits.java:658)
>>> >> > > > > > > > ~[na:1.8.0_40]
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<
>>> >> > > > > > init>(DirectByteBuffer.java:
>>> >> > > > > > > > 123)
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
>>> >> > > > > > > > > > > > > > > > > > at java.nio.ByteBuffer.
>>> >> > > > > allocateDirect(ByteBuffer.
>>> >> > > > > > java:311)
>>> >> > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.bytebuffer.
>>> >> > > > > > QpidByteBuffer.allocateDirect(
>>> >> > > > > > > > > > > > > > > > > > QpidByteBuffer.java:474)
>>> >> > > > > > > > > > > > > > > > > > ~[qpid-common-6.0.5.jar:6.0.5]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>>> >> > > > > > > > NonBlockingConnectionPlainDele
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > gate.<init>(
>>> >> NonBlockingConnectionPlainDele
>>> >> > > > > > gate.java:45)
>>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>>> >> > > > > > NonBlockingConnection.
>>> >> > > > > > > > > > > > > > > > > > setTransportEncryption(
>>> >> > > > > NonBlockingConnection.java:
>>> >> > > > > > 625)
>>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>>> >> > > > > > > > NonBlockingConnection.<init>(
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > NonBlockingConnection.java:117)
>>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>>> >> > > > > > > > NonBlockingNetworkTransport.
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > acceptSocketChannel(
>>> >> > > > NonBlockingNetworkTransport.
>>> >> > > > > > java:158)
>>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
>>> >> > > > transport.SelectorThread$
>>> >> > > > > > > > SelectionTas
>>> >> > > > > > > > >
>>> >> > > > > > > > > >
>>> >> > > > > > > > > > >
>>> >> > > > > > > > > > > >
>>> >> > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > k$1.run(
>>> >> > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > > > > > > SelectorThread.java:191)
>>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
>>> >> > > > > > transport.SelectorThread.run(
>>> >> > > > > > > > > > > > > > > > > > SelectorThread.java:462)
>>> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > java.util.concurrent.
>>> >> > > > > ThreadPoolExecutor.runWorker(
>>> >> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:1142)
>>> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > java.util.concurrent.
>>> >> > > > > > ThreadPoolExecutor$Worker.run(
>>> >> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:617)
>>> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
>>> >> > > > > > > > > > > > > > > > > > at
>>> >> > > > > > > > > > > > > > > > > > java.lang.Thread.run(Thread.java:745)
>>> >> > > > > > ~[na:1.8.0_40]
>>> >> > > > > > > > > > > > > > > > > >
>>> >> > > > > > > > > > > > > >
>>> >> > > > > > > > > > > ------------------------------
>>> >> > > ------------------------------
>>> >> > > > > > ---------
>>> >> > > > > > > > > > > To unsubscribe, e-mail:
>>> >> > > > > > > > > > > users-unsubscribe@qpid.apache.
>>> >> > org
>>> >> > > > > > > > > > > For additional commands, e-mail:
>>> >> > > users-help@qpid.apache.org
>>> >> > > > > > > > > > >
>>> >> > > > > >
>>> >> > > > > >
>>> >> > > > > > ------------------------------------------------------------
>>> >> > > ---------
>>> >> > > > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>>> >> > > > > > For additional commands, e-mail: users-help@qpid.apache.org
>>> >> > > > > >
>>> >> > > > >
>>> >> > > >
>>> >> > >
>>> >> >
>>> >>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>>> For additional commands, e-mail: users-help@qpid.apache.org
>>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org


Re: Java broker OOM due to DirectMemory

Posted by Ramayan Tiwari <ra...@gmail.com>.
Thanks Keith for the update.

- Ramayan

On Mon, May 15, 2017 at 2:35 AM, Keith W <ke...@gmail.com> wrote:

> Hi Ramayan
>
> We are still looking at our approach to the Broker's flow to disk
> feature in light of the defect you highlighted.  We have some work in
> flight this week investigating alternative approaches which I am
> hoping will conclude by the end of week.  I should be able to update
> you then.
>
> Thanks Keith
>
> On 12 May 2017 at 20:58, Ramayan Tiwari <ra...@gmail.com> wrote:
> > Hi Alex,
> >
> > Any update on the fix for this?
> > QPID-7753 is assigned a fix version for 7.0.0, I am hoping that the fix
> > will also be back ported to 6.0.x.
> >
> > Thanks
> > Ramayan
> >
> > On Mon, May 8, 2017 at 2:14 AM, Oleksandr Rudyy <or...@gmail.com>
> wrote:
> >
> >> Hi Ramayan,
> >>
> >> Thanks for testing the patch and providing a feedback.
> >>
> >> Regarding direct memory utilization, the Qpid Broker caches up to 256MB
> of
> >> direct memory internally in QpidByteBuffers. Thus, when testing the
> Broker
> >> with only 256MB of direct memory, the entire direct memory could be
> cached
> >> and it would look as if direct memory is never released. Potentially,
> you
> >> can reduce the number of buffers cached on broker by changing context
> >> variable 'broker.directByteBufferPoolSize'. By default, it is set to
> 1000.
> >> With buffer size of 256K, it would give ~256M of cache.
> >>
> >> Regarding introducing lower and upper thresholds for 'flow to disk'. It
> >> seems like a good idea and we will try to implement it early this week
> on
> >> trunk first.
> >>
> >> Kind Regards,
> >> Alex
> >>
> >>
> >> On 5 May 2017 at 23:49, Ramayan Tiwari <ra...@gmail.com>
> wrote:
> >>
> >> > Hi Alex,
> >> >
> >> > Thanks for providing the patch. I verified the fix with same perf
> test,
> >> and
> >> > it does prevent broker from going OOM, however. DM utilization doesn't
> >> get
> >> > any better after hitting the threshold (where flow to disk is
> activated
> >> > based on total used % across broker - graph in the link below).
> >> >
> >> > After hitting the final threshold, flow to disk activates and
> deactivates
> >> > pretty frequently across all the queues. The reason seems to be
> because
> >> > there is only one threshold currently to trigger flow to disk. Would
> it
> >> > make sense to break this down to high and low threshold - so that once
> >> flow
> >> > to disk is active after hitting high threshold, it will be active
> until
> >> the
> >> > queue utilization (or broker DM allocation) reaches the low threshold.
> >> >
> >> > Graph and flow to disk logs are here:
> >> > https://docs.google.com/document/d/1Wc1e-id-
> >> WlpI7FGU1Lx8XcKaV8sauRp82T5XZV
> >> > U-RiM/edit#heading=h.6400pltvjhy7
> >> >
> >> > Thanks
> >> > Ramayan
> >> >
> >> > On Thu, May 4, 2017 at 2:44 AM, Oleksandr Rudyy <or...@gmail.com>
> >> wrote:
> >> >
> >> > > Hi Ramayan,
> >> > >
> >> > > We attached to the QPID-7753 a patch with a work around for 6.0.x
> >> branch.
> >> > > It triggers flow to disk based on direct memory consumption rather
> than
> >> > > estimation of the space occupied by the message content. The flow to
> >> disk
> >> > > should evacuate message content preventing running out of direct
> >> memory.
> >> > We
> >> > > already committed the changes into 6.0.x and 6.1.x branches. It
> will be
> >> > > included into upcoming 6.0.7 and 6.1.3 releases.
> >> > >
> >> > > Please try and test the patch in your environment.
> >> > >
> >> > > We are still working at finishing of the fix for trunk.
> >> > >
> >> > > Kind Regards,
> >> > > Alex
> >> > >
> >> > > On 30 April 2017 at 15:45, Lorenz Quack <qu...@gmail.com>
> >> wrote:
> >> > >
> >> > > > Hi Ramayan,
> >> > > >
> >> > > > The high-level plan is currently as follows:
> >> > > >  1) Periodically try to compact sparse direct memory buffers.
> >> > > >  2) Increase accuracy of messages' direct memory usage estimation
> to
> >> > more
> >> > > > reliably trigger flow to disk.
> >> > > >  3) Add an additional flow to disk trigger based on the amount of
> >> > > allocated
> >> > > > direct memory.
> >> > > >
> >> > > > A little bit more details:
> >> > > >  1) We plan on periodically checking the amount of direct memory
> >> usage
> >> > > and
> >> > > > if it is above a
> >> > > >     threshold (50%) we compare the sum of all queue sizes with the
> >> > amount
> >> > > > of allocated direct memory.
> >> > > >     If the ratio falls below a certain threshold we trigger a
> >> > compaction
> >> > > > task which goes through all queues
> >> > > >     and copy's a certain amount of old message buffers into new
> ones
> >> > > > thereby freeing the old buffers so
> >> > > >     that they can be returned to the buffer pool and be reused.
> >> > > >
> >> > > >  2) Currently we trigger flow to disk based on an estimate of how
> >> much
> >> > > > memory the messages on the
> >> > > >     queues consume. We had to use estimates because we did not
> have
> >> > > > accurate size numbers for
> >> > > >     message headers. By having accurate size information for
> message
> >> > > > headers we can more reliably
> >> > > >     enforce queue memory limits.
> >> > > >
> >> > > >  3) The flow to disk trigger based on message size had another
> >> problem
> >> > > > which is more pertinent to the
> >> > > >     current issue. We only considered the size of the messages and
> >> not
> >> > > how
> >> > > > much memory we allocate
> >> > > >     to store those messages. In the FIFO use case those numbers
> will
> >> be
> >> > > > very close to each other but in
> >> > > >     use cases like yours we can end up with sparse buffers and the
> >> > > numbers
> >> > > > will diverge. Because of this
> >> > > >     divergence we do not trigger flow to disk in time and the
> broker
> >> > can
> >> > > go
> >> > > > OOM.
> >> > > >     To fix the issue we want to add an additional flow to disk
> >> trigger
> >> > > > based on the amount of allocated direct
> >> > > >     memory. This should prevent the broker from going OOM even if
> the
> >> > > > compaction strategy outlined above
> >> > > >     should fail for some reason (e.g., the compaction task cannot
> >> keep
> >> > up
> >> > > > with the arrival of new messages).
> >> > > >
> >> > > > Currently, there are patches for the above points but they suffer
> >> from
> >> > > some
> >> > > > thread-safety issues that need to be addressed.
> >> > > >
> >> > > > I hope this description helps. Any feedback is, as always,
> welcome.
> >> > > >
> >> > > > Kind regards,
> >> > > > Lorenz
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Sat, Apr 29, 2017 at 12:00 AM, Ramayan Tiwari <
> >> > > ramayan.tiwari@gmail.com
> >> > > > >
> >> > > > wrote:
> >> > > >
> >> > > > > Hi Lorenz,
> >> > > > >
> >> > > > > Thanks so much for the patch. We have a perf test now to
> reproduce
> >> > this
> >> > > > > issue, so we did test with 256KB, 64KB and 4KB network byte
> buffer.
> >> > > None
> >> > > > of
> >> > > > > these configurations help with the issue (or give any more
> >> breathing
> >> > > > room)
> >> > > > > for our use case. We would like to share the perf analysis with
> the
> >> > > > > community:
> >> > > > >
> >> > > > > https://docs.google.com/document/d/1Wc1e-id-
> >> > > > WlpI7FGU1Lx8XcKaV8sauRp82T5XZV
> >> > > > > U-RiM/edit?usp=sharing
> >> > > > >
> >> > > > > Feel free to comment on the doc if certain details are
> incorrect or
> >> > if
> >> > > > > there are questions.
> >> > > > >
> >> > > > > Since the short term solution doesn't help us, we are very
> >> interested
> >> > > in
> >> > > > > getting some details on how the community plans to address
> this, a
> >> > high
> >> > > > > level description of the approach will be very helpful for us in
> >> > order
> >> > > to
> >> > > > > brainstorm our use cases along with this solution.
> >> > > > >
> >> > > > > - Ramayan
> >> > > > >
> >> > > > > On Fri, Apr 28, 2017 at 9:34 AM, Lorenz Quack <
> >> > quack.lorenz@gmail.com>
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Hello Ramayan,
> >> > > > > >
> >> > > > > > We are still working on a fix for this issue.
> >> > > > > > In the mean time we had an idea to potentially workaround the
> >> issue
> >> > > > until
> >> > > > > > a proper fix is released.
> >> > > > > >
> >> > > > > > The idea is to decrease the qpid network buffer size the
> broker
> >> > uses.
> >> > > > > > While this still allows for sparsely populated buffers it
> would
> >> > > improve
> >> > > > > > the overall occupancy ratio.
> >> > > > > >
> >> > > > > > Here are the steps to follow:
> >> > > > > >  * ensure you are not using TLS
> >> > > > > >  * apply the attached patch
> >> > > > > >  * figure out the size of the largest messages you are sending
> >> > > > (including
> >> > > > > > header and some overhead)
> >> > > > > >  * set the context variable "qpid.broker.networkBufferSize"
> to
> >> > that
> >> > > > > value
> >> > > > > > but not smaller than 4096
> >> > > > > >  * test
> >> > > > > >
> >> > > > > > Decreasing the qpid network buffer size automatically limits
> the
> >> > > > maximum
> >> > > > > > AMQP frame size.
> >> > > > > > Since you are using a very old client we are not sure how
> well it
> >> > > copes
> >> > > > > > with small frame sizes where it has to split a message across
> >> > > multiple
> >> > > > > > frames.
> >> > > > > > Therefore, to play it safe you should not set it smaller than
> the
> >> > > > largest
> >> > > > > > messages (+ header + overhead) you are sending.
> >> > > > > > I do not know what message sizes you are sending but AMQP
> imposes
> >> > the
> >> > > > > > restriction that the framesize cannot be smaller than 4096
> bytes.
> >> > > > > > In the qpid broker the default currently is 256 kB.
> >> > > > > >
> >> > > > > > In the current state the broker does not allow setting the
> >> network
> >> > > > buffer
> >> > > > > > to values smaller than 64 kB to allow TLS frames to fit into
> one
> >> > > > network
> >> > > > > > buffer.
> >> > > > > > I attached a patch to this mail that lowers that restriction
> to
> >> the
> >> > > > limit
> >> > > > > > imposed by AMQP (4096 Bytes).
> >> > > > > > Obviously, you should not use this when using TLS.
> >> > > > > >
> >> > > > > >
> >> > > > > > I hope this reduces the problems you are currently facing
> until
> >> we
> >> > > can
> >> > > > > > complete the proper fix.
> >> > > > > >
> >> > > > > > Kind regards,
> >> > > > > > Lorenz
> >> > > > > >
> >> > > > > >
> >> > > > > > On Fri, 2017-04-21 at 09:17 -0700, Ramayan Tiwari wrote:
> >> > > > > > > Thanks so much Keith and the team for finding the root
> cause.
> >> We
> >> > > are
> >> > > > so
> >> > > > > > > relieved that we fix the root cause shortly.
> >> > > > > > >
> >> > > > > > > Couple of things that I forgot to mention on the mitigation
> >> steps
> >> > > we
> >> > > > > took
> >> > > > > > > in the last incident:
> >> > > > > > > 1) We triggered GC from JMX bean multiple times, it did not
> >> help
> >> > in
> >> > > > > > > reducing DM allocated.
> >> > > > > > > 2) We also killed all the AMQP connections to the broker
> when
> >> DM
> >> > > was
> >> > > > at
> >> > > > > > > 80%. This did not help either. The way we killed
> connections -
> >> > > using
> >> > > > > JMX
> >> > > > > > > got list of all the open AMQP connections and called close
> from
> >> > JMX
> >> > > > > > mbean.
> >> > > > > > >
> >> > > > > > > I am hoping the above two are not related to root cause, but
> >> > wanted
> >> > > > to
> >> > > > > > > bring it up in case this is relevant.
> >> > > > > > >
> >> > > > > > > Thanks
> >> > > > > > > Ramayan
> >> > > > > > >
> >> > > > > > > On Fri, Apr 21, 2017 at 8:29 AM, Keith W <
> keith.wall@gmail.com
> >> >
> >> > > > wrote:
> >> > > > > > >
> >> > > > > > > >
> >> > > > > > > > Hello Ramayan
> >> > > > > > > >
> >> > > > > > > > I believe I understand the root cause of the problem.  We
> >> have
> >> > > > > > > > identified a flaw in the direct memory buffer management
> >> > employed
> >> > > > by
> >> > > > > > > > Qpid Broker J which for some messaging use-cases can lead
> to
> >> > the
> >> > > > OOM
> >> > > > > > > > direct you describe.   For the issue to manifest the
> >> producing
> >> > > > > > > > application needs to use a single connection for the
> >> production
> >> > > of
> >> > > > > > > > messages some of which are short-lived (i.e. are consumed
> >> > > quickly)
> >> > > > > > > > whilst others remain on the queue for some time.  Priority
> >> > > queues,
> >> > > > > > > > sorted queues and consumers utilising selectors that
> result
> >> in
> >> > > some
> >> > > > > > > > messages being left of the queue could all produce this
> >> patten.
> >> > > > The
> >> > > > > > > > pattern leads to a sparsely occupied 256K net buffers
> which
> >> > > cannot
> >> > > > be
> >> > > > > > > > released or reused until every message that reference a
> >> 'chunk'
> >> > > of
> >> > > > it
> >> > > > > > > > is either consumed or flown to disk.   The problem was
> >> > introduced
> >> > > > > with
> >> > > > > > > > Qpid v6.0 and exists in v6.1 and trunk too.
> >> > > > > > > >
> >> > > > > > > > The flow to disk feature is not helping us here because
> its
> >> > > > algorithm
> >> > > > > > > > considers only the size of live messages on the queues. If
> >> the
> >> > > > > > > > accumulative live size does not exceed the threshold, the
> >> > > messages
> >> > > > > > > > aren't flown to disk. I speculate that when you observed
> that
> >> > > > moving
> >> > > > > > > > messages cause direct message usage to drop earlier today,
> >> your
> >> > > > > > > > message movement cause a queue to go over threshold, cause
> >> > > message
> >> > > > to
> >> > > > > > > > be flown to disk and their direct memory references
> released.
> >> > > The
> >> > > > > > > > logs will confirm this is so.
> >> > > > > > > >
> >> > > > > > > > I have not identified an easy workaround at the moment.
> >> > > >  Decreasing
> >> > > > > > > > the flow to disk threshold and/or increasing available
> direct
> >> > > > memory
> >> > > > > > > > should alleviate and may be an acceptable short term
> >> > workaround.
> >> > > > If
> >> > > > > > > > it were possible for publishing application to publish
> short
> >> > > lived
> >> > > > > and
> >> > > > > > > > long lived messages on two separate JMS connections this
> >> would
> >> > > > avoid
> >> > > > > > > > this defect.
> >> > > > > > > >
> >> > > > > > > > QPID-7753 tracks this issue and QPID-7754 is a related
> this
> >> > > > problem.
> >> > > > > > > > We intend to be working on these early next week and will
> be
> >> > > aiming
> >> > > > > > > > for a fix that is back-portable to 6.0.
> >> > > > > > > >
> >> > > > > > > > Apologies that you have run into this defect and thanks
> for
> >> > > > > reporting.
> >> > > > > > > >
> >> > > > > > > > Thanks, Keith
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > >
> >> > > > > > > > On 21 April 2017 at 10:21, Ramayan Tiwari <
> >> > > > ramayan.tiwari@gmail.com>
> >> > > > > > > > wrote:
> >> > > > > > > > >
> >> > > > > > > > > Hi All,
> >> > > > > > > > >
> >> > > > > > > > > We have been monitoring the brokers everyday and today
> we
> >> > found
> >> > > > one
> >> > > > > > > > instance
> >> > > > > > > > >
> >> > > > > > > > > where broker’s DM was constantly going up and was about
> to
> >> > > crash,
> >> > > > > so
> >> > > > > > we
> >> > > > > > > > > experimented some mitigations, one of which caused the
> DM
> >> to
> >> > > come
> >> > > > > > down.
> >> > > > > > > > > Following are the details, which might help us
> >> understanding
> >> > > the
> >> > > > > > issue:
> >> > > > > > > > >
> >> > > > > > > > > Traffic scenario:
> >> > > > > > > > >
> >> > > > > > > > > DM allocation had been constantly going up and was at
> 90%.
> >> > > There
> >> > > > > > were two
> >> > > > > > > > > queues which seemed to align with the theories that we
> had.
> >> > > Q1’s
> >> > > > > > size had
> >> > > > > > > > > been large right after the broker start and had slow
> >> > > consumption
> >> > > > of
> >> > > > > > > > > messages, queue size only reduced from 76MB to 75MB
> over a
> >> > > period
> >> > > > > of
> >> > > > > > > > 6hrs.
> >> > > > > > > > >
> >> > > > > > > > > Q2 on the other hand, started small and was gradually
> >> > growing,
> >> > > > > queue
> >> > > > > > size
> >> > > > > > > > > went from 7MB to 10MB in 6hrs. There were other queues
> with
> >> > > > traffic
> >> > > > > > > > during
> >> > > > > > > > >
> >> > > > > > > > > this time.
> >> > > > > > > > >
> >> > > > > > > > > Action taken:
> >> > > > > > > > >
> >> > > > > > > > > Moved all the messages from Q2 (since this was our
> original
> >> > > > theory)
> >> > > > > > to Q3
> >> > > > > > > > > (already created but no messages in it). This did not
> help
> >> > with
> >> > > > the
> >> > > > > > DM
> >> > > > > > > > > growing up.
> >> > > > > > > > > Moved all the messages from Q1 to Q4 (already created
> but
> >> no
> >> > > > > > messages in
> >> > > > > > > > > it). This reduced DM allocation from 93% to 31%.
> >> > > > > > > > >
> >> > > > > > > > > We have the heap dump and thread dump from when broker
> was
> >> > 90%
> >> > > in
> >> > > > > DM
> >> > > > > > > > > allocation. We are going to analyze that to see if we
> can
> >> get
> >> > > > some
> >> > > > > > clue.
> >> > > > > > > > We
> >> > > > > > > > >
> >> > > > > > > > > wanted to share this new information which might help in
> >> > > > reasoning
> >> > > > > > about
> >> > > > > > > > the
> >> > > > > > > > >
> >> > > > > > > > > memory issue.
> >> > > > > > > > >
> >> > > > > > > > > - Ramayan
> >> > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > On Thu, Apr 20, 2017 at 11:20 AM, Ramayan Tiwari <
> >> > > > > > > > ramayan.tiwari@gmail.com>
> >> > > > > > > > >
> >> > > > > > > > > wrote:
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > Hi Keith,
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks so much for your response and digging into the
> >> > issue.
> >> > > > > Below
> >> > > > > > are
> >> > > > > > > > the
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > answer to your questions:
> >> > > > > > > > > >
> >> > > > > > > > > > 1) Yeah we are using QPID-7462 with 6.0.5. We couldn't
> >> use
> >> > > 6.1
> >> > > > > > where it
> >> > > > > > > > > > was released because we need JMX support. Here is the
> >> > > > destination
> >> > > > > > > > format:
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > ""%s ; {node : { type : queue }, link : {
> x-subscribes :
> >> {
> >> > > > > > arguments : {
> >> > > > > > > > > > x-multiqueue : [%s], x-pull-only : true }}}}";"
> >> > > > > > > > > >
> >> > > > > > > > > > 2) Our machines have 40 cores, which will make the
> number
> >> > of
> >> > > > > > threads to
> >> > > > > > > > > > 80. This might not be an issue, because this will
> show up
> >> > in
> >> > > > the
> >> > > > > > > > baseline DM
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > allocated, which is only 6% (of 4GB) when we just
> bring
> >> up
> >> > > the
> >> > > > > > broker.
> >> > > > > > > > > >
> >> > > > > > > > > > 3) The only setting that we tuned WRT to DM is
> >> > > > > flowToDiskThreshold,
> >> > > > > > > > which
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > is set at 80% now.
> >> > > > > > > > > >
> >> > > > > > > > > > 4) Only one virtual host in the broker.
> >> > > > > > > > > >
> >> > > > > > > > > > 5) Most of our queues (99%) are priority, we also have
> >> 8-10
> >> > > > > sorted
> >> > > > > > > > queues.
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > 6) Yeah we are using the standard 0.16 client and not
> >> AMQP
> >> > > 1.0
> >> > > > > > clients.
> >> > > > > > > > > > The connection log line looks like:
> >> > > > > > > > > > CON-1001 : Open : Destination : AMQP(IP:5672) :
> Protocol
> >> > > > Version
> >> > > > > :
> >> > > > > > 0-10
> >> > > > > > > > :
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > Client ID : test : Client Version : 0.16 : Client
> >> Product :
> >> > > > qpid
> >> > > > > > > > > >
> >> > > > > > > > > > We had another broker crashed about an hour back, we
> do
> >> see
> >> > > the
> >> > > > > > same
> >> > > > > > > > > > patterns:
> >> > > > > > > > > > 1) There is a queue which is constantly growing,
> enqueue
> >> is
> >> > > > > faster
> >> > > > > > than
> >> > > > > > > > > > dequeue on that queue for a long period of time.
> >> > > > > > > > > > 2) Flow to disk didn't kick in at all.
> >> > > > > > > > > >
> >> > > > > > > > > > This graph shows memory growth (red line - heap, blue
> -
> >> DM
> >> > > > > > allocated,
> >> > > > > > > > > > yellow - DM used)
> >> > > > > > > > > >
> >> > > > > > > > > > https://drive.google.com/file/d/
> >> > > 0Bwi0MEV3srPRdVhXdTBncHJLY2c/
> >> > > > > > > > view?usp=sharing
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > The below graph shows growth on a single queue (there
> are
> >> > > 10-12
> >> > > > > > other
> >> > > > > > > > > > queues with traffic as well, something large size than
> >> this
> >> > > > > queue):
> >> > > > > > > > > >
> >> > > > > > > > > > https://drive.google.com/file/d/
> >> > > 0Bwi0MEV3srPRWmNGbDNGUkJhQ0U/
> >> > > > > > > > view?usp=sharing
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > Couple of questions:
> >> > > > > > > > > > 1) Is there any developer level doc/design spec on how
> >> Qpid
> >> > > > uses
> >> > > > > > DM?
> >> > > > > > > > > > 2) We are not getting heap dumps automatically when
> >> broker
> >> > > > > crashes
> >> > > > > > due
> >> > > > > > > > to
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > DM (HeapDumpOnOutOfMemoryError not respected). Has
> anyone
> >> > > > found a
> >> > > > > > way
> >> > > > > > > > to get
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > around this problem?
> >> > > > > > > > > >
> >> > > > > > > > > > Thanks
> >> > > > > > > > > > Ramayan
> >> > > > > > > > > >
> >> > > > > > > > > > On Thu, Apr 20, 2017 at 9:08 AM, Keith W <
> >> > > keith.wall@gmail.com
> >> > > > >
> >> > > > > > wrote:
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > Hi Ramayan
> >> > > > > > > > > > >
> >> > > > > > > > > > > We have been discussing your problem here and have a
> >> > couple
> >> > > > of
> >> > > > > > > > questions.
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > I have been experimenting with use-cases based on
> your
> >> > > > > > descriptions
> >> > > > > > > > > > > above, but so far, have been unsuccessful in
> >> reproducing
> >> > a
> >> > > > > > > > > > > "java.lang.OutOfMemoryError: Direct buffer memory"
> >> > > > condition.
> >> > > > > > The
> >> > > > > > > > > > > direct memory usage reflects the expected model: it
> >> > levels
> >> > > > off
> >> > > > > > when
> >> > > > > > > > > > > the flow to disk threshold is reached and direct
> memory
> >> > is
> >> > > > > > release as
> >> > > > > > > > > > > messages are consumed until the minimum size for
> >> caching
> >> > of
> >> > > > > > direct is
> >> > > > > > > > > > > reached.
> >> > > > > > > > > > >
> >> > > > > > > > > > > 1] For clarity let me check: we believe when you say
> >> > "patch
> >> > > > to
> >> > > > > > use
> >> > > > > > > > > > > MultiQueueConsumer" you are referring to the patch
> >> > attached
> >> > > > to
> >> > > > > > > > > > > QPID-7462 "Add experimental "pull" consumers to the
> >> > broker"
> >> > > > > and
> >> > > > > > you
> >> > > > > > > > > > > are using a combination of this "x-pull-only"  with
> the
> >> > > > > standard
> >> > > > > > > > > > > "x-multiqueue" feature.  Is this correct?
> >> > > > > > > > > > >
> >> > > > > > > > > > > 2] One idea we had here relates to the size of the
> >> > > > virtualhost
> >> > > > > IO
> >> > > > > > > > > > > pool.   As you know from the documentation, the
> Broker
> >> > > > > > caches/reuses
> >> > > > > > > > > > > direct memory internally but the documentation
> fails to
> >> > > > > mentions
> >> > > > > > that
> >> > > > > > > > > > > each pooled virtualhost IO thread also grabs a chunk
> >> > (256K)
> >> > > > of
> >> > > > > > direct
> >> > > > > > > > > > > memory from this cache.  By default the virtual
> host IO
> >> > > pool
> >> > > > is
> >> > > > > > sized
> >> > > > > > > > > > > Math.max(Runtime.getRuntime().availableProcessors()
> *
> >> 2,
> >> > > > 64),
> >> > > > > > so if
> >> > > > > > > > > > > you have a machine with a very large number of
> cores,
> >> you
> >> > > may
> >> > > > > > have a
> >> > > > > > > > > > > surprising large amount of direct memory assigned to
> >> > > > > virtualhost
> >> > > > > > IO
> >> > > > > > > > > > > threads.   Check the value of
> connectionThreadPoolSize
> >> on
> >> > > the
> >> > > > > > > > > > > virtualhost
> >> > > > > > > > > > > (http://<server>:<port>/api/latest/virtualhost/<
> >> > > > > > virtualhostnodename>/<;
> >> > > > > > > > virtualhostname>)
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > to see what value is in force.  What is it?  It is
> >> > possible
> >> > > > to
> >> > > > > > tune
> >> > > > > > > > > > > the pool size using context variable
> >> > > > > > > > > > > virtualhost.connectionThreadPool.size.
> >> > > > > > > > > > >
> >> > > > > > > > > > > 3] Tell me if you are tuning the Broker in way
> beyond
> >> the
> >> > > > > > direct/heap
> >> > > > > > > > > > > memory settings you have told us about already.  For
> >> > > instance
> >> > > > > > you are
> >> > > > > > > > > > > changing any of the direct memory pooling settings
> >> > > > > > > > > > > broker.directByteBufferPoolSize, default network
> >> buffer
> >> > > size
> >> > > > > > > > > > > qpid.broker.networkBufferSize or applying any other
> >> > > > > non-standard
> >> > > > > > > > > > > settings?
> >> > > > > > > > > > >
> >> > > > > > > > > > > 4] How many virtual hosts do you have on the Broker?
> >> > > > > > > > > > >
> >> > > > > > > > > > > 5] What is the consumption pattern of the
> messages?  Do
> >> > > > consume
> >> > > > > > in a
> >> > > > > > > > > > > strictly FIFO fashion or are you making use of
> message
> >> > > > > selectors
> >> > > > > > > > > > > or/and any of the out-of-order queue types (LVQs,
> >> > priority
> >> > > > > queue
> >> > > > > > or
> >> > > > > > > > > > > sorted queues)?
> >> > > > > > > > > > >
> >> > > > > > > > > > > 6] Is it just the 0.16 client involved in the
> >> > application?
> >> > > > >  Can
> >> > > > > > I
> >> > > > > > > > > > > check that you are not using any of the AMQP 1.0
> >> clients
> >> > > > > > > > > > > (org,apache.qpid:qpid-jms-client or
> >> > > > > > > > > > > org.apache.qpid:qpid-amqp-1-0-client) in the
> software
> >> > > stack
> >> > > > > (as
> >> > > > > > either
> >> > > > > > > > > > > consumers or producers)
> >> > > > > > > > > > >
> >> > > > > > > > > > > Hopefully the answers to these questions will get us
> >> > closer
> >> > > > to
> >> > > > > a
> >> > > > > > > > > > > reproduction.   If you are able to reliable
> reproduce
> >> it,
> >> > > > > please
> >> > > > > > share
> >> > > > > > > > > > > the steps with us.
> >> > > > > > > > > > >
> >> > > > > > > > > > > Kind regards, Keith.
> >> > > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > On 20 April 2017 at 10:21, Ramayan Tiwari <
> >> > > > > > ramayan.tiwari@gmail.com>
> >> > > > > > > > > > > wrote:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > After a lot of log mining, we might have a way to
> >> > explain
> >> > > > the
> >> > > > > > > > sustained
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > increased in DirectMemory allocation, the
> correlation
> >> > > seems
> >> > > > > to
> >> > > > > > be
> >> > > > > > > > with
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > the
> >> > > > > > > > > > > > growth in the size of a Queue that is getting
> >> consumed
> >> > > but
> >> > > > at
> >> > > > > > a much
> >> > > > > > > > > > > > slower
> >> > > > > > > > > > > > rate than producers putting messages on this
> queue.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > The pattern we see is that in each instance of
> broker
> >> > > > crash,
> >> > > > > > there is
> >> > > > > > > > > > > > at
> >> > > > > > > > > > > > least one queue (usually 1 queue) whose size kept
> >> > growing
> >> > > > > > steadily.
> >> > > > > > > > > > > > It’d be
> >> > > > > > > > > > > > of significant size but not the largest queue --
> >> > usually
> >> > > > > there
> >> > > > > > are
> >> > > > > > > > > > > > multiple
> >> > > > > > > > > > > > larger queues -- but it was different from other
> >> queues
> >> > > in
> >> > > > > > that its
> >> > > > > > > > > > > > size
> >> > > > > > > > > > > > was growing steadily. The queue would also be
> moving,
> >> > but
> >> > > > its
> >> > > > > > > > > > > > processing
> >> > > > > > > > > > > > rate was not keeping up with the enqueue rate.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Our theory that might be totally wrong: If a
> queue is
> >> > > > moving
> >> > > > > > the
> >> > > > > > > > entire
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > time, maybe then the broker would keep reusing the
> >> same
> >> > > > > buffer
> >> > > > > > in
> >> > > > > > > > > > > > direct
> >> > > > > > > > > > > > memory for the queue, and keep on adding onto it
> at
> >> the
> >> > > end
> >> > > > > to
> >> > > > > > > > > > > > accommodate
> >> > > > > > > > > > > > new messages. But because it’s active all the time
> >> and
> >> > > > we’re
> >> > > > > > pointing
> >> > > > > > > > > > > > to
> >> > > > > > > > > > > > the same buffer, space allocated for messages at
> the
> >> > head
> >> > > > of
> >> > > > > > the
> >> > > > > > > > > > > > queue/buffer doesn’t get reclaimed, even long
> after
> >> > those
> >> > > > > > messages
> >> > > > > > > > have
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > been processed. Just a theory.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > We are also trying to reproduce this using some
> perf
> >> > > tests
> >> > > > to
> >> > > > > > enqueue
> >> > > > > > > > > > > > with
> >> > > > > > > > > > > > same pattern, will update with the findings.
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > Thanks
> >> > > > > > > > > > > > Ramayan
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > On Wed, Apr 19, 2017 at 6:52 PM, Ramayan Tiwari
> >> > > > > > > > > > > > <ra...@gmail.com>
> >> > > > > > > > > > > > wrote:
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Another issue that we noticed is when broker
> goes
> >> OOM
> >> > > due
> >> > > > > to
> >> > > > > > direct
> >> > > > > > > > > > > > > memory, it doesn't create heap dump (specified
> by
> >> > > "-XX:+
> >> > > > > > > > > > > > > HeapDumpOnOutOfMemoryError"), even when the OOM
> >> error
> >> > > is
> >> > > > > > same as
> >> > > > > > > > what
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > is
> >> > > > > > > > > > > > > mentioned in the oracle JVM docs
> >> > > > > > ("java.lang.OutOfMemoryError").
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > Has anyone been able to find a way to get to
> heap
> >> > dump
> >> > > > for
> >> > > > > > DM OOM?
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > - Ramayan
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > On Wed, Apr 19, 2017 at 11:21 AM, Ramayan Tiwari
> >> > > > > > > > > > > > > <ramayan.tiwari@gmail.com
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Alex,
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Below are the flow to disk logs from broker
> >> having
> >> > > > > > 3million+
> >> > > > > > > > messages
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > this time. We only have one virtual host.
> Time is
> >> > in
> >> > > > GMT.
> >> > > > > > Looks
> >> > > > > > > > like
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > flow
> >> > > > > > > > > > > > > > to disk is active on the whole virtual host
> and
> >> > not a
> >> > > > > > queue level.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > When the same broker went OOM yesterday, I did
> >> not
> >> > > see
> >> > > > > any
> >> > > > > > flow to
> >> > > > > > > > > > > > > > disk
> >> > > > > > > > > > > > > > logs from when it was started until it crashed
> >> > > (crashed
> >> > > > > > twice
> >> > > > > > > > within
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > 4hrs).
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > 4/19/17 4:17:43.509 AM INFO
> >> [Housekeeping[test]] -
> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
> Message
> >> > > > memory
> >> > > > > > use
> >> > > > > > > > > > > > > > 3356539KB
> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
> >> > > > > > > > > > > > > > 4/19/17 2:31:13.502 AM INFO
> >> [Housekeeping[test]] -
> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> >> Message
> >> > > > memory
> >> > > > > > use
> >> > > > > > > > > > > > > > 3354866KB
> >> > > > > > > > > > > > > > within threshold 3355443KB
> >> > > > > > > > > > > > > > 4/19/17 2:28:43.511 AM INFO
> >> [Housekeeping[test]] -
> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
> Message
> >> > > > memory
> >> > > > > > use
> >> > > > > > > > > > > > > > 3358509KB
> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
> >> > > > > > > > > > > > > > 4/19/17 2:20:13.500 AM INFO
> >> [Housekeeping[test]] -
> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> >> Message
> >> > > > memory
> >> > > > > > use
> >> > > > > > > > > > > > > > 3353501KB
> >> > > > > > > > > > > > > > within threshold 3355443KB
> >> > > > > > > > > > > > > > 4/19/17 2:18:13.500 AM INFO
> >> [Housekeeping[test]] -
> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
> Message
> >> > > > memory
> >> > > > > > use
> >> > > > > > > > > > > > > > 3357544KB
> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
> >> > > > > > > > > > > > > > 4/19/17 2:08:43.501 AM INFO
> >> [Housekeeping[test]] -
> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> >> Message
> >> > > > memory
> >> > > > > > use
> >> > > > > > > > > > > > > > 3353236KB
> >> > > > > > > > > > > > > > within threshold 3355443KB
> >> > > > > > > > > > > > > > 4/19/17 2:08:13.501 AM INFO
> >> [Housekeeping[test]] -
> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
> Message
> >> > > > memory
> >> > > > > > use
> >> > > > > > > > > > > > > > 3356704KB
> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
> >> > > > > > > > > > > > > > 4/19/17 2:00:43.500 AM INFO
> >> [Housekeeping[test]] -
> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> >> Message
> >> > > > memory
> >> > > > > > use
> >> > > > > > > > > > > > > > 3353511KB
> >> > > > > > > > > > > > > > within threshold 3355443KB
> >> > > > > > > > > > > > > > 4/19/17 2:00:13.504 AM INFO
> >> [Housekeeping[test]] -
> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
> Message
> >> > > > memory
> >> > > > > > use
> >> > > > > > > > > > > > > > 3357948KB
> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
> >> > > > > > > > > > > > > > 4/19/17 1:50:43.501 AM INFO
> >> [Housekeeping[test]] -
> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> >> Message
> >> > > > memory
> >> > > > > > use
> >> > > > > > > > > > > > > > 3355310KB
> >> > > > > > > > > > > > > > within threshold 3355443KB
> >> > > > > > > > > > > > > > 4/19/17 1:47:43.501 AM INFO
> >> [Housekeeping[test]] -
> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
> Message
> >> > > > memory
> >> > > > > > use
> >> > > > > > > > > > > > > > 3365624KB
> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
> >> > > > > > > > > > > > > > 4/19/17 1:43:43.501 AM INFO
> >> [Housekeeping[test]] -
> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> >> Message
> >> > > > memory
> >> > > > > > use
> >> > > > > > > > > > > > > > 3355136KB
> >> > > > > > > > > > > > > > within threshold 3355443KB
> >> > > > > > > > > > > > > > 4/19/17 1:31:43.509 AM INFO
> >> [Housekeeping[test]] -
> >> > > > > > > > > > > > > > [Housekeeping[test]]
> >> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
> Message
> >> > > > memory
> >> > > > > > use
> >> > > > > > > > > > > > > > 3358683KB
> >> > > > > > > > > > > > > > exceeds threshold 3355443KB
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > After production release (2days back), we have
> >> > seen 4
> >> > > > > > crashes in 3
> >> > > > > > > > > > > > > > different brokers, this is the most pressing
> >> > concern
> >> > > > for
> >> > > > > > us in
> >> > > > > > > > > > > > > > decision if
> >> > > > > > > > > > > > > > we should roll back to 0.32. Any help is
> greatly
> >> > > > > > appreciated.
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > Thanks
> >> > > > > > > > > > > > > > Ramayan
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 9:36 AM, Oleksandr
> Rudyy
> >> <
> >> > > > > > orudyy@gmail.com
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Ramayan,
> >> > > > > > > > > > > > > > > Thanks for the details. I would like to
> clarify
> >> > > > whether
> >> > > > > > flow to
> >> > > > > > > > disk
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > was
> >> > > > > > > > > > > > > > > triggered today for 3 million messages?
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > The following logs are issued for flow to
> disk:
> >> > > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
> >> Message
> >> > > > > memory
> >> > > > > > use
> >> > > > > > > > > > > > > > > {0,number,#}KB
> >> > > > > > > > > > > > > > > exceeds threshold {1,number,#.##}KB
> >> > > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> >> > Message
> >> > > > > > memory use
> >> > > > > > > > > > > > > > > {0,number,#}KB within threshold
> >> {1,number,#.##}KB
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > Kind Regards,
> >> > > > > > > > > > > > > > > Alex
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > On 19 April 2017 at 17:10, Ramayan Tiwari <
> >> > > > > > > > ramayan.tiwari@gmail.com>
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Hi Alex,
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Thanks for your response, here are the
> >> details:
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > We use "direct" exchange, without
> persistence
> >> > (we
> >> > > > > > specify
> >> > > > > > > > > > > > > > > NON_PERSISTENT
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > that while sending from client) and use
> BDB
> >> > > store.
> >> > > > We
> >> > > > > > use JSON
> >> > > > > > > > > > > > > > > > virtual
> >> > > > > > > > > > > > > > > host
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > type. We are not using SSL.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > When the broker went OOM, we had around
> 1.3
> >> > > million
> >> > > > > > messages
> >> > > > > > > > with
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > 100
> >> > > > > > > > > > > > > > > bytes
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > average message size. Direct memory
> >> allocation
> >> > > > (value
> >> > > > > > read from
> >> > > > > > > > > > > > > > > > MBean)
> >> > > > > > > > > > > > > > > kept
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > going up, even though it wouldn't need
> more
> >> DM
> >> > to
> >> > > > > > store these
> >> > > > > > > > many
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > messages. DM allocated persisted at 99%
> for
> >> > > about 3
> >> > > > > > and half
> >> > > > > > > > hours
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > before
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > crashing.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Today, on the same broker we have 3
> million
> >> > > > messages
> >> > > > > > (same
> >> > > > > > > > message
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > size)
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > and DM allocated is only at 8%. This seems
> >> like
> >> > > > there
> >> > > > > > is some
> >> > > > > > > > > > > > > > > > issue
> >> > > > > > > > > > > > > > > with
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > de-allocation or a leak.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > I have uploaded the memory utilization
> graph
> >> > > here:
> >> > > > > > > > > > > > > > > > https://drive.google.com/file/d/
> >> > > > > > 0Bwi0MEV3srPRVHFEbDlIYUpLaUE/
> >> > > > > > > > > > > > > > > > view?usp=sharing
> >> > > > > > > > > > > > > > > > Blue line is DM allocated, Yellow is DM
> Used
> >> > (sum
> >> > > > of
> >> > > > > > queue
> >> > > > > > > > > > > > > > > > payload)
> >> > > > > > > > > > > > > > > and Red
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > is heap usage.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > Thanks
> >> > > > > > > > > > > > > > > > Ramayan
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 4:10 AM, Oleksandr
> >> > Rudyy
> >> > > > > > > > > > > > > > > > <or...@gmail.com>
> >> > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Hi Ramayan,
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Could please share with us the details
> of
> >> > > > messaging
> >> > > > > > use
> >> > > > > > > > case(s)
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > which
> >> > > > > > > > > > > > > > > > ended
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > up in OOM on broker side?
> >> > > > > > > > > > > > > > > > > I would like to reproduce the issue on
> my
> >> > local
> >> > > > > > broker in
> >> > > > > > > > order
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > to
> >> > > > > > > > > > > > > > > fix
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > it.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > I would appreciate if you could provide
> as
> >> > much
> >> > > > > > details as
> >> > > > > > > > > > > > > > > > > possible,
> >> > > > > > > > > > > > > > > > > including, messaging topology, message
> >> > > > persistence
> >> > > > > > type,
> >> > > > > > > > message
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > sizes,volumes, etc.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Qpid Broker 6.0.x uses direct memory for
> >> > > keeping
> >> > > > > > message
> >> > > > > > > > content
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > and
> >> > > > > > > > > > > > > > > > > receiving/sending data. Each plain
> >> connection
> >> > > > > > utilizes 512K of
> >> > > > > > > > > > > > > > > > > direct
> >> > > > > > > > > > > > > > > > > memory. Each SSL connection uses 1M of
> >> direct
> >> > > > > > memory. Your
> >> > > > > > > > > > > > > > > > > memory
> >> > > > > > > > > > > > > > > > settings
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > look Ok to me.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > Kind Regards,
> >> > > > > > > > > > > > > > > > > Alex
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > On 18 April 2017 at 23:39, Ramayan
> Tiwari
> >> > > > > > > > > > > > > > > > > <ra...@gmail.com>
> >> > > > > > > > > > > > > > > > > wrote:
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > Hi All,
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > We are using Java broker 6.0.5, with
> >> patch
> >> > to
> >> > > > use
> >> > > > > > > > > > > > > > > MultiQueueConsumer
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > feature. We just finished deploying to
> >> > > > production
> >> > > > > > and saw
> >> > > > > > > > > > > > > > > > > > couple of
> >> > > > > > > > > > > > > > > > > > instances of broker OOM due to running
> >> out
> >> > of
> >> > > > > > DirectMemory
> >> > > > > > > > > > > > > > > > > > buffer
> >> > > > > > > > > > > > > > > > > > (exceptions at the end of this email).
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > Here is our setup:
> >> > > > > > > > > > > > > > > > > > 1. Max heap 12g, max direct memory 4g
> >> (this
> >> > > is
> >> > > > > > opposite of
> >> > > > > > > > > > > > > > > > > > what the
> >> > > > > > > > > > > > > > > > > > recommendation is, however, for our
> use
> >> > cause
> >> > > > > > message
> >> > > > > > > > payload
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > is
> >> > > > > > > > > > > > > > > really
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > small ~400bytes and is way less than
> the
> >> > per
> >> > > > > > message
> >> > > > > > > > overhead
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > of
> >> > > > > > > > > > > > > > > 1KB).
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > In
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > perf testing, we were able to put 2
> >> million
> >> > > > > > messages without
> >> > > > > > > > > > > > > > > > > > any
> >> > > > > > > > > > > > > > > > issues.
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > 2. ~400 connections to broker.
> >> > > > > > > > > > > > > > > > > > 3. Each connection has 20 sessions and
> >> > there
> >> > > is
> >> > > > > > one multi
> >> > > > > > > > > > > > > > > > > > queue
> >> > > > > > > > > > > > > > > > consumer
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > attached to each session, listening to
> >> > around
> >> > > > > 1000
> >> > > > > > queues.
> >> > > > > > > > > > > > > > > > > > 4. We are still using 0.16 client (I
> >> know).
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > With the above setup, the baseline
> >> > > utilization
> >> > > > > > (without any
> >> > > > > > > > > > > > > > > messages)
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > for
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > direct memory was around 230mb (with
> 410
> >> > > > > > connection each
> >> > > > > > > > > > > > > > > > > > taking
> >> > > > > > > > > > > > > > > 500KB).
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > Based on our understanding of broker
> >> memory
> >> > > > > > allocation,
> >> > > > > > > > > > > > > > > > > > message
> >> > > > > > > > > > > > > > > payload
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > should be the only thing adding to
> direct
> >> > > > memory
> >> > > > > > utilization
> >> > > > > > > > > > > > > > > > > > (on
> >> > > > > > > > > > > > > > > top of
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > baseline), however, we are
> experiencing
> >> > > > something
> >> > > > > > completely
> >> > > > > > > > > > > > > > > different.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > In
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > our last broker crash, we see that
> broker
> >> > is
> >> > > > > > constantly
> >> > > > > > > > > > > > > > > > > > running
> >> > > > > > > > > > > > > > > with
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > 90%+
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > direct memory allocated, even when
> >> message
> >> > > > > payload
> >> > > > > > sum from
> >> > > > > > > > > > > > > > > > > > all the
> >> > > > > > > > > > > > > > > > > queues
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > is only 6-8% (these % are against
> >> available
> >> > > DM
> >> > > > of
> >> > > > > > 4gb).
> >> > > > > > > > During
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > these
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > high
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > DM usage period, heap usage was around
> >> 60%
> >> > > (of
> >> > > > > > 12gb).
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > We would like some help in
> understanding
> >> > what
> >> > > > > > could be the
> >> > > > > > > > > > > > > > > > > > reason
> >> > > > > > > > > > > > > > > of
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > these
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > high DM allocations. Are there things
> >> other
> >> > > > than
> >> > > > > > message
> >> > > > > > > > > > > > > > > > > > payload
> >> > > > > > > > > > > > > > > and
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > AMQP
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > connection, which use DM and could be
> >> > > > > contributing
> >> > > > > > to these
> >> > > > > > > > > > > > > > > > > > high
> >> > > > > > > > > > > > > > > usage?
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > Another thing where we are puzzled is
> the
> >> > > > > > de-allocation of
> >> > > > > > > > DM
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > byte
> >> > > > > > > > > > > > > > > > > buffers.
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > From log mining of heap and DM
> >> utilization,
> >> > > > > > de-allocation of
> >> > > > > > > > > > > > > > > > > > DM
> >> > > > > > > > > > > > > > > doesn't
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > correlate with heap GC. If anyone has
> >> seen
> >> > > any
> >> > > > > > documentation
> >> > > > > > > > > > > > > > > related to
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > this, it would be very helpful if you
> >> could
> >> > > > share
> >> > > > > > that.
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > Thanks
> >> > > > > > > > > > > > > > > > > > Ramayan
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > *Exceptions*
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct
> buffer
> >> > > > memory
> >> > > > > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(
> >> > > Bits.java:658)
> >> > > > > > > > ~[na:1.8.0_40]
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<
> >> > > > > > init>(DirectByteBuffer.java:
> >> > > > > > > > 123)
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> >> > > > > > > > > > > > > > > > > > at java.nio.ByteBuffer.
> >> > > > > allocateDirect(ByteBuffer.
> >> > > > > > java:311)
> >> > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > org.apache.qpid.bytebuffer.
> >> > > > > > QpidByteBuffer.allocateDirect(
> >> > > > > > > > > > > > > > > > > > QpidByteBuffer.java:474)
> >> > > > > > > > > > > > > > > > > > ~[qpid-common-6.0.5.jar:6.0.5]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >> > > > > > > > NonBlockingConnectionPlainD
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > elegate.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > restoreApplicationBufferForWrite(
> >> > > > > > > > NonBlockingConnectionPlainDele
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > gate.java:93)
> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >> > > > > > > > NonBlockingConnectionPlainDele
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > gate.processData(
> >> > > > NonBlockingConnectionPlainDele
> >> > > > > > > > gate.java:60)
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >> > > > > > > > NonBlockingConnection.doRead(
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > NonBlockingConnection.java:506)
> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >> > > > > > > > NonBlockingConnection.doWork(
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > NonBlockingConnection.java:285)
> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >> > > > > > > > NetworkConnectionScheduler.
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > processConnection(
> >> > > NetworkConnectionScheduler.
> >> > > > > > java:124)
> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> >> > > > transport.SelectorThread$
> >> > > > > > > > ConnectionPr
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > ocessor.
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > processConnection(
> >> SelectorThread.java:504)
> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> >> > > > transport.SelectorThread$
> >> > > > > > > > > > > > > > > > > > SelectionTask.performSelect(
> >> > > > > > SelectorThread.java:337)
> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> >> > > > transport.SelectorThread$
> >> > > > > > > > SelectionTask.run(
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > SelectorThread.java:87)
> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> >> > > > > > transport.SelectorThread.run(
> >> > > > > > > > > > > > > > > > > > SelectorThread.java:462)
> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > java.util.concurrent.
> >> > > > > ThreadPoolExecutor.runWorker(
> >> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:1142)
> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > java.util.concurrent.
> >> > > > > > ThreadPoolExecutor$Worker.run(
> >> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:617)
> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> >> > > > > > > > > > > > > > > > > > at java.lang.Thread.run(Thread.
> java:745)
> >> > > > > > ~[na:1.8.0_40]
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > *Second exception*
> >> > > > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct
> buffer
> >> > > > memory
> >> > > > > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(
> >> > > Bits.java:658)
> >> > > > > > > > ~[na:1.8.0_40]
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<
> >> > > > > > init>(DirectByteBuffer.java:
> >> > > > > > > > 123)
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> >> > > > > > > > > > > > > > > > > > at java.nio.ByteBuffer.
> >> > > > > allocateDirect(ByteBuffer.
> >> > > > > > java:311)
> >> > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > org.apache.qpid.bytebuffer.
> >> > > > > > QpidByteBuffer.allocateDirect(
> >> > > > > > > > > > > > > > > > > > QpidByteBuffer.java:474)
> >> > > > > > > > > > > > > > > > > > ~[qpid-common-6.0.5.jar:6.0.5]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >> > > > > > > > NonBlockingConnectionPlainDele
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > gate.<init>(
> >> NonBlockingConnectionPlainDele
> >> > > > > > gate.java:45)
> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >> > > > > > NonBlockingConnection.
> >> > > > > > > > > > > > > > > > > > setTransportEncryption(
> >> > > > > NonBlockingConnection.java:
> >> > > > > > 625)
> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >> > > > > > > > NonBlockingConnection.<init>(
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > NonBlockingConnection.java:117)
> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> >> > > > > > > > NonBlockingNetworkTransport.
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > acceptSocketChannel(
> >> > > > NonBlockingNetworkTransport.
> >> > > > > > java:158)
> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> >> > > > transport.SelectorThread$
> >> > > > > > > > SelectionTas
> >> > > > > > > > >
> >> > > > > > > > > >
> >> > > > > > > > > > >
> >> > > > > > > > > > > >
> >> > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > k$1.run(
> >> > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > > > > > > SelectorThread.java:191)
> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> >> > > > > > transport.SelectorThread.run(
> >> > > > > > > > > > > > > > > > > > SelectorThread.java:462)
> >> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > java.util.concurrent.
> >> > > > > ThreadPoolExecutor.runWorker(
> >> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:1142)
> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> >> > > > > > > > > > > > > > > > > > at
> >> > > > > > > > > > > > > > > > > > java.util.concurrent.
> >> > > > > > ThreadPoolExecutor$Worker.run(
> >> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:617)
> >> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> >> > > > > > > > > > > > > > > > > > at java.lang.Thread.run(Thread.
> java:745)
> >> > > > > > ~[na:1.8.0_40]
> >> > > > > > > > > > > > > > > > > >
> >> > > > > > > > > > > > > >
> >> > > > > > > > > > > ------------------------------
> >> > > ------------------------------
> >> > > > > > ---------
> >> > > > > > > > > > > To unsubscribe, e-mail:
> users-unsubscribe@qpid.apache.
> >> > org
> >> > > > > > > > > > > For additional commands, e-mail:
> >> > > users-help@qpid.apache.org
> >> > > > > > > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > ------------------------------------------------------------
> >> > > ---------
> >> > > > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> >> > > > > > For additional commands, e-mail: users-help@qpid.apache.org
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> For additional commands, e-mail: users-help@qpid.apache.org
>
>

Re: Java broker OOM due to DirectMemory

Posted by Keith W <ke...@gmail.com>.
Hi Ramayan

We are still looking at our approach to the Broker's flow to disk
feature in light of the defect you highlighted.  We have some work in
flight this week investigating alternative approaches which I am
hoping will conclude by the end of week.  I should be able to update
you then.

Thanks Keith

On 12 May 2017 at 20:58, Ramayan Tiwari <ra...@gmail.com> wrote:
> Hi Alex,
>
> Any update on the fix for this?
> QPID-7753 is assigned a fix version for 7.0.0, I am hoping that the fix
> will also be back ported to 6.0.x.
>
> Thanks
> Ramayan
>
> On Mon, May 8, 2017 at 2:14 AM, Oleksandr Rudyy <or...@gmail.com> wrote:
>
>> Hi Ramayan,
>>
>> Thanks for testing the patch and providing a feedback.
>>
>> Regarding direct memory utilization, the Qpid Broker caches up to 256MB of
>> direct memory internally in QpidByteBuffers. Thus, when testing the Broker
>> with only 256MB of direct memory, the entire direct memory could be cached
>> and it would look as if direct memory is never released. Potentially, you
>> can reduce the number of buffers cached on broker by changing context
>> variable 'broker.directByteBufferPoolSize'. By default, it is set to 1000.
>> With buffer size of 256K, it would give ~256M of cache.
>>
>> Regarding introducing lower and upper thresholds for 'flow to disk'. It
>> seems like a good idea and we will try to implement it early this week on
>> trunk first.
>>
>> Kind Regards,
>> Alex
>>
>>
>> On 5 May 2017 at 23:49, Ramayan Tiwari <ra...@gmail.com> wrote:
>>
>> > Hi Alex,
>> >
>> > Thanks for providing the patch. I verified the fix with same perf test,
>> and
>> > it does prevent broker from going OOM, however. DM utilization doesn't
>> get
>> > any better after hitting the threshold (where flow to disk is activated
>> > based on total used % across broker - graph in the link below).
>> >
>> > After hitting the final threshold, flow to disk activates and deactivates
>> > pretty frequently across all the queues. The reason seems to be because
>> > there is only one threshold currently to trigger flow to disk. Would it
>> > make sense to break this down to high and low threshold - so that once
>> flow
>> > to disk is active after hitting high threshold, it will be active until
>> the
>> > queue utilization (or broker DM allocation) reaches the low threshold.
>> >
>> > Graph and flow to disk logs are here:
>> > https://docs.google.com/document/d/1Wc1e-id-
>> WlpI7FGU1Lx8XcKaV8sauRp82T5XZV
>> > U-RiM/edit#heading=h.6400pltvjhy7
>> >
>> > Thanks
>> > Ramayan
>> >
>> > On Thu, May 4, 2017 at 2:44 AM, Oleksandr Rudyy <or...@gmail.com>
>> wrote:
>> >
>> > > Hi Ramayan,
>> > >
>> > > We attached to the QPID-7753 a patch with a work around for 6.0.x
>> branch.
>> > > It triggers flow to disk based on direct memory consumption rather than
>> > > estimation of the space occupied by the message content. The flow to
>> disk
>> > > should evacuate message content preventing running out of direct
>> memory.
>> > We
>> > > already committed the changes into 6.0.x and 6.1.x branches. It will be
>> > > included into upcoming 6.0.7 and 6.1.3 releases.
>> > >
>> > > Please try and test the patch in your environment.
>> > >
>> > > We are still working at finishing of the fix for trunk.
>> > >
>> > > Kind Regards,
>> > > Alex
>> > >
>> > > On 30 April 2017 at 15:45, Lorenz Quack <qu...@gmail.com>
>> wrote:
>> > >
>> > > > Hi Ramayan,
>> > > >
>> > > > The high-level plan is currently as follows:
>> > > >  1) Periodically try to compact sparse direct memory buffers.
>> > > >  2) Increase accuracy of messages' direct memory usage estimation to
>> > more
>> > > > reliably trigger flow to disk.
>> > > >  3) Add an additional flow to disk trigger based on the amount of
>> > > allocated
>> > > > direct memory.
>> > > >
>> > > > A little bit more details:
>> > > >  1) We plan on periodically checking the amount of direct memory
>> usage
>> > > and
>> > > > if it is above a
>> > > >     threshold (50%) we compare the sum of all queue sizes with the
>> > amount
>> > > > of allocated direct memory.
>> > > >     If the ratio falls below a certain threshold we trigger a
>> > compaction
>> > > > task which goes through all queues
>> > > >     and copy's a certain amount of old message buffers into new ones
>> > > > thereby freeing the old buffers so
>> > > >     that they can be returned to the buffer pool and be reused.
>> > > >
>> > > >  2) Currently we trigger flow to disk based on an estimate of how
>> much
>> > > > memory the messages on the
>> > > >     queues consume. We had to use estimates because we did not have
>> > > > accurate size numbers for
>> > > >     message headers. By having accurate size information for message
>> > > > headers we can more reliably
>> > > >     enforce queue memory limits.
>> > > >
>> > > >  3) The flow to disk trigger based on message size had another
>> problem
>> > > > which is more pertinent to the
>> > > >     current issue. We only considered the size of the messages and
>> not
>> > > how
>> > > > much memory we allocate
>> > > >     to store those messages. In the FIFO use case those numbers will
>> be
>> > > > very close to each other but in
>> > > >     use cases like yours we can end up with sparse buffers and the
>> > > numbers
>> > > > will diverge. Because of this
>> > > >     divergence we do not trigger flow to disk in time and the broker
>> > can
>> > > go
>> > > > OOM.
>> > > >     To fix the issue we want to add an additional flow to disk
>> trigger
>> > > > based on the amount of allocated direct
>> > > >     memory. This should prevent the broker from going OOM even if the
>> > > > compaction strategy outlined above
>> > > >     should fail for some reason (e.g., the compaction task cannot
>> keep
>> > up
>> > > > with the arrival of new messages).
>> > > >
>> > > > Currently, there are patches for the above points but they suffer
>> from
>> > > some
>> > > > thread-safety issues that need to be addressed.
>> > > >
>> > > > I hope this description helps. Any feedback is, as always, welcome.
>> > > >
>> > > > Kind regards,
>> > > > Lorenz
>> > > >
>> > > >
>> > > >
>> > > > On Sat, Apr 29, 2017 at 12:00 AM, Ramayan Tiwari <
>> > > ramayan.tiwari@gmail.com
>> > > > >
>> > > > wrote:
>> > > >
>> > > > > Hi Lorenz,
>> > > > >
>> > > > > Thanks so much for the patch. We have a perf test now to reproduce
>> > this
>> > > > > issue, so we did test with 256KB, 64KB and 4KB network byte buffer.
>> > > None
>> > > > of
>> > > > > these configurations help with the issue (or give any more
>> breathing
>> > > > room)
>> > > > > for our use case. We would like to share the perf analysis with the
>> > > > > community:
>> > > > >
>> > > > > https://docs.google.com/document/d/1Wc1e-id-
>> > > > WlpI7FGU1Lx8XcKaV8sauRp82T5XZV
>> > > > > U-RiM/edit?usp=sharing
>> > > > >
>> > > > > Feel free to comment on the doc if certain details are incorrect or
>> > if
>> > > > > there are questions.
>> > > > >
>> > > > > Since the short term solution doesn't help us, we are very
>> interested
>> > > in
>> > > > > getting some details on how the community plans to address this, a
>> > high
>> > > > > level description of the approach will be very helpful for us in
>> > order
>> > > to
>> > > > > brainstorm our use cases along with this solution.
>> > > > >
>> > > > > - Ramayan
>> > > > >
>> > > > > On Fri, Apr 28, 2017 at 9:34 AM, Lorenz Quack <
>> > quack.lorenz@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Hello Ramayan,
>> > > > > >
>> > > > > > We are still working on a fix for this issue.
>> > > > > > In the mean time we had an idea to potentially workaround the
>> issue
>> > > > until
>> > > > > > a proper fix is released.
>> > > > > >
>> > > > > > The idea is to decrease the qpid network buffer size the broker
>> > uses.
>> > > > > > While this still allows for sparsely populated buffers it would
>> > > improve
>> > > > > > the overall occupancy ratio.
>> > > > > >
>> > > > > > Here are the steps to follow:
>> > > > > >  * ensure you are not using TLS
>> > > > > >  * apply the attached patch
>> > > > > >  * figure out the size of the largest messages you are sending
>> > > > (including
>> > > > > > header and some overhead)
>> > > > > >  * set the context variable "qpid.broker.networkBufferSize" to
>> > that
>> > > > > value
>> > > > > > but not smaller than 4096
>> > > > > >  * test
>> > > > > >
>> > > > > > Decreasing the qpid network buffer size automatically limits the
>> > > > maximum
>> > > > > > AMQP frame size.
>> > > > > > Since you are using a very old client we are not sure how well it
>> > > copes
>> > > > > > with small frame sizes where it has to split a message across
>> > > multiple
>> > > > > > frames.
>> > > > > > Therefore, to play it safe you should not set it smaller than the
>> > > > largest
>> > > > > > messages (+ header + overhead) you are sending.
>> > > > > > I do not know what message sizes you are sending but AMQP imposes
>> > the
>> > > > > > restriction that the framesize cannot be smaller than 4096 bytes.
>> > > > > > In the qpid broker the default currently is 256 kB.
>> > > > > >
>> > > > > > In the current state the broker does not allow setting the
>> network
>> > > > buffer
>> > > > > > to values smaller than 64 kB to allow TLS frames to fit into one
>> > > > network
>> > > > > > buffer.
>> > > > > > I attached a patch to this mail that lowers that restriction to
>> the
>> > > > limit
>> > > > > > imposed by AMQP (4096 Bytes).
>> > > > > > Obviously, you should not use this when using TLS.
>> > > > > >
>> > > > > >
>> > > > > > I hope this reduces the problems you are currently facing until
>> we
>> > > can
>> > > > > > complete the proper fix.
>> > > > > >
>> > > > > > Kind regards,
>> > > > > > Lorenz
>> > > > > >
>> > > > > >
>> > > > > > On Fri, 2017-04-21 at 09:17 -0700, Ramayan Tiwari wrote:
>> > > > > > > Thanks so much Keith and the team for finding the root cause.
>> We
>> > > are
>> > > > so
>> > > > > > > relieved that we fix the root cause shortly.
>> > > > > > >
>> > > > > > > Couple of things that I forgot to mention on the mitigation
>> steps
>> > > we
>> > > > > took
>> > > > > > > in the last incident:
>> > > > > > > 1) We triggered GC from JMX bean multiple times, it did not
>> help
>> > in
>> > > > > > > reducing DM allocated.
>> > > > > > > 2) We also killed all the AMQP connections to the broker when
>> DM
>> > > was
>> > > > at
>> > > > > > > 80%. This did not help either. The way we killed connections -
>> > > using
>> > > > > JMX
>> > > > > > > got list of all the open AMQP connections and called close from
>> > JMX
>> > > > > > mbean.
>> > > > > > >
>> > > > > > > I am hoping the above two are not related to root cause, but
>> > wanted
>> > > > to
>> > > > > > > bring it up in case this is relevant.
>> > > > > > >
>> > > > > > > Thanks
>> > > > > > > Ramayan
>> > > > > > >
>> > > > > > > On Fri, Apr 21, 2017 at 8:29 AM, Keith W <keith.wall@gmail.com
>> >
>> > > > wrote:
>> > > > > > >
>> > > > > > > >
>> > > > > > > > Hello Ramayan
>> > > > > > > >
>> > > > > > > > I believe I understand the root cause of the problem.  We
>> have
>> > > > > > > > identified a flaw in the direct memory buffer management
>> > employed
>> > > > by
>> > > > > > > > Qpid Broker J which for some messaging use-cases can lead to
>> > the
>> > > > OOM
>> > > > > > > > direct you describe.   For the issue to manifest the
>> producing
>> > > > > > > > application needs to use a single connection for the
>> production
>> > > of
>> > > > > > > > messages some of which are short-lived (i.e. are consumed
>> > > quickly)
>> > > > > > > > whilst others remain on the queue for some time.  Priority
>> > > queues,
>> > > > > > > > sorted queues and consumers utilising selectors that result
>> in
>> > > some
>> > > > > > > > messages being left of the queue could all produce this
>> patten.
>> > > > The
>> > > > > > > > pattern leads to a sparsely occupied 256K net buffers which
>> > > cannot
>> > > > be
>> > > > > > > > released or reused until every message that reference a
>> 'chunk'
>> > > of
>> > > > it
>> > > > > > > > is either consumed or flown to disk.   The problem was
>> > introduced
>> > > > > with
>> > > > > > > > Qpid v6.0 and exists in v6.1 and trunk too.
>> > > > > > > >
>> > > > > > > > The flow to disk feature is not helping us here because its
>> > > > algorithm
>> > > > > > > > considers only the size of live messages on the queues. If
>> the
>> > > > > > > > accumulative live size does not exceed the threshold, the
>> > > messages
>> > > > > > > > aren't flown to disk. I speculate that when you observed that
>> > > > moving
>> > > > > > > > messages cause direct message usage to drop earlier today,
>> your
>> > > > > > > > message movement cause a queue to go over threshold, cause
>> > > message
>> > > > to
>> > > > > > > > be flown to disk and their direct memory references released.
>> > > The
>> > > > > > > > logs will confirm this is so.
>> > > > > > > >
>> > > > > > > > I have not identified an easy workaround at the moment.
>> > > >  Decreasing
>> > > > > > > > the flow to disk threshold and/or increasing available direct
>> > > > memory
>> > > > > > > > should alleviate and may be an acceptable short term
>> > workaround.
>> > > > If
>> > > > > > > > it were possible for publishing application to publish short
>> > > lived
>> > > > > and
>> > > > > > > > long lived messages on two separate JMS connections this
>> would
>> > > > avoid
>> > > > > > > > this defect.
>> > > > > > > >
>> > > > > > > > QPID-7753 tracks this issue and QPID-7754 is a related this
>> > > > problem.
>> > > > > > > > We intend to be working on these early next week and will be
>> > > aiming
>> > > > > > > > for a fix that is back-portable to 6.0.
>> > > > > > > >
>> > > > > > > > Apologies that you have run into this defect and thanks for
>> > > > > reporting.
>> > > > > > > >
>> > > > > > > > Thanks, Keith
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On 21 April 2017 at 10:21, Ramayan Tiwari <
>> > > > ramayan.tiwari@gmail.com>
>> > > > > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > Hi All,
>> > > > > > > > >
>> > > > > > > > > We have been monitoring the brokers everyday and today we
>> > found
>> > > > one
>> > > > > > > > instance
>> > > > > > > > >
>> > > > > > > > > where broker’s DM was constantly going up and was about to
>> > > crash,
>> > > > > so
>> > > > > > we
>> > > > > > > > > experimented some mitigations, one of which caused the DM
>> to
>> > > come
>> > > > > > down.
>> > > > > > > > > Following are the details, which might help us
>> understanding
>> > > the
>> > > > > > issue:
>> > > > > > > > >
>> > > > > > > > > Traffic scenario:
>> > > > > > > > >
>> > > > > > > > > DM allocation had been constantly going up and was at 90%.
>> > > There
>> > > > > > were two
>> > > > > > > > > queues which seemed to align with the theories that we had.
>> > > Q1’s
>> > > > > > size had
>> > > > > > > > > been large right after the broker start and had slow
>> > > consumption
>> > > > of
>> > > > > > > > > messages, queue size only reduced from 76MB to 75MB over a
>> > > period
>> > > > > of
>> > > > > > > > 6hrs.
>> > > > > > > > >
>> > > > > > > > > Q2 on the other hand, started small and was gradually
>> > growing,
>> > > > > queue
>> > > > > > size
>> > > > > > > > > went from 7MB to 10MB in 6hrs. There were other queues with
>> > > > traffic
>> > > > > > > > during
>> > > > > > > > >
>> > > > > > > > > this time.
>> > > > > > > > >
>> > > > > > > > > Action taken:
>> > > > > > > > >
>> > > > > > > > > Moved all the messages from Q2 (since this was our original
>> > > > theory)
>> > > > > > to Q3
>> > > > > > > > > (already created but no messages in it). This did not help
>> > with
>> > > > the
>> > > > > > DM
>> > > > > > > > > growing up.
>> > > > > > > > > Moved all the messages from Q1 to Q4 (already created but
>> no
>> > > > > > messages in
>> > > > > > > > > it). This reduced DM allocation from 93% to 31%.
>> > > > > > > > >
>> > > > > > > > > We have the heap dump and thread dump from when broker was
>> > 90%
>> > > in
>> > > > > DM
>> > > > > > > > > allocation. We are going to analyze that to see if we can
>> get
>> > > > some
>> > > > > > clue.
>> > > > > > > > We
>> > > > > > > > >
>> > > > > > > > > wanted to share this new information which might help in
>> > > > reasoning
>> > > > > > about
>> > > > > > > > the
>> > > > > > > > >
>> > > > > > > > > memory issue.
>> > > > > > > > >
>> > > > > > > > > - Ramayan
>> > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > On Thu, Apr 20, 2017 at 11:20 AM, Ramayan Tiwari <
>> > > > > > > > ramayan.tiwari@gmail.com>
>> > > > > > > > >
>> > > > > > > > > wrote:
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > Hi Keith,
>> > > > > > > > > >
>> > > > > > > > > > Thanks so much for your response and digging into the
>> > issue.
>> > > > > Below
>> > > > > > are
>> > > > > > > > the
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > answer to your questions:
>> > > > > > > > > >
>> > > > > > > > > > 1) Yeah we are using QPID-7462 with 6.0.5. We couldn't
>> use
>> > > 6.1
>> > > > > > where it
>> > > > > > > > > > was released because we need JMX support. Here is the
>> > > > destination
>> > > > > > > > format:
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > ""%s ; {node : { type : queue }, link : { x-subscribes :
>> {
>> > > > > > arguments : {
>> > > > > > > > > > x-multiqueue : [%s], x-pull-only : true }}}}";"
>> > > > > > > > > >
>> > > > > > > > > > 2) Our machines have 40 cores, which will make the number
>> > of
>> > > > > > threads to
>> > > > > > > > > > 80. This might not be an issue, because this will show up
>> > in
>> > > > the
>> > > > > > > > baseline DM
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > allocated, which is only 6% (of 4GB) when we just bring
>> up
>> > > the
>> > > > > > broker.
>> > > > > > > > > >
>> > > > > > > > > > 3) The only setting that we tuned WRT to DM is
>> > > > > flowToDiskThreshold,
>> > > > > > > > which
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > is set at 80% now.
>> > > > > > > > > >
>> > > > > > > > > > 4) Only one virtual host in the broker.
>> > > > > > > > > >
>> > > > > > > > > > 5) Most of our queues (99%) are priority, we also have
>> 8-10
>> > > > > sorted
>> > > > > > > > queues.
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > 6) Yeah we are using the standard 0.16 client and not
>> AMQP
>> > > 1.0
>> > > > > > clients.
>> > > > > > > > > > The connection log line looks like:
>> > > > > > > > > > CON-1001 : Open : Destination : AMQP(IP:5672) : Protocol
>> > > > Version
>> > > > > :
>> > > > > > 0-10
>> > > > > > > > :
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > Client ID : test : Client Version : 0.16 : Client
>> Product :
>> > > > qpid
>> > > > > > > > > >
>> > > > > > > > > > We had another broker crashed about an hour back, we do
>> see
>> > > the
>> > > > > > same
>> > > > > > > > > > patterns:
>> > > > > > > > > > 1) There is a queue which is constantly growing, enqueue
>> is
>> > > > > faster
>> > > > > > than
>> > > > > > > > > > dequeue on that queue for a long period of time.
>> > > > > > > > > > 2) Flow to disk didn't kick in at all.
>> > > > > > > > > >
>> > > > > > > > > > This graph shows memory growth (red line - heap, blue -
>> DM
>> > > > > > allocated,
>> > > > > > > > > > yellow - DM used)
>> > > > > > > > > >
>> > > > > > > > > > https://drive.google.com/file/d/
>> > > 0Bwi0MEV3srPRdVhXdTBncHJLY2c/
>> > > > > > > > view?usp=sharing
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > The below graph shows growth on a single queue (there are
>> > > 10-12
>> > > > > > other
>> > > > > > > > > > queues with traffic as well, something large size than
>> this
>> > > > > queue):
>> > > > > > > > > >
>> > > > > > > > > > https://drive.google.com/file/d/
>> > > 0Bwi0MEV3srPRWmNGbDNGUkJhQ0U/
>> > > > > > > > view?usp=sharing
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > Couple of questions:
>> > > > > > > > > > 1) Is there any developer level doc/design spec on how
>> Qpid
>> > > > uses
>> > > > > > DM?
>> > > > > > > > > > 2) We are not getting heap dumps automatically when
>> broker
>> > > > > crashes
>> > > > > > due
>> > > > > > > > to
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > DM (HeapDumpOnOutOfMemoryError not respected). Has anyone
>> > > > found a
>> > > > > > way
>> > > > > > > > to get
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > around this problem?
>> > > > > > > > > >
>> > > > > > > > > > Thanks
>> > > > > > > > > > Ramayan
>> > > > > > > > > >
>> > > > > > > > > > On Thu, Apr 20, 2017 at 9:08 AM, Keith W <
>> > > keith.wall@gmail.com
>> > > > >
>> > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > Hi Ramayan
>> > > > > > > > > > >
>> > > > > > > > > > > We have been discussing your problem here and have a
>> > couple
>> > > > of
>> > > > > > > > questions.
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > I have been experimenting with use-cases based on your
>> > > > > > descriptions
>> > > > > > > > > > > above, but so far, have been unsuccessful in
>> reproducing
>> > a
>> > > > > > > > > > > "java.lang.OutOfMemoryError: Direct buffer memory"
>> > > > condition.
>> > > > > > The
>> > > > > > > > > > > direct memory usage reflects the expected model: it
>> > levels
>> > > > off
>> > > > > > when
>> > > > > > > > > > > the flow to disk threshold is reached and direct memory
>> > is
>> > > > > > release as
>> > > > > > > > > > > messages are consumed until the minimum size for
>> caching
>> > of
>> > > > > > direct is
>> > > > > > > > > > > reached.
>> > > > > > > > > > >
>> > > > > > > > > > > 1] For clarity let me check: we believe when you say
>> > "patch
>> > > > to
>> > > > > > use
>> > > > > > > > > > > MultiQueueConsumer" you are referring to the patch
>> > attached
>> > > > to
>> > > > > > > > > > > QPID-7462 "Add experimental "pull" consumers to the
>> > broker"
>> > > > > and
>> > > > > > you
>> > > > > > > > > > > are using a combination of this "x-pull-only"  with the
>> > > > > standard
>> > > > > > > > > > > "x-multiqueue" feature.  Is this correct?
>> > > > > > > > > > >
>> > > > > > > > > > > 2] One idea we had here relates to the size of the
>> > > > virtualhost
>> > > > > IO
>> > > > > > > > > > > pool.   As you know from the documentation, the Broker
>> > > > > > caches/reuses
>> > > > > > > > > > > direct memory internally but the documentation fails to
>> > > > > mentions
>> > > > > > that
>> > > > > > > > > > > each pooled virtualhost IO thread also grabs a chunk
>> > (256K)
>> > > > of
>> > > > > > direct
>> > > > > > > > > > > memory from this cache.  By default the virtual host IO
>> > > pool
>> > > > is
>> > > > > > sized
>> > > > > > > > > > > Math.max(Runtime.getRuntime().availableProcessors() *
>> 2,
>> > > > 64),
>> > > > > > so if
>> > > > > > > > > > > you have a machine with a very large number of cores,
>> you
>> > > may
>> > > > > > have a
>> > > > > > > > > > > surprising large amount of direct memory assigned to
>> > > > > virtualhost
>> > > > > > IO
>> > > > > > > > > > > threads.   Check the value of connectionThreadPoolSize
>> on
>> > > the
>> > > > > > > > > > > virtualhost
>> > > > > > > > > > > (http://<server>:<port>/api/latest/virtualhost/<
>> > > > > > virtualhostnodename>/<;
>> > > > > > > > virtualhostname>)
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > to see what value is in force.  What is it?  It is
>> > possible
>> > > > to
>> > > > > > tune
>> > > > > > > > > > > the pool size using context variable
>> > > > > > > > > > > virtualhost.connectionThreadPool.size.
>> > > > > > > > > > >
>> > > > > > > > > > > 3] Tell me if you are tuning the Broker in way beyond
>> the
>> > > > > > direct/heap
>> > > > > > > > > > > memory settings you have told us about already.  For
>> > > instance
>> > > > > > you are
>> > > > > > > > > > > changing any of the direct memory pooling settings
>> > > > > > > > > > > broker.directByteBufferPoolSize, default network
>> buffer
>> > > size
>> > > > > > > > > > > qpid.broker.networkBufferSize or applying any other
>> > > > > non-standard
>> > > > > > > > > > > settings?
>> > > > > > > > > > >
>> > > > > > > > > > > 4] How many virtual hosts do you have on the Broker?
>> > > > > > > > > > >
>> > > > > > > > > > > 5] What is the consumption pattern of the messages?  Do
>> > > > consume
>> > > > > > in a
>> > > > > > > > > > > strictly FIFO fashion or are you making use of message
>> > > > > selectors
>> > > > > > > > > > > or/and any of the out-of-order queue types (LVQs,
>> > priority
>> > > > > queue
>> > > > > > or
>> > > > > > > > > > > sorted queues)?
>> > > > > > > > > > >
>> > > > > > > > > > > 6] Is it just the 0.16 client involved in the
>> > application?
>> > > > >  Can
>> > > > > > I
>> > > > > > > > > > > check that you are not using any of the AMQP 1.0
>> clients
>> > > > > > > > > > > (org,apache.qpid:qpid-jms-client or
>> > > > > > > > > > > org.apache.qpid:qpid-amqp-1-0-client) in the software
>> > > stack
>> > > > > (as
>> > > > > > either
>> > > > > > > > > > > consumers or producers)
>> > > > > > > > > > >
>> > > > > > > > > > > Hopefully the answers to these questions will get us
>> > closer
>> > > > to
>> > > > > a
>> > > > > > > > > > > reproduction.   If you are able to reliable reproduce
>> it,
>> > > > > please
>> > > > > > share
>> > > > > > > > > > > the steps with us.
>> > > > > > > > > > >
>> > > > > > > > > > > Kind regards, Keith.
>> > > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > On 20 April 2017 at 10:21, Ramayan Tiwari <
>> > > > > > ramayan.tiwari@gmail.com>
>> > > > > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > After a lot of log mining, we might have a way to
>> > explain
>> > > > the
>> > > > > > > > sustained
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > increased in DirectMemory allocation, the correlation
>> > > seems
>> > > > > to
>> > > > > > be
>> > > > > > > > with
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > the
>> > > > > > > > > > > > growth in the size of a Queue that is getting
>> consumed
>> > > but
>> > > > at
>> > > > > > a much
>> > > > > > > > > > > > slower
>> > > > > > > > > > > > rate than producers putting messages on this queue.
>> > > > > > > > > > > >
>> > > > > > > > > > > > The pattern we see is that in each instance of broker
>> > > > crash,
>> > > > > > there is
>> > > > > > > > > > > > at
>> > > > > > > > > > > > least one queue (usually 1 queue) whose size kept
>> > growing
>> > > > > > steadily.
>> > > > > > > > > > > > It’d be
>> > > > > > > > > > > > of significant size but not the largest queue --
>> > usually
>> > > > > there
>> > > > > > are
>> > > > > > > > > > > > multiple
>> > > > > > > > > > > > larger queues -- but it was different from other
>> queues
>> > > in
>> > > > > > that its
>> > > > > > > > > > > > size
>> > > > > > > > > > > > was growing steadily. The queue would also be moving,
>> > but
>> > > > its
>> > > > > > > > > > > > processing
>> > > > > > > > > > > > rate was not keeping up with the enqueue rate.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Our theory that might be totally wrong: If a queue is
>> > > > moving
>> > > > > > the
>> > > > > > > > entire
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > time, maybe then the broker would keep reusing the
>> same
>> > > > > buffer
>> > > > > > in
>> > > > > > > > > > > > direct
>> > > > > > > > > > > > memory for the queue, and keep on adding onto it at
>> the
>> > > end
>> > > > > to
>> > > > > > > > > > > > accommodate
>> > > > > > > > > > > > new messages. But because it’s active all the time
>> and
>> > > > we’re
>> > > > > > pointing
>> > > > > > > > > > > > to
>> > > > > > > > > > > > the same buffer, space allocated for messages at the
>> > head
>> > > > of
>> > > > > > the
>> > > > > > > > > > > > queue/buffer doesn’t get reclaimed, even long after
>> > those
>> > > > > > messages
>> > > > > > > > have
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > been processed. Just a theory.
>> > > > > > > > > > > >
>> > > > > > > > > > > > We are also trying to reproduce this using some perf
>> > > tests
>> > > > to
>> > > > > > enqueue
>> > > > > > > > > > > > with
>> > > > > > > > > > > > same pattern, will update with the findings.
>> > > > > > > > > > > >
>> > > > > > > > > > > > Thanks
>> > > > > > > > > > > > Ramayan
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Wed, Apr 19, 2017 at 6:52 PM, Ramayan Tiwari
>> > > > > > > > > > > > <ra...@gmail.com>
>> > > > > > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Another issue that we noticed is when broker goes
>> OOM
>> > > due
>> > > > > to
>> > > > > > direct
>> > > > > > > > > > > > > memory, it doesn't create heap dump (specified by
>> > > "-XX:+
>> > > > > > > > > > > > > HeapDumpOnOutOfMemoryError"), even when the OOM
>> error
>> > > is
>> > > > > > same as
>> > > > > > > > what
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > is
>> > > > > > > > > > > > > mentioned in the oracle JVM docs
>> > > > > > ("java.lang.OutOfMemoryError").
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > Has anyone been able to find a way to get to heap
>> > dump
>> > > > for
>> > > > > > DM OOM?
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > - Ramayan
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > On Wed, Apr 19, 2017 at 11:21 AM, Ramayan Tiwari
>> > > > > > > > > > > > > <ramayan.tiwari@gmail.com
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Alex,
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Below are the flow to disk logs from broker
>> having
>> > > > > > 3million+
>> > > > > > > > messages
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > this time. We only have one virtual host. Time is
>> > in
>> > > > GMT.
>> > > > > > Looks
>> > > > > > > > like
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > flow
>> > > > > > > > > > > > > > to disk is active on the whole virtual host and
>> > not a
>> > > > > > queue level.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > When the same broker went OOM yesterday, I did
>> not
>> > > see
>> > > > > any
>> > > > > > flow to
>> > > > > > > > > > > > > > disk
>> > > > > > > > > > > > > > logs from when it was started until it crashed
>> > > (crashed
>> > > > > > twice
>> > > > > > > > within
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > 4hrs).
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > 4/19/17 4:17:43.509 AM INFO
>> [Housekeeping[test]] -
>> > > > > > > > > > > > > > [Housekeeping[test]]
>> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
>> > > > memory
>> > > > > > use
>> > > > > > > > > > > > > > 3356539KB
>> > > > > > > > > > > > > > exceeds threshold 3355443KB
>> > > > > > > > > > > > > > 4/19/17 2:31:13.502 AM INFO
>> [Housekeeping[test]] -
>> > > > > > > > > > > > > > [Housekeeping[test]]
>> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
>> Message
>> > > > memory
>> > > > > > use
>> > > > > > > > > > > > > > 3354866KB
>> > > > > > > > > > > > > > within threshold 3355443KB
>> > > > > > > > > > > > > > 4/19/17 2:28:43.511 AM INFO
>> [Housekeeping[test]] -
>> > > > > > > > > > > > > > [Housekeeping[test]]
>> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
>> > > > memory
>> > > > > > use
>> > > > > > > > > > > > > > 3358509KB
>> > > > > > > > > > > > > > exceeds threshold 3355443KB
>> > > > > > > > > > > > > > 4/19/17 2:20:13.500 AM INFO
>> [Housekeeping[test]] -
>> > > > > > > > > > > > > > [Housekeeping[test]]
>> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
>> Message
>> > > > memory
>> > > > > > use
>> > > > > > > > > > > > > > 3353501KB
>> > > > > > > > > > > > > > within threshold 3355443KB
>> > > > > > > > > > > > > > 4/19/17 2:18:13.500 AM INFO
>> [Housekeeping[test]] -
>> > > > > > > > > > > > > > [Housekeeping[test]]
>> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
>> > > > memory
>> > > > > > use
>> > > > > > > > > > > > > > 3357544KB
>> > > > > > > > > > > > > > exceeds threshold 3355443KB
>> > > > > > > > > > > > > > 4/19/17 2:08:43.501 AM INFO
>> [Housekeeping[test]] -
>> > > > > > > > > > > > > > [Housekeeping[test]]
>> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
>> Message
>> > > > memory
>> > > > > > use
>> > > > > > > > > > > > > > 3353236KB
>> > > > > > > > > > > > > > within threshold 3355443KB
>> > > > > > > > > > > > > > 4/19/17 2:08:13.501 AM INFO
>> [Housekeeping[test]] -
>> > > > > > > > > > > > > > [Housekeeping[test]]
>> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
>> > > > memory
>> > > > > > use
>> > > > > > > > > > > > > > 3356704KB
>> > > > > > > > > > > > > > exceeds threshold 3355443KB
>> > > > > > > > > > > > > > 4/19/17 2:00:43.500 AM INFO
>> [Housekeeping[test]] -
>> > > > > > > > > > > > > > [Housekeeping[test]]
>> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
>> Message
>> > > > memory
>> > > > > > use
>> > > > > > > > > > > > > > 3353511KB
>> > > > > > > > > > > > > > within threshold 3355443KB
>> > > > > > > > > > > > > > 4/19/17 2:00:13.504 AM INFO
>> [Housekeeping[test]] -
>> > > > > > > > > > > > > > [Housekeeping[test]]
>> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
>> > > > memory
>> > > > > > use
>> > > > > > > > > > > > > > 3357948KB
>> > > > > > > > > > > > > > exceeds threshold 3355443KB
>> > > > > > > > > > > > > > 4/19/17 1:50:43.501 AM INFO
>> [Housekeeping[test]] -
>> > > > > > > > > > > > > > [Housekeeping[test]]
>> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
>> Message
>> > > > memory
>> > > > > > use
>> > > > > > > > > > > > > > 3355310KB
>> > > > > > > > > > > > > > within threshold 3355443KB
>> > > > > > > > > > > > > > 4/19/17 1:47:43.501 AM INFO
>> [Housekeeping[test]] -
>> > > > > > > > > > > > > > [Housekeeping[test]]
>> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
>> > > > memory
>> > > > > > use
>> > > > > > > > > > > > > > 3365624KB
>> > > > > > > > > > > > > > exceeds threshold 3355443KB
>> > > > > > > > > > > > > > 4/19/17 1:43:43.501 AM INFO
>> [Housekeeping[test]] -
>> > > > > > > > > > > > > > [Housekeeping[test]]
>> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
>> Message
>> > > > memory
>> > > > > > use
>> > > > > > > > > > > > > > 3355136KB
>> > > > > > > > > > > > > > within threshold 3355443KB
>> > > > > > > > > > > > > > 4/19/17 1:31:43.509 AM INFO
>> [Housekeeping[test]] -
>> > > > > > > > > > > > > > [Housekeeping[test]]
>> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
>> > > > memory
>> > > > > > use
>> > > > > > > > > > > > > > 3358683KB
>> > > > > > > > > > > > > > exceeds threshold 3355443KB
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > After production release (2days back), we have
>> > seen 4
>> > > > > > crashes in 3
>> > > > > > > > > > > > > > different brokers, this is the most pressing
>> > concern
>> > > > for
>> > > > > > us in
>> > > > > > > > > > > > > > decision if
>> > > > > > > > > > > > > > we should roll back to 0.32. Any help is greatly
>> > > > > > appreciated.
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > Thanks
>> > > > > > > > > > > > > > Ramayan
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 9:36 AM, Oleksandr Rudyy
>> <
>> > > > > > orudyy@gmail.com
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Ramayan,
>> > > > > > > > > > > > > > > Thanks for the details. I would like to clarify
>> > > > whether
>> > > > > > flow to
>> > > > > > > > disk
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > was
>> > > > > > > > > > > > > > > triggered today for 3 million messages?
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > The following logs are issued for flow to disk:
>> > > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
>> Message
>> > > > > memory
>> > > > > > use
>> > > > > > > > > > > > > > > {0,number,#}KB
>> > > > > > > > > > > > > > > exceeds threshold {1,number,#.##}KB
>> > > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
>> > Message
>> > > > > > memory use
>> > > > > > > > > > > > > > > {0,number,#}KB within threshold
>> {1,number,#.##}KB
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > Kind Regards,
>> > > > > > > > > > > > > > > Alex
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > On 19 April 2017 at 17:10, Ramayan Tiwari <
>> > > > > > > > ramayan.tiwari@gmail.com>
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Hi Alex,
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Thanks for your response, here are the
>> details:
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > We use "direct" exchange, without persistence
>> > (we
>> > > > > > specify
>> > > > > > > > > > > > > > > NON_PERSISTENT
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > that while sending from client) and use BDB
>> > > store.
>> > > > We
>> > > > > > use JSON
>> > > > > > > > > > > > > > > > virtual
>> > > > > > > > > > > > > > > host
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > type. We are not using SSL.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > When the broker went OOM, we had around 1.3
>> > > million
>> > > > > > messages
>> > > > > > > > with
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > 100
>> > > > > > > > > > > > > > > bytes
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > average message size. Direct memory
>> allocation
>> > > > (value
>> > > > > > read from
>> > > > > > > > > > > > > > > > MBean)
>> > > > > > > > > > > > > > > kept
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > going up, even though it wouldn't need more
>> DM
>> > to
>> > > > > > store these
>> > > > > > > > many
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > messages. DM allocated persisted at 99% for
>> > > about 3
>> > > > > > and half
>> > > > > > > > hours
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > before
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > crashing.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Today, on the same broker we have 3 million
>> > > > messages
>> > > > > > (same
>> > > > > > > > message
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > size)
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > and DM allocated is only at 8%. This seems
>> like
>> > > > there
>> > > > > > is some
>> > > > > > > > > > > > > > > > issue
>> > > > > > > > > > > > > > > with
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > de-allocation or a leak.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > I have uploaded the memory utilization graph
>> > > here:
>> > > > > > > > > > > > > > > > https://drive.google.com/file/d/
>> > > > > > 0Bwi0MEV3srPRVHFEbDlIYUpLaUE/
>> > > > > > > > > > > > > > > > view?usp=sharing
>> > > > > > > > > > > > > > > > Blue line is DM allocated, Yellow is DM Used
>> > (sum
>> > > > of
>> > > > > > queue
>> > > > > > > > > > > > > > > > payload)
>> > > > > > > > > > > > > > > and Red
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > is heap usage.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > Thanks
>> > > > > > > > > > > > > > > > Ramayan
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 4:10 AM, Oleksandr
>> > Rudyy
>> > > > > > > > > > > > > > > > <or...@gmail.com>
>> > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Hi Ramayan,
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Could please share with us the details of
>> > > > messaging
>> > > > > > use
>> > > > > > > > case(s)
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > which
>> > > > > > > > > > > > > > > > ended
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > up in OOM on broker side?
>> > > > > > > > > > > > > > > > > I would like to reproduce the issue on my
>> > local
>> > > > > > broker in
>> > > > > > > > order
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > to
>> > > > > > > > > > > > > > > fix
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > it.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > I would appreciate if you could provide as
>> > much
>> > > > > > details as
>> > > > > > > > > > > > > > > > > possible,
>> > > > > > > > > > > > > > > > > including, messaging topology, message
>> > > > persistence
>> > > > > > type,
>> > > > > > > > message
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > sizes,volumes, etc.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Qpid Broker 6.0.x uses direct memory for
>> > > keeping
>> > > > > > message
>> > > > > > > > content
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > and
>> > > > > > > > > > > > > > > > > receiving/sending data. Each plain
>> connection
>> > > > > > utilizes 512K of
>> > > > > > > > > > > > > > > > > direct
>> > > > > > > > > > > > > > > > > memory. Each SSL connection uses 1M of
>> direct
>> > > > > > memory. Your
>> > > > > > > > > > > > > > > > > memory
>> > > > > > > > > > > > > > > > settings
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > look Ok to me.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > Kind Regards,
>> > > > > > > > > > > > > > > > > Alex
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > On 18 April 2017 at 23:39, Ramayan Tiwari
>> > > > > > > > > > > > > > > > > <ra...@gmail.com>
>> > > > > > > > > > > > > > > > > wrote:
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Hi All,
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > We are using Java broker 6.0.5, with
>> patch
>> > to
>> > > > use
>> > > > > > > > > > > > > > > MultiQueueConsumer
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > feature. We just finished deploying to
>> > > > production
>> > > > > > and saw
>> > > > > > > > > > > > > > > > > > couple of
>> > > > > > > > > > > > > > > > > > instances of broker OOM due to running
>> out
>> > of
>> > > > > > DirectMemory
>> > > > > > > > > > > > > > > > > > buffer
>> > > > > > > > > > > > > > > > > > (exceptions at the end of this email).
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Here is our setup:
>> > > > > > > > > > > > > > > > > > 1. Max heap 12g, max direct memory 4g
>> (this
>> > > is
>> > > > > > opposite of
>> > > > > > > > > > > > > > > > > > what the
>> > > > > > > > > > > > > > > > > > recommendation is, however, for our use
>> > cause
>> > > > > > message
>> > > > > > > > payload
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > is
>> > > > > > > > > > > > > > > really
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > small ~400bytes and is way less than the
>> > per
>> > > > > > message
>> > > > > > > > overhead
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > of
>> > > > > > > > > > > > > > > 1KB).
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > In
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > perf testing, we were able to put 2
>> million
>> > > > > > messages without
>> > > > > > > > > > > > > > > > > > any
>> > > > > > > > > > > > > > > > issues.
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > 2. ~400 connections to broker.
>> > > > > > > > > > > > > > > > > > 3. Each connection has 20 sessions and
>> > there
>> > > is
>> > > > > > one multi
>> > > > > > > > > > > > > > > > > > queue
>> > > > > > > > > > > > > > > > consumer
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > attached to each session, listening to
>> > around
>> > > > > 1000
>> > > > > > queues.
>> > > > > > > > > > > > > > > > > > 4. We are still using 0.16 client (I
>> know).
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > With the above setup, the baseline
>> > > utilization
>> > > > > > (without any
>> > > > > > > > > > > > > > > messages)
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > for
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > direct memory was around 230mb (with 410
>> > > > > > connection each
>> > > > > > > > > > > > > > > > > > taking
>> > > > > > > > > > > > > > > 500KB).
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Based on our understanding of broker
>> memory
>> > > > > > allocation,
>> > > > > > > > > > > > > > > > > > message
>> > > > > > > > > > > > > > > payload
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > should be the only thing adding to direct
>> > > > memory
>> > > > > > utilization
>> > > > > > > > > > > > > > > > > > (on
>> > > > > > > > > > > > > > > top of
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > baseline), however, we are experiencing
>> > > > something
>> > > > > > completely
>> > > > > > > > > > > > > > > different.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > In
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > our last broker crash, we see that broker
>> > is
>> > > > > > constantly
>> > > > > > > > > > > > > > > > > > running
>> > > > > > > > > > > > > > > with
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > 90%+
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > direct memory allocated, even when
>> message
>> > > > > payload
>> > > > > > sum from
>> > > > > > > > > > > > > > > > > > all the
>> > > > > > > > > > > > > > > > > queues
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > is only 6-8% (these % are against
>> available
>> > > DM
>> > > > of
>> > > > > > 4gb).
>> > > > > > > > During
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > these
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > high
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > DM usage period, heap usage was around
>> 60%
>> > > (of
>> > > > > > 12gb).
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > We would like some help in understanding
>> > what
>> > > > > > could be the
>> > > > > > > > > > > > > > > > > > reason
>> > > > > > > > > > > > > > > of
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > these
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > high DM allocations. Are there things
>> other
>> > > > than
>> > > > > > message
>> > > > > > > > > > > > > > > > > > payload
>> > > > > > > > > > > > > > > and
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > AMQP
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > connection, which use DM and could be
>> > > > > contributing
>> > > > > > to these
>> > > > > > > > > > > > > > > > > > high
>> > > > > > > > > > > > > > > usage?
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Another thing where we are puzzled is the
>> > > > > > de-allocation of
>> > > > > > > > DM
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > byte
>> > > > > > > > > > > > > > > > > buffers.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > From log mining of heap and DM
>> utilization,
>> > > > > > de-allocation of
>> > > > > > > > > > > > > > > > > > DM
>> > > > > > > > > > > > > > > doesn't
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > correlate with heap GC. If anyone has
>> seen
>> > > any
>> > > > > > documentation
>> > > > > > > > > > > > > > > related to
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > this, it would be very helpful if you
>> could
>> > > > share
>> > > > > > that.
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > Thanks
>> > > > > > > > > > > > > > > > > > Ramayan
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > *Exceptions*
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct buffer
>> > > > memory
>> > > > > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(
>> > > Bits.java:658)
>> > > > > > > > ~[na:1.8.0_40]
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<
>> > > > > > init>(DirectByteBuffer.java:
>> > > > > > > > 123)
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
>> > > > > > > > > > > > > > > > > > at java.nio.ByteBuffer.
>> > > > > allocateDirect(ByteBuffer.
>> > > > > > java:311)
>> > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > org.apache.qpid.bytebuffer.
>> > > > > > QpidByteBuffer.allocateDirect(
>> > > > > > > > > > > > > > > > > > QpidByteBuffer.java:474)
>> > > > > > > > > > > > > > > > > > ~[qpid-common-6.0.5.jar:6.0.5]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>> > > > > > > > NonBlockingConnectionPlainD
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > elegate.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > restoreApplicationBufferForWrite(
>> > > > > > > > NonBlockingConnectionPlainDele
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > gate.java:93)
>> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>> > > > > > > > NonBlockingConnectionPlainDele
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > gate.processData(
>> > > > NonBlockingConnectionPlainDele
>> > > > > > > > gate.java:60)
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>> > > > > > > > NonBlockingConnection.doRead(
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > NonBlockingConnection.java:506)
>> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>> > > > > > > > NonBlockingConnection.doWork(
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > NonBlockingConnection.java:285)
>> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>> > > > > > > > NetworkConnectionScheduler.
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > processConnection(
>> > > NetworkConnectionScheduler.
>> > > > > > java:124)
>> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
>> > > > transport.SelectorThread$
>> > > > > > > > ConnectionPr
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > ocessor.
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > processConnection(
>> SelectorThread.java:504)
>> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
>> > > > transport.SelectorThread$
>> > > > > > > > > > > > > > > > > > SelectionTask.performSelect(
>> > > > > > SelectorThread.java:337)
>> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
>> > > > transport.SelectorThread$
>> > > > > > > > SelectionTask.run(
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > SelectorThread.java:87)
>> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
>> > > > > > transport.SelectorThread.run(
>> > > > > > > > > > > > > > > > > > SelectorThread.java:462)
>> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > java.util.concurrent.
>> > > > > ThreadPoolExecutor.runWorker(
>> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:1142)
>> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > java.util.concurrent.
>> > > > > > ThreadPoolExecutor$Worker.run(
>> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:617)
>> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
>> > > > > > > > > > > > > > > > > > at java.lang.Thread.run(Thread.java:745)
>> > > > > > ~[na:1.8.0_40]
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > *Second exception*
>> > > > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct buffer
>> > > > memory
>> > > > > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(
>> > > Bits.java:658)
>> > > > > > > > ~[na:1.8.0_40]
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<
>> > > > > > init>(DirectByteBuffer.java:
>> > > > > > > > 123)
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
>> > > > > > > > > > > > > > > > > > at java.nio.ByteBuffer.
>> > > > > allocateDirect(ByteBuffer.
>> > > > > > java:311)
>> > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > org.apache.qpid.bytebuffer.
>> > > > > > QpidByteBuffer.allocateDirect(
>> > > > > > > > > > > > > > > > > > QpidByteBuffer.java:474)
>> > > > > > > > > > > > > > > > > > ~[qpid-common-6.0.5.jar:6.0.5]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>> > > > > > > > NonBlockingConnectionPlainDele
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > gate.<init>(
>> NonBlockingConnectionPlainDele
>> > > > > > gate.java:45)
>> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>> > > > > > NonBlockingConnection.
>> > > > > > > > > > > > > > > > > > setTransportEncryption(
>> > > > > NonBlockingConnection.java:
>> > > > > > 625)
>> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>> > > > > > > > NonBlockingConnection.<init>(
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > NonBlockingConnection.java:117)
>> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
>> > > > > > > > NonBlockingNetworkTransport.
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > acceptSocketChannel(
>> > > > NonBlockingNetworkTransport.
>> > > > > > java:158)
>> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
>> > > > transport.SelectorThread$
>> > > > > > > > SelectionTas
>> > > > > > > > >
>> > > > > > > > > >
>> > > > > > > > > > >
>> > > > > > > > > > > >
>> > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > k$1.run(
>> > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > > > > > > SelectorThread.java:191)
>> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
>> > > > > > transport.SelectorThread.run(
>> > > > > > > > > > > > > > > > > > SelectorThread.java:462)
>> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > java.util.concurrent.
>> > > > > ThreadPoolExecutor.runWorker(
>> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:1142)
>> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
>> > > > > > > > > > > > > > > > > > at
>> > > > > > > > > > > > > > > > > > java.util.concurrent.
>> > > > > > ThreadPoolExecutor$Worker.run(
>> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:617)
>> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
>> > > > > > > > > > > > > > > > > > at java.lang.Thread.run(Thread.java:745)
>> > > > > > ~[na:1.8.0_40]
>> > > > > > > > > > > > > > > > > >
>> > > > > > > > > > > > > >
>> > > > > > > > > > > ------------------------------
>> > > ------------------------------
>> > > > > > ---------
>> > > > > > > > > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.
>> > org
>> > > > > > > > > > > For additional commands, e-mail:
>> > > users-help@qpid.apache.org
>> > > > > > > > > > >
>> > > > > >
>> > > > > >
>> > > > > > ------------------------------------------------------------
>> > > ---------
>> > > > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
>> > > > > > For additional commands, e-mail: users-help@qpid.apache.org
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
For additional commands, e-mail: users-help@qpid.apache.org


Re: Java broker OOM due to DirectMemory

Posted by Ramayan Tiwari <ra...@gmail.com>.
Hi Alex,

Any update on the fix for this?
QPID-7753 is assigned a fix version for 7.0.0, I am hoping that the fix
will also be back ported to 6.0.x.

Thanks
Ramayan

On Mon, May 8, 2017 at 2:14 AM, Oleksandr Rudyy <or...@gmail.com> wrote:

> Hi Ramayan,
>
> Thanks for testing the patch and providing a feedback.
>
> Regarding direct memory utilization, the Qpid Broker caches up to 256MB of
> direct memory internally in QpidByteBuffers. Thus, when testing the Broker
> with only 256MB of direct memory, the entire direct memory could be cached
> and it would look as if direct memory is never released. Potentially, you
> can reduce the number of buffers cached on broker by changing context
> variable 'broker.directByteBufferPoolSize'. By default, it is set to 1000.
> With buffer size of 256K, it would give ~256M of cache.
>
> Regarding introducing lower and upper thresholds for 'flow to disk'. It
> seems like a good idea and we will try to implement it early this week on
> trunk first.
>
> Kind Regards,
> Alex
>
>
> On 5 May 2017 at 23:49, Ramayan Tiwari <ra...@gmail.com> wrote:
>
> > Hi Alex,
> >
> > Thanks for providing the patch. I verified the fix with same perf test,
> and
> > it does prevent broker from going OOM, however. DM utilization doesn't
> get
> > any better after hitting the threshold (where flow to disk is activated
> > based on total used % across broker - graph in the link below).
> >
> > After hitting the final threshold, flow to disk activates and deactivates
> > pretty frequently across all the queues. The reason seems to be because
> > there is only one threshold currently to trigger flow to disk. Would it
> > make sense to break this down to high and low threshold - so that once
> flow
> > to disk is active after hitting high threshold, it will be active until
> the
> > queue utilization (or broker DM allocation) reaches the low threshold.
> >
> > Graph and flow to disk logs are here:
> > https://docs.google.com/document/d/1Wc1e-id-
> WlpI7FGU1Lx8XcKaV8sauRp82T5XZV
> > U-RiM/edit#heading=h.6400pltvjhy7
> >
> > Thanks
> > Ramayan
> >
> > On Thu, May 4, 2017 at 2:44 AM, Oleksandr Rudyy <or...@gmail.com>
> wrote:
> >
> > > Hi Ramayan,
> > >
> > > We attached to the QPID-7753 a patch with a work around for 6.0.x
> branch.
> > > It triggers flow to disk based on direct memory consumption rather than
> > > estimation of the space occupied by the message content. The flow to
> disk
> > > should evacuate message content preventing running out of direct
> memory.
> > We
> > > already committed the changes into 6.0.x and 6.1.x branches. It will be
> > > included into upcoming 6.0.7 and 6.1.3 releases.
> > >
> > > Please try and test the patch in your environment.
> > >
> > > We are still working at finishing of the fix for trunk.
> > >
> > > Kind Regards,
> > > Alex
> > >
> > > On 30 April 2017 at 15:45, Lorenz Quack <qu...@gmail.com>
> wrote:
> > >
> > > > Hi Ramayan,
> > > >
> > > > The high-level plan is currently as follows:
> > > >  1) Periodically try to compact sparse direct memory buffers.
> > > >  2) Increase accuracy of messages' direct memory usage estimation to
> > more
> > > > reliably trigger flow to disk.
> > > >  3) Add an additional flow to disk trigger based on the amount of
> > > allocated
> > > > direct memory.
> > > >
> > > > A little bit more details:
> > > >  1) We plan on periodically checking the amount of direct memory
> usage
> > > and
> > > > if it is above a
> > > >     threshold (50%) we compare the sum of all queue sizes with the
> > amount
> > > > of allocated direct memory.
> > > >     If the ratio falls below a certain threshold we trigger a
> > compaction
> > > > task which goes through all queues
> > > >     and copy's a certain amount of old message buffers into new ones
> > > > thereby freeing the old buffers so
> > > >     that they can be returned to the buffer pool and be reused.
> > > >
> > > >  2) Currently we trigger flow to disk based on an estimate of how
> much
> > > > memory the messages on the
> > > >     queues consume. We had to use estimates because we did not have
> > > > accurate size numbers for
> > > >     message headers. By having accurate size information for message
> > > > headers we can more reliably
> > > >     enforce queue memory limits.
> > > >
> > > >  3) The flow to disk trigger based on message size had another
> problem
> > > > which is more pertinent to the
> > > >     current issue. We only considered the size of the messages and
> not
> > > how
> > > > much memory we allocate
> > > >     to store those messages. In the FIFO use case those numbers will
> be
> > > > very close to each other but in
> > > >     use cases like yours we can end up with sparse buffers and the
> > > numbers
> > > > will diverge. Because of this
> > > >     divergence we do not trigger flow to disk in time and the broker
> > can
> > > go
> > > > OOM.
> > > >     To fix the issue we want to add an additional flow to disk
> trigger
> > > > based on the amount of allocated direct
> > > >     memory. This should prevent the broker from going OOM even if the
> > > > compaction strategy outlined above
> > > >     should fail for some reason (e.g., the compaction task cannot
> keep
> > up
> > > > with the arrival of new messages).
> > > >
> > > > Currently, there are patches for the above points but they suffer
> from
> > > some
> > > > thread-safety issues that need to be addressed.
> > > >
> > > > I hope this description helps. Any feedback is, as always, welcome.
> > > >
> > > > Kind regards,
> > > > Lorenz
> > > >
> > > >
> > > >
> > > > On Sat, Apr 29, 2017 at 12:00 AM, Ramayan Tiwari <
> > > ramayan.tiwari@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > Hi Lorenz,
> > > > >
> > > > > Thanks so much for the patch. We have a perf test now to reproduce
> > this
> > > > > issue, so we did test with 256KB, 64KB and 4KB network byte buffer.
> > > None
> > > > of
> > > > > these configurations help with the issue (or give any more
> breathing
> > > > room)
> > > > > for our use case. We would like to share the perf analysis with the
> > > > > community:
> > > > >
> > > > > https://docs.google.com/document/d/1Wc1e-id-
> > > > WlpI7FGU1Lx8XcKaV8sauRp82T5XZV
> > > > > U-RiM/edit?usp=sharing
> > > > >
> > > > > Feel free to comment on the doc if certain details are incorrect or
> > if
> > > > > there are questions.
> > > > >
> > > > > Since the short term solution doesn't help us, we are very
> interested
> > > in
> > > > > getting some details on how the community plans to address this, a
> > high
> > > > > level description of the approach will be very helpful for us in
> > order
> > > to
> > > > > brainstorm our use cases along with this solution.
> > > > >
> > > > > - Ramayan
> > > > >
> > > > > On Fri, Apr 28, 2017 at 9:34 AM, Lorenz Quack <
> > quack.lorenz@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hello Ramayan,
> > > > > >
> > > > > > We are still working on a fix for this issue.
> > > > > > In the mean time we had an idea to potentially workaround the
> issue
> > > > until
> > > > > > a proper fix is released.
> > > > > >
> > > > > > The idea is to decrease the qpid network buffer size the broker
> > uses.
> > > > > > While this still allows for sparsely populated buffers it would
> > > improve
> > > > > > the overall occupancy ratio.
> > > > > >
> > > > > > Here are the steps to follow:
> > > > > >  * ensure you are not using TLS
> > > > > >  * apply the attached patch
> > > > > >  * figure out the size of the largest messages you are sending
> > > > (including
> > > > > > header and some overhead)
> > > > > >  * set the context variable "qpid.broker.networkBufferSize" to
> > that
> > > > > value
> > > > > > but not smaller than 4096
> > > > > >  * test
> > > > > >
> > > > > > Decreasing the qpid network buffer size automatically limits the
> > > > maximum
> > > > > > AMQP frame size.
> > > > > > Since you are using a very old client we are not sure how well it
> > > copes
> > > > > > with small frame sizes where it has to split a message across
> > > multiple
> > > > > > frames.
> > > > > > Therefore, to play it safe you should not set it smaller than the
> > > > largest
> > > > > > messages (+ header + overhead) you are sending.
> > > > > > I do not know what message sizes you are sending but AMQP imposes
> > the
> > > > > > restriction that the framesize cannot be smaller than 4096 bytes.
> > > > > > In the qpid broker the default currently is 256 kB.
> > > > > >
> > > > > > In the current state the broker does not allow setting the
> network
> > > > buffer
> > > > > > to values smaller than 64 kB to allow TLS frames to fit into one
> > > > network
> > > > > > buffer.
> > > > > > I attached a patch to this mail that lowers that restriction to
> the
> > > > limit
> > > > > > imposed by AMQP (4096 Bytes).
> > > > > > Obviously, you should not use this when using TLS.
> > > > > >
> > > > > >
> > > > > > I hope this reduces the problems you are currently facing until
> we
> > > can
> > > > > > complete the proper fix.
> > > > > >
> > > > > > Kind regards,
> > > > > > Lorenz
> > > > > >
> > > > > >
> > > > > > On Fri, 2017-04-21 at 09:17 -0700, Ramayan Tiwari wrote:
> > > > > > > Thanks so much Keith and the team for finding the root cause.
> We
> > > are
> > > > so
> > > > > > > relieved that we fix the root cause shortly.
> > > > > > >
> > > > > > > Couple of things that I forgot to mention on the mitigation
> steps
> > > we
> > > > > took
> > > > > > > in the last incident:
> > > > > > > 1) We triggered GC from JMX bean multiple times, it did not
> help
> > in
> > > > > > > reducing DM allocated.
> > > > > > > 2) We also killed all the AMQP connections to the broker when
> DM
> > > was
> > > > at
> > > > > > > 80%. This did not help either. The way we killed connections -
> > > using
> > > > > JMX
> > > > > > > got list of all the open AMQP connections and called close from
> > JMX
> > > > > > mbean.
> > > > > > >
> > > > > > > I am hoping the above two are not related to root cause, but
> > wanted
> > > > to
> > > > > > > bring it up in case this is relevant.
> > > > > > >
> > > > > > > Thanks
> > > > > > > Ramayan
> > > > > > >
> > > > > > > On Fri, Apr 21, 2017 at 8:29 AM, Keith W <keith.wall@gmail.com
> >
> > > > wrote:
> > > > > > >
> > > > > > > >
> > > > > > > > Hello Ramayan
> > > > > > > >
> > > > > > > > I believe I understand the root cause of the problem.  We
> have
> > > > > > > > identified a flaw in the direct memory buffer management
> > employed
> > > > by
> > > > > > > > Qpid Broker J which for some messaging use-cases can lead to
> > the
> > > > OOM
> > > > > > > > direct you describe.   For the issue to manifest the
> producing
> > > > > > > > application needs to use a single connection for the
> production
> > > of
> > > > > > > > messages some of which are short-lived (i.e. are consumed
> > > quickly)
> > > > > > > > whilst others remain on the queue for some time.  Priority
> > > queues,
> > > > > > > > sorted queues and consumers utilising selectors that result
> in
> > > some
> > > > > > > > messages being left of the queue could all produce this
> patten.
> > > > The
> > > > > > > > pattern leads to a sparsely occupied 256K net buffers which
> > > cannot
> > > > be
> > > > > > > > released or reused until every message that reference a
> 'chunk'
> > > of
> > > > it
> > > > > > > > is either consumed or flown to disk.   The problem was
> > introduced
> > > > > with
> > > > > > > > Qpid v6.0 and exists in v6.1 and trunk too.
> > > > > > > >
> > > > > > > > The flow to disk feature is not helping us here because its
> > > > algorithm
> > > > > > > > considers only the size of live messages on the queues. If
> the
> > > > > > > > accumulative live size does not exceed the threshold, the
> > > messages
> > > > > > > > aren't flown to disk. I speculate that when you observed that
> > > > moving
> > > > > > > > messages cause direct message usage to drop earlier today,
> your
> > > > > > > > message movement cause a queue to go over threshold, cause
> > > message
> > > > to
> > > > > > > > be flown to disk and their direct memory references released.
> > > The
> > > > > > > > logs will confirm this is so.
> > > > > > > >
> > > > > > > > I have not identified an easy workaround at the moment.
> > > >  Decreasing
> > > > > > > > the flow to disk threshold and/or increasing available direct
> > > > memory
> > > > > > > > should alleviate and may be an acceptable short term
> > workaround.
> > > > If
> > > > > > > > it were possible for publishing application to publish short
> > > lived
> > > > > and
> > > > > > > > long lived messages on two separate JMS connections this
> would
> > > > avoid
> > > > > > > > this defect.
> > > > > > > >
> > > > > > > > QPID-7753 tracks this issue and QPID-7754 is a related this
> > > > problem.
> > > > > > > > We intend to be working on these early next week and will be
> > > aiming
> > > > > > > > for a fix that is back-portable to 6.0.
> > > > > > > >
> > > > > > > > Apologies that you have run into this defect and thanks for
> > > > > reporting.
> > > > > > > >
> > > > > > > > Thanks, Keith
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On 21 April 2017 at 10:21, Ramayan Tiwari <
> > > > ramayan.tiwari@gmail.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Hi All,
> > > > > > > > >
> > > > > > > > > We have been monitoring the brokers everyday and today we
> > found
> > > > one
> > > > > > > > instance
> > > > > > > > >
> > > > > > > > > where broker’s DM was constantly going up and was about to
> > > crash,
> > > > > so
> > > > > > we
> > > > > > > > > experimented some mitigations, one of which caused the DM
> to
> > > come
> > > > > > down.
> > > > > > > > > Following are the details, which might help us
> understanding
> > > the
> > > > > > issue:
> > > > > > > > >
> > > > > > > > > Traffic scenario:
> > > > > > > > >
> > > > > > > > > DM allocation had been constantly going up and was at 90%.
> > > There
> > > > > > were two
> > > > > > > > > queues which seemed to align with the theories that we had.
> > > Q1’s
> > > > > > size had
> > > > > > > > > been large right after the broker start and had slow
> > > consumption
> > > > of
> > > > > > > > > messages, queue size only reduced from 76MB to 75MB over a
> > > period
> > > > > of
> > > > > > > > 6hrs.
> > > > > > > > >
> > > > > > > > > Q2 on the other hand, started small and was gradually
> > growing,
> > > > > queue
> > > > > > size
> > > > > > > > > went from 7MB to 10MB in 6hrs. There were other queues with
> > > > traffic
> > > > > > > > during
> > > > > > > > >
> > > > > > > > > this time.
> > > > > > > > >
> > > > > > > > > Action taken:
> > > > > > > > >
> > > > > > > > > Moved all the messages from Q2 (since this was our original
> > > > theory)
> > > > > > to Q3
> > > > > > > > > (already created but no messages in it). This did not help
> > with
> > > > the
> > > > > > DM
> > > > > > > > > growing up.
> > > > > > > > > Moved all the messages from Q1 to Q4 (already created but
> no
> > > > > > messages in
> > > > > > > > > it). This reduced DM allocation from 93% to 31%.
> > > > > > > > >
> > > > > > > > > We have the heap dump and thread dump from when broker was
> > 90%
> > > in
> > > > > DM
> > > > > > > > > allocation. We are going to analyze that to see if we can
> get
> > > > some
> > > > > > clue.
> > > > > > > > We
> > > > > > > > >
> > > > > > > > > wanted to share this new information which might help in
> > > > reasoning
> > > > > > about
> > > > > > > > the
> > > > > > > > >
> > > > > > > > > memory issue.
> > > > > > > > >
> > > > > > > > > - Ramayan
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Thu, Apr 20, 2017 at 11:20 AM, Ramayan Tiwari <
> > > > > > > > ramayan.tiwari@gmail.com>
> > > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi Keith,
> > > > > > > > > >
> > > > > > > > > > Thanks so much for your response and digging into the
> > issue.
> > > > > Below
> > > > > > are
> > > > > > > > the
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > answer to your questions:
> > > > > > > > > >
> > > > > > > > > > 1) Yeah we are using QPID-7462 with 6.0.5. We couldn't
> use
> > > 6.1
> > > > > > where it
> > > > > > > > > > was released because we need JMX support. Here is the
> > > > destination
> > > > > > > > format:
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > ""%s ; {node : { type : queue }, link : { x-subscribes :
> {
> > > > > > arguments : {
> > > > > > > > > > x-multiqueue : [%s], x-pull-only : true }}}}";"
> > > > > > > > > >
> > > > > > > > > > 2) Our machines have 40 cores, which will make the number
> > of
> > > > > > threads to
> > > > > > > > > > 80. This might not be an issue, because this will show up
> > in
> > > > the
> > > > > > > > baseline DM
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > allocated, which is only 6% (of 4GB) when we just bring
> up
> > > the
> > > > > > broker.
> > > > > > > > > >
> > > > > > > > > > 3) The only setting that we tuned WRT to DM is
> > > > > flowToDiskThreshold,
> > > > > > > > which
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > is set at 80% now.
> > > > > > > > > >
> > > > > > > > > > 4) Only one virtual host in the broker.
> > > > > > > > > >
> > > > > > > > > > 5) Most of our queues (99%) are priority, we also have
> 8-10
> > > > > sorted
> > > > > > > > queues.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > 6) Yeah we are using the standard 0.16 client and not
> AMQP
> > > 1.0
> > > > > > clients.
> > > > > > > > > > The connection log line looks like:
> > > > > > > > > > CON-1001 : Open : Destination : AMQP(IP:5672) : Protocol
> > > > Version
> > > > > :
> > > > > > 0-10
> > > > > > > > :
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Client ID : test : Client Version : 0.16 : Client
> Product :
> > > > qpid
> > > > > > > > > >
> > > > > > > > > > We had another broker crashed about an hour back, we do
> see
> > > the
> > > > > > same
> > > > > > > > > > patterns:
> > > > > > > > > > 1) There is a queue which is constantly growing, enqueue
> is
> > > > > faster
> > > > > > than
> > > > > > > > > > dequeue on that queue for a long period of time.
> > > > > > > > > > 2) Flow to disk didn't kick in at all.
> > > > > > > > > >
> > > > > > > > > > This graph shows memory growth (red line - heap, blue -
> DM
> > > > > > allocated,
> > > > > > > > > > yellow - DM used)
> > > > > > > > > >
> > > > > > > > > > https://drive.google.com/file/d/
> > > 0Bwi0MEV3srPRdVhXdTBncHJLY2c/
> > > > > > > > view?usp=sharing
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The below graph shows growth on a single queue (there are
> > > 10-12
> > > > > > other
> > > > > > > > > > queues with traffic as well, something large size than
> this
> > > > > queue):
> > > > > > > > > >
> > > > > > > > > > https://drive.google.com/file/d/
> > > 0Bwi0MEV3srPRWmNGbDNGUkJhQ0U/
> > > > > > > > view?usp=sharing
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Couple of questions:
> > > > > > > > > > 1) Is there any developer level doc/design spec on how
> Qpid
> > > > uses
> > > > > > DM?
> > > > > > > > > > 2) We are not getting heap dumps automatically when
> broker
> > > > > crashes
> > > > > > due
> > > > > > > > to
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > DM (HeapDumpOnOutOfMemoryError not respected). Has anyone
> > > > found a
> > > > > > way
> > > > > > > > to get
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > around this problem?
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > > Ramayan
> > > > > > > > > >
> > > > > > > > > > On Thu, Apr 20, 2017 at 9:08 AM, Keith W <
> > > keith.wall@gmail.com
> > > > >
> > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Hi Ramayan
> > > > > > > > > > >
> > > > > > > > > > > We have been discussing your problem here and have a
> > couple
> > > > of
> > > > > > > > questions.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I have been experimenting with use-cases based on your
> > > > > > descriptions
> > > > > > > > > > > above, but so far, have been unsuccessful in
> reproducing
> > a
> > > > > > > > > > > "java.lang.OutOfMemoryError: Direct buffer memory"
> > > > condition.
> > > > > > The
> > > > > > > > > > > direct memory usage reflects the expected model: it
> > levels
> > > > off
> > > > > > when
> > > > > > > > > > > the flow to disk threshold is reached and direct memory
> > is
> > > > > > release as
> > > > > > > > > > > messages are consumed until the minimum size for
> caching
> > of
> > > > > > direct is
> > > > > > > > > > > reached.
> > > > > > > > > > >
> > > > > > > > > > > 1] For clarity let me check: we believe when you say
> > "patch
> > > > to
> > > > > > use
> > > > > > > > > > > MultiQueueConsumer" you are referring to the patch
> > attached
> > > > to
> > > > > > > > > > > QPID-7462 "Add experimental "pull" consumers to the
> > broker"
> > > > > and
> > > > > > you
> > > > > > > > > > > are using a combination of this "x-pull-only"  with the
> > > > > standard
> > > > > > > > > > > "x-multiqueue" feature.  Is this correct?
> > > > > > > > > > >
> > > > > > > > > > > 2] One idea we had here relates to the size of the
> > > > virtualhost
> > > > > IO
> > > > > > > > > > > pool.   As you know from the documentation, the Broker
> > > > > > caches/reuses
> > > > > > > > > > > direct memory internally but the documentation fails to
> > > > > mentions
> > > > > > that
> > > > > > > > > > > each pooled virtualhost IO thread also grabs a chunk
> > (256K)
> > > > of
> > > > > > direct
> > > > > > > > > > > memory from this cache.  By default the virtual host IO
> > > pool
> > > > is
> > > > > > sized
> > > > > > > > > > > Math.max(Runtime.getRuntime().availableProcessors() *
> 2,
> > > > 64),
> > > > > > so if
> > > > > > > > > > > you have a machine with a very large number of cores,
> you
> > > may
> > > > > > have a
> > > > > > > > > > > surprising large amount of direct memory assigned to
> > > > > virtualhost
> > > > > > IO
> > > > > > > > > > > threads.   Check the value of connectionThreadPoolSize
> on
> > > the
> > > > > > > > > > > virtualhost
> > > > > > > > > > > (http://<server>:<port>/api/latest/virtualhost/<
> > > > > > virtualhostnodename>/<;
> > > > > > > > virtualhostname>)
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > to see what value is in force.  What is it?  It is
> > possible
> > > > to
> > > > > > tune
> > > > > > > > > > > the pool size using context variable
> > > > > > > > > > > virtualhost.connectionThreadPool.size.
> > > > > > > > > > >
> > > > > > > > > > > 3] Tell me if you are tuning the Broker in way beyond
> the
> > > > > > direct/heap
> > > > > > > > > > > memory settings you have told us about already.  For
> > > instance
> > > > > > you are
> > > > > > > > > > > changing any of the direct memory pooling settings
> > > > > > > > > > > broker.directByteBufferPoolSize, default network
> buffer
> > > size
> > > > > > > > > > > qpid.broker.networkBufferSize or applying any other
> > > > > non-standard
> > > > > > > > > > > settings?
> > > > > > > > > > >
> > > > > > > > > > > 4] How many virtual hosts do you have on the Broker?
> > > > > > > > > > >
> > > > > > > > > > > 5] What is the consumption pattern of the messages?  Do
> > > > consume
> > > > > > in a
> > > > > > > > > > > strictly FIFO fashion or are you making use of message
> > > > > selectors
> > > > > > > > > > > or/and any of the out-of-order queue types (LVQs,
> > priority
> > > > > queue
> > > > > > or
> > > > > > > > > > > sorted queues)?
> > > > > > > > > > >
> > > > > > > > > > > 6] Is it just the 0.16 client involved in the
> > application?
> > > > >  Can
> > > > > > I
> > > > > > > > > > > check that you are not using any of the AMQP 1.0
> clients
> > > > > > > > > > > (org,apache.qpid:qpid-jms-client or
> > > > > > > > > > > org.apache.qpid:qpid-amqp-1-0-client) in the software
> > > stack
> > > > > (as
> > > > > > either
> > > > > > > > > > > consumers or producers)
> > > > > > > > > > >
> > > > > > > > > > > Hopefully the answers to these questions will get us
> > closer
> > > > to
> > > > > a
> > > > > > > > > > > reproduction.   If you are able to reliable reproduce
> it,
> > > > > please
> > > > > > share
> > > > > > > > > > > the steps with us.
> > > > > > > > > > >
> > > > > > > > > > > Kind regards, Keith.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > On 20 April 2017 at 10:21, Ramayan Tiwari <
> > > > > > ramayan.tiwari@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > After a lot of log mining, we might have a way to
> > explain
> > > > the
> > > > > > > > sustained
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > increased in DirectMemory allocation, the correlation
> > > seems
> > > > > to
> > > > > > be
> > > > > > > > with
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > the
> > > > > > > > > > > > growth in the size of a Queue that is getting
> consumed
> > > but
> > > > at
> > > > > > a much
> > > > > > > > > > > > slower
> > > > > > > > > > > > rate than producers putting messages on this queue.
> > > > > > > > > > > >
> > > > > > > > > > > > The pattern we see is that in each instance of broker
> > > > crash,
> > > > > > there is
> > > > > > > > > > > > at
> > > > > > > > > > > > least one queue (usually 1 queue) whose size kept
> > growing
> > > > > > steadily.
> > > > > > > > > > > > It’d be
> > > > > > > > > > > > of significant size but not the largest queue --
> > usually
> > > > > there
> > > > > > are
> > > > > > > > > > > > multiple
> > > > > > > > > > > > larger queues -- but it was different from other
> queues
> > > in
> > > > > > that its
> > > > > > > > > > > > size
> > > > > > > > > > > > was growing steadily. The queue would also be moving,
> > but
> > > > its
> > > > > > > > > > > > processing
> > > > > > > > > > > > rate was not keeping up with the enqueue rate.
> > > > > > > > > > > >
> > > > > > > > > > > > Our theory that might be totally wrong: If a queue is
> > > > moving
> > > > > > the
> > > > > > > > entire
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > time, maybe then the broker would keep reusing the
> same
> > > > > buffer
> > > > > > in
> > > > > > > > > > > > direct
> > > > > > > > > > > > memory for the queue, and keep on adding onto it at
> the
> > > end
> > > > > to
> > > > > > > > > > > > accommodate
> > > > > > > > > > > > new messages. But because it’s active all the time
> and
> > > > we’re
> > > > > > pointing
> > > > > > > > > > > > to
> > > > > > > > > > > > the same buffer, space allocated for messages at the
> > head
> > > > of
> > > > > > the
> > > > > > > > > > > > queue/buffer doesn’t get reclaimed, even long after
> > those
> > > > > > messages
> > > > > > > > have
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > been processed. Just a theory.
> > > > > > > > > > > >
> > > > > > > > > > > > We are also trying to reproduce this using some perf
> > > tests
> > > > to
> > > > > > enqueue
> > > > > > > > > > > > with
> > > > > > > > > > > > same pattern, will update with the findings.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > > Ramayan
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Apr 19, 2017 at 6:52 PM, Ramayan Tiwari
> > > > > > > > > > > > <ra...@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Another issue that we noticed is when broker goes
> OOM
> > > due
> > > > > to
> > > > > > direct
> > > > > > > > > > > > > memory, it doesn't create heap dump (specified by
> > > "-XX:+
> > > > > > > > > > > > > HeapDumpOnOutOfMemoryError"), even when the OOM
> error
> > > is
> > > > > > same as
> > > > > > > > what
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > is
> > > > > > > > > > > > > mentioned in the oracle JVM docs
> > > > > > ("java.lang.OutOfMemoryError").
> > > > > > > > > > > > >
> > > > > > > > > > > > > Has anyone been able to find a way to get to heap
> > dump
> > > > for
> > > > > > DM OOM?
> > > > > > > > > > > > >
> > > > > > > > > > > > > - Ramayan
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Apr 19, 2017 at 11:21 AM, Ramayan Tiwari
> > > > > > > > > > > > > <ramayan.tiwari@gmail.com
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Alex,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Below are the flow to disk logs from broker
> having
> > > > > > 3million+
> > > > > > > > messages
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > at
> > > > > > > > > > > > > > this time. We only have one virtual host. Time is
> > in
> > > > GMT.
> > > > > > Looks
> > > > > > > > like
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > flow
> > > > > > > > > > > > > > to disk is active on the whole virtual host and
> > not a
> > > > > > queue level.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > When the same broker went OOM yesterday, I did
> not
> > > see
> > > > > any
> > > > > > flow to
> > > > > > > > > > > > > > disk
> > > > > > > > > > > > > > logs from when it was started until it crashed
> > > (crashed
> > > > > > twice
> > > > > > > > within
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 4hrs).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 4/19/17 4:17:43.509 AM INFO
> [Housekeeping[test]] -
> > > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > > > memory
> > > > > > use
> > > > > > > > > > > > > > 3356539KB
> > > > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 2:31:13.502 AM INFO
> [Housekeeping[test]] -
> > > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> Message
> > > > memory
> > > > > > use
> > > > > > > > > > > > > > 3354866KB
> > > > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 2:28:43.511 AM INFO
> [Housekeeping[test]] -
> > > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > > > memory
> > > > > > use
> > > > > > > > > > > > > > 3358509KB
> > > > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 2:20:13.500 AM INFO
> [Housekeeping[test]] -
> > > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> Message
> > > > memory
> > > > > > use
> > > > > > > > > > > > > > 3353501KB
> > > > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 2:18:13.500 AM INFO
> [Housekeeping[test]] -
> > > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > > > memory
> > > > > > use
> > > > > > > > > > > > > > 3357544KB
> > > > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 2:08:43.501 AM INFO
> [Housekeeping[test]] -
> > > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> Message
> > > > memory
> > > > > > use
> > > > > > > > > > > > > > 3353236KB
> > > > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 2:08:13.501 AM INFO
> [Housekeeping[test]] -
> > > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > > > memory
> > > > > > use
> > > > > > > > > > > > > > 3356704KB
> > > > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 2:00:43.500 AM INFO
> [Housekeeping[test]] -
> > > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> Message
> > > > memory
> > > > > > use
> > > > > > > > > > > > > > 3353511KB
> > > > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 2:00:13.504 AM INFO
> [Housekeeping[test]] -
> > > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > > > memory
> > > > > > use
> > > > > > > > > > > > > > 3357948KB
> > > > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 1:50:43.501 AM INFO
> [Housekeeping[test]] -
> > > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> Message
> > > > memory
> > > > > > use
> > > > > > > > > > > > > > 3355310KB
> > > > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 1:47:43.501 AM INFO
> [Housekeeping[test]] -
> > > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > > > memory
> > > > > > use
> > > > > > > > > > > > > > 3365624KB
> > > > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 1:43:43.501 AM INFO
> [Housekeeping[test]] -
> > > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> Message
> > > > memory
> > > > > > use
> > > > > > > > > > > > > > 3355136KB
> > > > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > > > 4/19/17 1:31:43.509 AM INFO
> [Housekeeping[test]] -
> > > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > > > memory
> > > > > > use
> > > > > > > > > > > > > > 3358683KB
> > > > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > After production release (2days back), we have
> > seen 4
> > > > > > crashes in 3
> > > > > > > > > > > > > > different brokers, this is the most pressing
> > concern
> > > > for
> > > > > > us in
> > > > > > > > > > > > > > decision if
> > > > > > > > > > > > > > we should roll back to 0.32. Any help is greatly
> > > > > > appreciated.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > Ramayan
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 9:36 AM, Oleksandr Rudyy
> <
> > > > > > orudyy@gmail.com
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Ramayan,
> > > > > > > > > > > > > > > Thanks for the details. I would like to clarify
> > > > whether
> > > > > > flow to
> > > > > > > > disk
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > was
> > > > > > > > > > > > > > > triggered today for 3 million messages?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The following logs are issued for flow to disk:
> > > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :
> Message
> > > > > memory
> > > > > > use
> > > > > > > > > > > > > > > {0,number,#}KB
> > > > > > > > > > > > > > > exceeds threshold {1,number,#.##}KB
> > > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> > Message
> > > > > > memory use
> > > > > > > > > > > > > > > {0,number,#}KB within threshold
> {1,number,#.##}KB
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Kind Regards,
> > > > > > > > > > > > > > > Alex
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On 19 April 2017 at 17:10, Ramayan Tiwari <
> > > > > > > > ramayan.tiwari@gmail.com>
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Alex,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks for your response, here are the
> details:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > We use "direct" exchange, without persistence
> > (we
> > > > > > specify
> > > > > > > > > > > > > > > NON_PERSISTENT
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > that while sending from client) and use BDB
> > > store.
> > > > We
> > > > > > use JSON
> > > > > > > > > > > > > > > > virtual
> > > > > > > > > > > > > > > host
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > type. We are not using SSL.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > When the broker went OOM, we had around 1.3
> > > million
> > > > > > messages
> > > > > > > > with
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 100
> > > > > > > > > > > > > > > bytes
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > average message size. Direct memory
> allocation
> > > > (value
> > > > > > read from
> > > > > > > > > > > > > > > > MBean)
> > > > > > > > > > > > > > > kept
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > going up, even though it wouldn't need more
> DM
> > to
> > > > > > store these
> > > > > > > > many
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > messages. DM allocated persisted at 99% for
> > > about 3
> > > > > > and half
> > > > > > > > hours
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > before
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > crashing.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Today, on the same broker we have 3 million
> > > > messages
> > > > > > (same
> > > > > > > > message
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > size)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > and DM allocated is only at 8%. This seems
> like
> > > > there
> > > > > > is some
> > > > > > > > > > > > > > > > issue
> > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > de-allocation or a leak.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I have uploaded the memory utilization graph
> > > here:
> > > > > > > > > > > > > > > > https://drive.google.com/file/d/
> > > > > > 0Bwi0MEV3srPRVHFEbDlIYUpLaUE/
> > > > > > > > > > > > > > > > view?usp=sharing
> > > > > > > > > > > > > > > > Blue line is DM allocated, Yellow is DM Used
> > (sum
> > > > of
> > > > > > queue
> > > > > > > > > > > > > > > > payload)
> > > > > > > > > > > > > > > and Red
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > is heap usage.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > > Ramayan
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 4:10 AM, Oleksandr
> > Rudyy
> > > > > > > > > > > > > > > > <or...@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi Ramayan,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Could please share with us the details of
> > > > messaging
> > > > > > use
> > > > > > > > case(s)
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > > ended
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > up in OOM on broker side?
> > > > > > > > > > > > > > > > > I would like to reproduce the issue on my
> > local
> > > > > > broker in
> > > > > > > > order
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > fix
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > it.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I would appreciate if you could provide as
> > much
> > > > > > details as
> > > > > > > > > > > > > > > > > possible,
> > > > > > > > > > > > > > > > > including, messaging topology, message
> > > > persistence
> > > > > > type,
> > > > > > > > message
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > sizes,volumes, etc.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Qpid Broker 6.0.x uses direct memory for
> > > keeping
> > > > > > message
> > > > > > > > content
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > > receiving/sending data. Each plain
> connection
> > > > > > utilizes 512K of
> > > > > > > > > > > > > > > > > direct
> > > > > > > > > > > > > > > > > memory. Each SSL connection uses 1M of
> direct
> > > > > > memory. Your
> > > > > > > > > > > > > > > > > memory
> > > > > > > > > > > > > > > > settings
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > look Ok to me.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Kind Regards,
> > > > > > > > > > > > > > > > > Alex
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On 18 April 2017 at 23:39, Ramayan Tiwari
> > > > > > > > > > > > > > > > > <ra...@gmail.com>
> > > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > We are using Java broker 6.0.5, with
> patch
> > to
> > > > use
> > > > > > > > > > > > > > > MultiQueueConsumer
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > feature. We just finished deploying to
> > > > production
> > > > > > and saw
> > > > > > > > > > > > > > > > > > couple of
> > > > > > > > > > > > > > > > > > instances of broker OOM due to running
> out
> > of
> > > > > > DirectMemory
> > > > > > > > > > > > > > > > > > buffer
> > > > > > > > > > > > > > > > > > (exceptions at the end of this email).
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Here is our setup:
> > > > > > > > > > > > > > > > > > 1. Max heap 12g, max direct memory 4g
> (this
> > > is
> > > > > > opposite of
> > > > > > > > > > > > > > > > > > what the
> > > > > > > > > > > > > > > > > > recommendation is, however, for our use
> > cause
> > > > > > message
> > > > > > > > payload
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > > really
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > small ~400bytes and is way less than the
> > per
> > > > > > message
> > > > > > > > overhead
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > 1KB).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > In
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > perf testing, we were able to put 2
> million
> > > > > > messages without
> > > > > > > > > > > > > > > > > > any
> > > > > > > > > > > > > > > > issues.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > 2. ~400 connections to broker.
> > > > > > > > > > > > > > > > > > 3. Each connection has 20 sessions and
> > there
> > > is
> > > > > > one multi
> > > > > > > > > > > > > > > > > > queue
> > > > > > > > > > > > > > > > consumer
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > attached to each session, listening to
> > around
> > > > > 1000
> > > > > > queues.
> > > > > > > > > > > > > > > > > > 4. We are still using 0.16 client (I
> know).
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > With the above setup, the baseline
> > > utilization
> > > > > > (without any
> > > > > > > > > > > > > > > messages)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > direct memory was around 230mb (with 410
> > > > > > connection each
> > > > > > > > > > > > > > > > > > taking
> > > > > > > > > > > > > > > 500KB).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Based on our understanding of broker
> memory
> > > > > > allocation,
> > > > > > > > > > > > > > > > > > message
> > > > > > > > > > > > > > > payload
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > should be the only thing adding to direct
> > > > memory
> > > > > > utilization
> > > > > > > > > > > > > > > > > > (on
> > > > > > > > > > > > > > > top of
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > baseline), however, we are experiencing
> > > > something
> > > > > > completely
> > > > > > > > > > > > > > > different.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > In
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > our last broker crash, we see that broker
> > is
> > > > > > constantly
> > > > > > > > > > > > > > > > > > running
> > > > > > > > > > > > > > > with
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 90%+
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > direct memory allocated, even when
> message
> > > > > payload
> > > > > > sum from
> > > > > > > > > > > > > > > > > > all the
> > > > > > > > > > > > > > > > > queues
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > is only 6-8% (these % are against
> available
> > > DM
> > > > of
> > > > > > 4gb).
> > > > > > > > During
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > these
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > high
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > DM usage period, heap usage was around
> 60%
> > > (of
> > > > > > 12gb).
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > We would like some help in understanding
> > what
> > > > > > could be the
> > > > > > > > > > > > > > > > > > reason
> > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > these
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > high DM allocations. Are there things
> other
> > > > than
> > > > > > message
> > > > > > > > > > > > > > > > > > payload
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > AMQP
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > connection, which use DM and could be
> > > > > contributing
> > > > > > to these
> > > > > > > > > > > > > > > > > > high
> > > > > > > > > > > > > > > usage?
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Another thing where we are puzzled is the
> > > > > > de-allocation of
> > > > > > > > DM
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > byte
> > > > > > > > > > > > > > > > > buffers.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > From log mining of heap and DM
> utilization,
> > > > > > de-allocation of
> > > > > > > > > > > > > > > > > > DM
> > > > > > > > > > > > > > > doesn't
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > correlate with heap GC. If anyone has
> seen
> > > any
> > > > > > documentation
> > > > > > > > > > > > > > > related to
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > this, it would be very helpful if you
> could
> > > > share
> > > > > > that.
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > > > > Ramayan
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > *Exceptions*
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct buffer
> > > > memory
> > > > > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(
> > > Bits.java:658)
> > > > > > > > ~[na:1.8.0_40]
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<
> > > > > > init>(DirectByteBuffer.java:
> > > > > > > > 123)
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > > at java.nio.ByteBuffer.
> > > > > allocateDirect(ByteBuffer.
> > > > > > java:311)
> > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > org.apache.qpid.bytebuffer.
> > > > > > QpidByteBuffer.allocateDirect(
> > > > > > > > > > > > > > > > > > QpidByteBuffer.java:474)
> > > > > > > > > > > > > > > > > > ~[qpid-common-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > > > NonBlockingConnectionPlainD
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > elegate.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > restoreApplicationBufferForWrite(
> > > > > > > > NonBlockingConnectionPlainDele
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > gate.java:93)
> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > > > NonBlockingConnectionPlainDele
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > gate.processData(
> > > > NonBlockingConnectionPlainDele
> > > > > > > > gate.java:60)
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > > > NonBlockingConnection.doRead(
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > NonBlockingConnection.java:506)
> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > > > NonBlockingConnection.doWork(
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > NonBlockingConnection.java:285)
> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > > > NetworkConnectionScheduler.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > processConnection(
> > > NetworkConnectionScheduler.
> > > > > > java:124)
> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> > > > transport.SelectorThread$
> > > > > > > > ConnectionPr
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > ocessor.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > processConnection(
> SelectorThread.java:504)
> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> > > > transport.SelectorThread$
> > > > > > > > > > > > > > > > > > SelectionTask.performSelect(
> > > > > > SelectorThread.java:337)
> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> > > > transport.SelectorThread$
> > > > > > > > SelectionTask.run(
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > SelectorThread.java:87)
> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> > > > > > transport.SelectorThread.run(
> > > > > > > > > > > > > > > > > > SelectorThread.java:462)
> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > java.util.concurrent.
> > > > > ThreadPoolExecutor.runWorker(
> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:1142)
> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > java.util.concurrent.
> > > > > > ThreadPoolExecutor$Worker.run(
> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:617)
> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > > at java.lang.Thread.run(Thread.java:745)
> > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > *Second exception*
> > > > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct buffer
> > > > memory
> > > > > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(
> > > Bits.java:658)
> > > > > > > > ~[na:1.8.0_40]
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<
> > > > > > init>(DirectByteBuffer.java:
> > > > > > > > 123)
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > > at java.nio.ByteBuffer.
> > > > > allocateDirect(ByteBuffer.
> > > > > > java:311)
> > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > org.apache.qpid.bytebuffer.
> > > > > > QpidByteBuffer.allocateDirect(
> > > > > > > > > > > > > > > > > > QpidByteBuffer.java:474)
> > > > > > > > > > > > > > > > > > ~[qpid-common-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > > > NonBlockingConnectionPlainDele
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > gate.<init>(
> NonBlockingConnectionPlainDele
> > > > > > gate.java:45)
> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > NonBlockingConnection.
> > > > > > > > > > > > > > > > > > setTransportEncryption(
> > > > > NonBlockingConnection.java:
> > > > > > 625)
> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > > > NonBlockingConnection.<init>(
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > NonBlockingConnection.java:117)
> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > > > NonBlockingNetworkTransport.
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > acceptSocketChannel(
> > > > NonBlockingNetworkTransport.
> > > > > > java:158)
> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> > > > transport.SelectorThread$
> > > > > > > > SelectionTas
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > k$1.run(
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > > SelectorThread.java:191)
> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > org.apache.qpid.server.
> > > > > > transport.SelectorThread.run(
> > > > > > > > > > > > > > > > > > SelectorThread.java:462)
> > > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > java.util.concurrent.
> > > > > ThreadPoolExecutor.runWorker(
> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:1142)
> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > > java.util.concurrent.
> > > > > > ThreadPoolExecutor$Worker.run(
> > > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:617)
> > > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > > at java.lang.Thread.run(Thread.java:745)
> > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > ------------------------------
> > > ------------------------------
> > > > > > ---------
> > > > > > > > > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.
> > org
> > > > > > > > > > > For additional commands, e-mail:
> > > users-help@qpid.apache.org
> > > > > > > > > > >
> > > > > >
> > > > > >
> > > > > > ------------------------------------------------------------
> > > ---------
> > > > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > > > > For additional commands, e-mail: users-help@qpid.apache.org
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Java broker OOM due to DirectMemory

Posted by Oleksandr Rudyy <or...@gmail.com>.
Hi Ramayan,

Thanks for testing the patch and providing a feedback.

Regarding direct memory utilization, the Qpid Broker caches up to 256MB of
direct memory internally in QpidByteBuffers. Thus, when testing the Broker
with only 256MB of direct memory, the entire direct memory could be cached
and it would look as if direct memory is never released. Potentially, you
can reduce the number of buffers cached on broker by changing context
variable 'broker.directByteBufferPoolSize'. By default, it is set to 1000.
With buffer size of 256K, it would give ~256M of cache.

Regarding introducing lower and upper thresholds for 'flow to disk'. It
seems like a good idea and we will try to implement it early this week on
trunk first.

Kind Regards,
Alex


On 5 May 2017 at 23:49, Ramayan Tiwari <ra...@gmail.com> wrote:

> Hi Alex,
>
> Thanks for providing the patch. I verified the fix with same perf test, and
> it does prevent broker from going OOM, however. DM utilization doesn't get
> any better after hitting the threshold (where flow to disk is activated
> based on total used % across broker - graph in the link below).
>
> After hitting the final threshold, flow to disk activates and deactivates
> pretty frequently across all the queues. The reason seems to be because
> there is only one threshold currently to trigger flow to disk. Would it
> make sense to break this down to high and low threshold - so that once flow
> to disk is active after hitting high threshold, it will be active until the
> queue utilization (or broker DM allocation) reaches the low threshold.
>
> Graph and flow to disk logs are here:
> https://docs.google.com/document/d/1Wc1e-id-WlpI7FGU1Lx8XcKaV8sauRp82T5XZV
> U-RiM/edit#heading=h.6400pltvjhy7
>
> Thanks
> Ramayan
>
> On Thu, May 4, 2017 at 2:44 AM, Oleksandr Rudyy <or...@gmail.com> wrote:
>
> > Hi Ramayan,
> >
> > We attached to the QPID-7753 a patch with a work around for 6.0.x branch.
> > It triggers flow to disk based on direct memory consumption rather than
> > estimation of the space occupied by the message content. The flow to disk
> > should evacuate message content preventing running out of direct memory.
> We
> > already committed the changes into 6.0.x and 6.1.x branches. It will be
> > included into upcoming 6.0.7 and 6.1.3 releases.
> >
> > Please try and test the patch in your environment.
> >
> > We are still working at finishing of the fix for trunk.
> >
> > Kind Regards,
> > Alex
> >
> > On 30 April 2017 at 15:45, Lorenz Quack <qu...@gmail.com> wrote:
> >
> > > Hi Ramayan,
> > >
> > > The high-level plan is currently as follows:
> > >  1) Periodically try to compact sparse direct memory buffers.
> > >  2) Increase accuracy of messages' direct memory usage estimation to
> more
> > > reliably trigger flow to disk.
> > >  3) Add an additional flow to disk trigger based on the amount of
> > allocated
> > > direct memory.
> > >
> > > A little bit more details:
> > >  1) We plan on periodically checking the amount of direct memory usage
> > and
> > > if it is above a
> > >     threshold (50%) we compare the sum of all queue sizes with the
> amount
> > > of allocated direct memory.
> > >     If the ratio falls below a certain threshold we trigger a
> compaction
> > > task which goes through all queues
> > >     and copy's a certain amount of old message buffers into new ones
> > > thereby freeing the old buffers so
> > >     that they can be returned to the buffer pool and be reused.
> > >
> > >  2) Currently we trigger flow to disk based on an estimate of how much
> > > memory the messages on the
> > >     queues consume. We had to use estimates because we did not have
> > > accurate size numbers for
> > >     message headers. By having accurate size information for message
> > > headers we can more reliably
> > >     enforce queue memory limits.
> > >
> > >  3) The flow to disk trigger based on message size had another problem
> > > which is more pertinent to the
> > >     current issue. We only considered the size of the messages and not
> > how
> > > much memory we allocate
> > >     to store those messages. In the FIFO use case those numbers will be
> > > very close to each other but in
> > >     use cases like yours we can end up with sparse buffers and the
> > numbers
> > > will diverge. Because of this
> > >     divergence we do not trigger flow to disk in time and the broker
> can
> > go
> > > OOM.
> > >     To fix the issue we want to add an additional flow to disk trigger
> > > based on the amount of allocated direct
> > >     memory. This should prevent the broker from going OOM even if the
> > > compaction strategy outlined above
> > >     should fail for some reason (e.g., the compaction task cannot keep
> up
> > > with the arrival of new messages).
> > >
> > > Currently, there are patches for the above points but they suffer from
> > some
> > > thread-safety issues that need to be addressed.
> > >
> > > I hope this description helps. Any feedback is, as always, welcome.
> > >
> > > Kind regards,
> > > Lorenz
> > >
> > >
> > >
> > > On Sat, Apr 29, 2017 at 12:00 AM, Ramayan Tiwari <
> > ramayan.tiwari@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hi Lorenz,
> > > >
> > > > Thanks so much for the patch. We have a perf test now to reproduce
> this
> > > > issue, so we did test with 256KB, 64KB and 4KB network byte buffer.
> > None
> > > of
> > > > these configurations help with the issue (or give any more breathing
> > > room)
> > > > for our use case. We would like to share the perf analysis with the
> > > > community:
> > > >
> > > > https://docs.google.com/document/d/1Wc1e-id-
> > > WlpI7FGU1Lx8XcKaV8sauRp82T5XZV
> > > > U-RiM/edit?usp=sharing
> > > >
> > > > Feel free to comment on the doc if certain details are incorrect or
> if
> > > > there are questions.
> > > >
> > > > Since the short term solution doesn't help us, we are very interested
> > in
> > > > getting some details on how the community plans to address this, a
> high
> > > > level description of the approach will be very helpful for us in
> order
> > to
> > > > brainstorm our use cases along with this solution.
> > > >
> > > > - Ramayan
> > > >
> > > > On Fri, Apr 28, 2017 at 9:34 AM, Lorenz Quack <
> quack.lorenz@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello Ramayan,
> > > > >
> > > > > We are still working on a fix for this issue.
> > > > > In the mean time we had an idea to potentially workaround the issue
> > > until
> > > > > a proper fix is released.
> > > > >
> > > > > The idea is to decrease the qpid network buffer size the broker
> uses.
> > > > > While this still allows for sparsely populated buffers it would
> > improve
> > > > > the overall occupancy ratio.
> > > > >
> > > > > Here are the steps to follow:
> > > > >  * ensure you are not using TLS
> > > > >  * apply the attached patch
> > > > >  * figure out the size of the largest messages you are sending
> > > (including
> > > > > header and some overhead)
> > > > >  * set the context variable "qpid.broker.networkBufferSize" to
> that
> > > > value
> > > > > but not smaller than 4096
> > > > >  * test
> > > > >
> > > > > Decreasing the qpid network buffer size automatically limits the
> > > maximum
> > > > > AMQP frame size.
> > > > > Since you are using a very old client we are not sure how well it
> > copes
> > > > > with small frame sizes where it has to split a message across
> > multiple
> > > > > frames.
> > > > > Therefore, to play it safe you should not set it smaller than the
> > > largest
> > > > > messages (+ header + overhead) you are sending.
> > > > > I do not know what message sizes you are sending but AMQP imposes
> the
> > > > > restriction that the framesize cannot be smaller than 4096 bytes.
> > > > > In the qpid broker the default currently is 256 kB.
> > > > >
> > > > > In the current state the broker does not allow setting the network
> > > buffer
> > > > > to values smaller than 64 kB to allow TLS frames to fit into one
> > > network
> > > > > buffer.
> > > > > I attached a patch to this mail that lowers that restriction to the
> > > limit
> > > > > imposed by AMQP (4096 Bytes).
> > > > > Obviously, you should not use this when using TLS.
> > > > >
> > > > >
> > > > > I hope this reduces the problems you are currently facing until we
> > can
> > > > > complete the proper fix.
> > > > >
> > > > > Kind regards,
> > > > > Lorenz
> > > > >
> > > > >
> > > > > On Fri, 2017-04-21 at 09:17 -0700, Ramayan Tiwari wrote:
> > > > > > Thanks so much Keith and the team for finding the root cause. We
> > are
> > > so
> > > > > > relieved that we fix the root cause shortly.
> > > > > >
> > > > > > Couple of things that I forgot to mention on the mitigation steps
> > we
> > > > took
> > > > > > in the last incident:
> > > > > > 1) We triggered GC from JMX bean multiple times, it did not help
> in
> > > > > > reducing DM allocated.
> > > > > > 2) We also killed all the AMQP connections to the broker when DM
> > was
> > > at
> > > > > > 80%. This did not help either. The way we killed connections -
> > using
> > > > JMX
> > > > > > got list of all the open AMQP connections and called close from
> JMX
> > > > > mbean.
> > > > > >
> > > > > > I am hoping the above two are not related to root cause, but
> wanted
> > > to
> > > > > > bring it up in case this is relevant.
> > > > > >
> > > > > > Thanks
> > > > > > Ramayan
> > > > > >
> > > > > > On Fri, Apr 21, 2017 at 8:29 AM, Keith W <ke...@gmail.com>
> > > wrote:
> > > > > >
> > > > > > >
> > > > > > > Hello Ramayan
> > > > > > >
> > > > > > > I believe I understand the root cause of the problem.  We have
> > > > > > > identified a flaw in the direct memory buffer management
> employed
> > > by
> > > > > > > Qpid Broker J which for some messaging use-cases can lead to
> the
> > > OOM
> > > > > > > direct you describe.   For the issue to manifest the producing
> > > > > > > application needs to use a single connection for the production
> > of
> > > > > > > messages some of which are short-lived (i.e. are consumed
> > quickly)
> > > > > > > whilst others remain on the queue for some time.  Priority
> > queues,
> > > > > > > sorted queues and consumers utilising selectors that result in
> > some
> > > > > > > messages being left of the queue could all produce this patten.
> > > The
> > > > > > > pattern leads to a sparsely occupied 256K net buffers which
> > cannot
> > > be
> > > > > > > released or reused until every message that reference a 'chunk'
> > of
> > > it
> > > > > > > is either consumed or flown to disk.   The problem was
> introduced
> > > > with
> > > > > > > Qpid v6.0 and exists in v6.1 and trunk too.
> > > > > > >
> > > > > > > The flow to disk feature is not helping us here because its
> > > algorithm
> > > > > > > considers only the size of live messages on the queues. If the
> > > > > > > accumulative live size does not exceed the threshold, the
> > messages
> > > > > > > aren't flown to disk. I speculate that when you observed that
> > > moving
> > > > > > > messages cause direct message usage to drop earlier today, your
> > > > > > > message movement cause a queue to go over threshold, cause
> > message
> > > to
> > > > > > > be flown to disk and their direct memory references released.
> > The
> > > > > > > logs will confirm this is so.
> > > > > > >
> > > > > > > I have not identified an easy workaround at the moment.
> > >  Decreasing
> > > > > > > the flow to disk threshold and/or increasing available direct
> > > memory
> > > > > > > should alleviate and may be an acceptable short term
> workaround.
> > > If
> > > > > > > it were possible for publishing application to publish short
> > lived
> > > > and
> > > > > > > long lived messages on two separate JMS connections this would
> > > avoid
> > > > > > > this defect.
> > > > > > >
> > > > > > > QPID-7753 tracks this issue and QPID-7754 is a related this
> > > problem.
> > > > > > > We intend to be working on these early next week and will be
> > aiming
> > > > > > > for a fix that is back-portable to 6.0.
> > > > > > >
> > > > > > > Apologies that you have run into this defect and thanks for
> > > > reporting.
> > > > > > >
> > > > > > > Thanks, Keith
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On 21 April 2017 at 10:21, Ramayan Tiwari <
> > > ramayan.tiwari@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi All,
> > > > > > > >
> > > > > > > > We have been monitoring the brokers everyday and today we
> found
> > > one
> > > > > > > instance
> > > > > > > >
> > > > > > > > where broker’s DM was constantly going up and was about to
> > crash,
> > > > so
> > > > > we
> > > > > > > > experimented some mitigations, one of which caused the DM to
> > come
> > > > > down.
> > > > > > > > Following are the details, which might help us understanding
> > the
> > > > > issue:
> > > > > > > >
> > > > > > > > Traffic scenario:
> > > > > > > >
> > > > > > > > DM allocation had been constantly going up and was at 90%.
> > There
> > > > > were two
> > > > > > > > queues which seemed to align with the theories that we had.
> > Q1’s
> > > > > size had
> > > > > > > > been large right after the broker start and had slow
> > consumption
> > > of
> > > > > > > > messages, queue size only reduced from 76MB to 75MB over a
> > period
> > > > of
> > > > > > > 6hrs.
> > > > > > > >
> > > > > > > > Q2 on the other hand, started small and was gradually
> growing,
> > > > queue
> > > > > size
> > > > > > > > went from 7MB to 10MB in 6hrs. There were other queues with
> > > traffic
> > > > > > > during
> > > > > > > >
> > > > > > > > this time.
> > > > > > > >
> > > > > > > > Action taken:
> > > > > > > >
> > > > > > > > Moved all the messages from Q2 (since this was our original
> > > theory)
> > > > > to Q3
> > > > > > > > (already created but no messages in it). This did not help
> with
> > > the
> > > > > DM
> > > > > > > > growing up.
> > > > > > > > Moved all the messages from Q1 to Q4 (already created but no
> > > > > messages in
> > > > > > > > it). This reduced DM allocation from 93% to 31%.
> > > > > > > >
> > > > > > > > We have the heap dump and thread dump from when broker was
> 90%
> > in
> > > > DM
> > > > > > > > allocation. We are going to analyze that to see if we can get
> > > some
> > > > > clue.
> > > > > > > We
> > > > > > > >
> > > > > > > > wanted to share this new information which might help in
> > > reasoning
> > > > > about
> > > > > > > the
> > > > > > > >
> > > > > > > > memory issue.
> > > > > > > >
> > > > > > > > - Ramayan
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Apr 20, 2017 at 11:20 AM, Ramayan Tiwari <
> > > > > > > ramayan.tiwari@gmail.com>
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Hi Keith,
> > > > > > > > >
> > > > > > > > > Thanks so much for your response and digging into the
> issue.
> > > > Below
> > > > > are
> > > > > > > the
> > > > > > > >
> > > > > > > > >
> > > > > > > > > answer to your questions:
> > > > > > > > >
> > > > > > > > > 1) Yeah we are using QPID-7462 with 6.0.5. We couldn't use
> > 6.1
> > > > > where it
> > > > > > > > > was released because we need JMX support. Here is the
> > > destination
> > > > > > > format:
> > > > > > > >
> > > > > > > > >
> > > > > > > > > ""%s ; {node : { type : queue }, link : { x-subscribes : {
> > > > > arguments : {
> > > > > > > > > x-multiqueue : [%s], x-pull-only : true }}}}";"
> > > > > > > > >
> > > > > > > > > 2) Our machines have 40 cores, which will make the number
> of
> > > > > threads to
> > > > > > > > > 80. This might not be an issue, because this will show up
> in
> > > the
> > > > > > > baseline DM
> > > > > > > >
> > > > > > > > >
> > > > > > > > > allocated, which is only 6% (of 4GB) when we just bring up
> > the
> > > > > broker.
> > > > > > > > >
> > > > > > > > > 3) The only setting that we tuned WRT to DM is
> > > > flowToDiskThreshold,
> > > > > > > which
> > > > > > > >
> > > > > > > > >
> > > > > > > > > is set at 80% now.
> > > > > > > > >
> > > > > > > > > 4) Only one virtual host in the broker.
> > > > > > > > >
> > > > > > > > > 5) Most of our queues (99%) are priority, we also have 8-10
> > > > sorted
> > > > > > > queues.
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 6) Yeah we are using the standard 0.16 client and not AMQP
> > 1.0
> > > > > clients.
> > > > > > > > > The connection log line looks like:
> > > > > > > > > CON-1001 : Open : Destination : AMQP(IP:5672) : Protocol
> > > Version
> > > > :
> > > > > 0-10
> > > > > > > :
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Client ID : test : Client Version : 0.16 : Client Product :
> > > qpid
> > > > > > > > >
> > > > > > > > > We had another broker crashed about an hour back, we do see
> > the
> > > > > same
> > > > > > > > > patterns:
> > > > > > > > > 1) There is a queue which is constantly growing, enqueue is
> > > > faster
> > > > > than
> > > > > > > > > dequeue on that queue for a long period of time.
> > > > > > > > > 2) Flow to disk didn't kick in at all.
> > > > > > > > >
> > > > > > > > > This graph shows memory growth (red line - heap, blue - DM
> > > > > allocated,
> > > > > > > > > yellow - DM used)
> > > > > > > > >
> > > > > > > > > https://drive.google.com/file/d/
> > 0Bwi0MEV3srPRdVhXdTBncHJLY2c/
> > > > > > > view?usp=sharing
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > The below graph shows growth on a single queue (there are
> > 10-12
> > > > > other
> > > > > > > > > queues with traffic as well, something large size than this
> > > > queue):
> > > > > > > > >
> > > > > > > > > https://drive.google.com/file/d/
> > 0Bwi0MEV3srPRWmNGbDNGUkJhQ0U/
> > > > > > > view?usp=sharing
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Couple of questions:
> > > > > > > > > 1) Is there any developer level doc/design spec on how Qpid
> > > uses
> > > > > DM?
> > > > > > > > > 2) We are not getting heap dumps automatically when broker
> > > > crashes
> > > > > due
> > > > > > > to
> > > > > > > >
> > > > > > > > >
> > > > > > > > > DM (HeapDumpOnOutOfMemoryError not respected). Has anyone
> > > found a
> > > > > way
> > > > > > > to get
> > > > > > > >
> > > > > > > > >
> > > > > > > > > around this problem?
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Ramayan
> > > > > > > > >
> > > > > > > > > On Thu, Apr 20, 2017 at 9:08 AM, Keith W <
> > keith.wall@gmail.com
> > > >
> > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Hi Ramayan
> > > > > > > > > >
> > > > > > > > > > We have been discussing your problem here and have a
> couple
> > > of
> > > > > > > questions.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I have been experimenting with use-cases based on your
> > > > > descriptions
> > > > > > > > > > above, but so far, have been unsuccessful in reproducing
> a
> > > > > > > > > > "java.lang.OutOfMemoryError: Direct buffer memory"
> > > condition.
> > > > > The
> > > > > > > > > > direct memory usage reflects the expected model: it
> levels
> > > off
> > > > > when
> > > > > > > > > > the flow to disk threshold is reached and direct memory
> is
> > > > > release as
> > > > > > > > > > messages are consumed until the minimum size for caching
> of
> > > > > direct is
> > > > > > > > > > reached.
> > > > > > > > > >
> > > > > > > > > > 1] For clarity let me check: we believe when you say
> "patch
> > > to
> > > > > use
> > > > > > > > > > MultiQueueConsumer" you are referring to the patch
> attached
> > > to
> > > > > > > > > > QPID-7462 "Add experimental "pull" consumers to the
> broker"
> > > > and
> > > > > you
> > > > > > > > > > are using a combination of this "x-pull-only"  with the
> > > > standard
> > > > > > > > > > "x-multiqueue" feature.  Is this correct?
> > > > > > > > > >
> > > > > > > > > > 2] One idea we had here relates to the size of the
> > > virtualhost
> > > > IO
> > > > > > > > > > pool.   As you know from the documentation, the Broker
> > > > > caches/reuses
> > > > > > > > > > direct memory internally but the documentation fails to
> > > > mentions
> > > > > that
> > > > > > > > > > each pooled virtualhost IO thread also grabs a chunk
> (256K)
> > > of
> > > > > direct
> > > > > > > > > > memory from this cache.  By default the virtual host IO
> > pool
> > > is
> > > > > sized
> > > > > > > > > > Math.max(Runtime.getRuntime().availableProcessors() * 2,
> > > 64),
> > > > > so if
> > > > > > > > > > you have a machine with a very large number of cores, you
> > may
> > > > > have a
> > > > > > > > > > surprising large amount of direct memory assigned to
> > > > virtualhost
> > > > > IO
> > > > > > > > > > threads.   Check the value of connectionThreadPoolSize on
> > the
> > > > > > > > > > virtualhost
> > > > > > > > > > (http://<server>:<port>/api/latest/virtualhost/<
> > > > > virtualhostnodename>/<;
> > > > > > > virtualhostname>)
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > to see what value is in force.  What is it?  It is
> possible
> > > to
> > > > > tune
> > > > > > > > > > the pool size using context variable
> > > > > > > > > > virtualhost.connectionThreadPool.size.
> > > > > > > > > >
> > > > > > > > > > 3] Tell me if you are tuning the Broker in way beyond the
> > > > > direct/heap
> > > > > > > > > > memory settings you have told us about already.  For
> > instance
> > > > > you are
> > > > > > > > > > changing any of the direct memory pooling settings
> > > > > > > > > > broker.directByteBufferPoolSize, default network buffer
> > size
> > > > > > > > > > qpid.broker.networkBufferSize or applying any other
> > > > non-standard
> > > > > > > > > > settings?
> > > > > > > > > >
> > > > > > > > > > 4] How many virtual hosts do you have on the Broker?
> > > > > > > > > >
> > > > > > > > > > 5] What is the consumption pattern of the messages?  Do
> > > consume
> > > > > in a
> > > > > > > > > > strictly FIFO fashion or are you making use of message
> > > > selectors
> > > > > > > > > > or/and any of the out-of-order queue types (LVQs,
> priority
> > > > queue
> > > > > or
> > > > > > > > > > sorted queues)?
> > > > > > > > > >
> > > > > > > > > > 6] Is it just the 0.16 client involved in the
> application?
> > > >  Can
> > > > > I
> > > > > > > > > > check that you are not using any of the AMQP 1.0 clients
> > > > > > > > > > (org,apache.qpid:qpid-jms-client or
> > > > > > > > > > org.apache.qpid:qpid-amqp-1-0-client) in the software
> > stack
> > > > (as
> > > > > either
> > > > > > > > > > consumers or producers)
> > > > > > > > > >
> > > > > > > > > > Hopefully the answers to these questions will get us
> closer
> > > to
> > > > a
> > > > > > > > > > reproduction.   If you are able to reliable reproduce it,
> > > > please
> > > > > share
> > > > > > > > > > the steps with us.
> > > > > > > > > >
> > > > > > > > > > Kind regards, Keith.
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On 20 April 2017 at 10:21, Ramayan Tiwari <
> > > > > ramayan.tiwari@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > After a lot of log mining, we might have a way to
> explain
> > > the
> > > > > > > sustained
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > increased in DirectMemory allocation, the correlation
> > seems
> > > > to
> > > > > be
> > > > > > > with
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > the
> > > > > > > > > > > growth in the size of a Queue that is getting consumed
> > but
> > > at
> > > > > a much
> > > > > > > > > > > slower
> > > > > > > > > > > rate than producers putting messages on this queue.
> > > > > > > > > > >
> > > > > > > > > > > The pattern we see is that in each instance of broker
> > > crash,
> > > > > there is
> > > > > > > > > > > at
> > > > > > > > > > > least one queue (usually 1 queue) whose size kept
> growing
> > > > > steadily.
> > > > > > > > > > > It’d be
> > > > > > > > > > > of significant size but not the largest queue --
> usually
> > > > there
> > > > > are
> > > > > > > > > > > multiple
> > > > > > > > > > > larger queues -- but it was different from other queues
> > in
> > > > > that its
> > > > > > > > > > > size
> > > > > > > > > > > was growing steadily. The queue would also be moving,
> but
> > > its
> > > > > > > > > > > processing
> > > > > > > > > > > rate was not keeping up with the enqueue rate.
> > > > > > > > > > >
> > > > > > > > > > > Our theory that might be totally wrong: If a queue is
> > > moving
> > > > > the
> > > > > > > entire
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > time, maybe then the broker would keep reusing the same
> > > > buffer
> > > > > in
> > > > > > > > > > > direct
> > > > > > > > > > > memory for the queue, and keep on adding onto it at the
> > end
> > > > to
> > > > > > > > > > > accommodate
> > > > > > > > > > > new messages. But because it’s active all the time and
> > > we’re
> > > > > pointing
> > > > > > > > > > > to
> > > > > > > > > > > the same buffer, space allocated for messages at the
> head
> > > of
> > > > > the
> > > > > > > > > > > queue/buffer doesn’t get reclaimed, even long after
> those
> > > > > messages
> > > > > > > have
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > been processed. Just a theory.
> > > > > > > > > > >
> > > > > > > > > > > We are also trying to reproduce this using some perf
> > tests
> > > to
> > > > > enqueue
> > > > > > > > > > > with
> > > > > > > > > > > same pattern, will update with the findings.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > > Ramayan
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Apr 19, 2017 at 6:52 PM, Ramayan Tiwari
> > > > > > > > > > > <ra...@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Another issue that we noticed is when broker goes OOM
> > due
> > > > to
> > > > > direct
> > > > > > > > > > > > memory, it doesn't create heap dump (specified by
> > "-XX:+
> > > > > > > > > > > > HeapDumpOnOutOfMemoryError"), even when the OOM error
> > is
> > > > > same as
> > > > > > > what
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > is
> > > > > > > > > > > > mentioned in the oracle JVM docs
> > > > > ("java.lang.OutOfMemoryError").
> > > > > > > > > > > >
> > > > > > > > > > > > Has anyone been able to find a way to get to heap
> dump
> > > for
> > > > > DM OOM?
> > > > > > > > > > > >
> > > > > > > > > > > > - Ramayan
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Apr 19, 2017 at 11:21 AM, Ramayan Tiwari
> > > > > > > > > > > > <ramayan.tiwari@gmail.com
> > > > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Alex,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Below are the flow to disk logs from broker having
> > > > > 3million+
> > > > > > > messages
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > at
> > > > > > > > > > > > > this time. We only have one virtual host. Time is
> in
> > > GMT.
> > > > > Looks
> > > > > > > like
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > flow
> > > > > > > > > > > > > to disk is active on the whole virtual host and
> not a
> > > > > queue level.
> > > > > > > > > > > > >
> > > > > > > > > > > > > When the same broker went OOM yesterday, I did not
> > see
> > > > any
> > > > > flow to
> > > > > > > > > > > > > disk
> > > > > > > > > > > > > logs from when it was started until it crashed
> > (crashed
> > > > > twice
> > > > > > > within
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 4hrs).
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > 4/19/17 4:17:43.509 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > > memory
> > > > > use
> > > > > > > > > > > > > 3356539KB
> > > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 2:31:13.502 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> > > memory
> > > > > use
> > > > > > > > > > > > > 3354866KB
> > > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 2:28:43.511 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > > memory
> > > > > use
> > > > > > > > > > > > > 3358509KB
> > > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 2:20:13.500 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> > > memory
> > > > > use
> > > > > > > > > > > > > 3353501KB
> > > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 2:18:13.500 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > > memory
> > > > > use
> > > > > > > > > > > > > 3357544KB
> > > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 2:08:43.501 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> > > memory
> > > > > use
> > > > > > > > > > > > > 3353236KB
> > > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 2:08:13.501 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > > memory
> > > > > use
> > > > > > > > > > > > > 3356704KB
> > > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 2:00:43.500 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> > > memory
> > > > > use
> > > > > > > > > > > > > 3353511KB
> > > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 2:00:13.504 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > > memory
> > > > > use
> > > > > > > > > > > > > 3357948KB
> > > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 1:50:43.501 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> > > memory
> > > > > use
> > > > > > > > > > > > > 3355310KB
> > > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 1:47:43.501 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > > memory
> > > > > use
> > > > > > > > > > > > > 3365624KB
> > > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 1:43:43.501 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> > > memory
> > > > > use
> > > > > > > > > > > > > 3355136KB
> > > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > > 4/19/17 1:31:43.509 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > > memory
> > > > > use
> > > > > > > > > > > > > 3358683KB
> > > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > After production release (2days back), we have
> seen 4
> > > > > crashes in 3
> > > > > > > > > > > > > different brokers, this is the most pressing
> concern
> > > for
> > > > > us in
> > > > > > > > > > > > > decision if
> > > > > > > > > > > > > we should roll back to 0.32. Any help is greatly
> > > > > appreciated.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > Ramayan
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Wed, Apr 19, 2017 at 9:36 AM, Oleksandr Rudyy <
> > > > > orudyy@gmail.com
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Ramayan,
> > > > > > > > > > > > > > Thanks for the details. I would like to clarify
> > > whether
> > > > > flow to
> > > > > > > disk
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > was
> > > > > > > > > > > > > > triggered today for 3 million messages?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The following logs are issued for flow to disk:
> > > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > > > memory
> > > > > use
> > > > > > > > > > > > > > {0,number,#}KB
> > > > > > > > > > > > > > exceeds threshold {1,number,#.##}KB
> > > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive :
> Message
> > > > > memory use
> > > > > > > > > > > > > > {0,number,#}KB within threshold {1,number,#.##}KB
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Kind Regards,
> > > > > > > > > > > > > > Alex
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On 19 April 2017 at 17:10, Ramayan Tiwari <
> > > > > > > ramayan.tiwari@gmail.com>
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Alex,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks for your response, here are the details:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > We use "direct" exchange, without persistence
> (we
> > > > > specify
> > > > > > > > > > > > > > NON_PERSISTENT
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > that while sending from client) and use BDB
> > store.
> > > We
> > > > > use JSON
> > > > > > > > > > > > > > > virtual
> > > > > > > > > > > > > > host
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > type. We are not using SSL.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > When the broker went OOM, we had around 1.3
> > million
> > > > > messages
> > > > > > > with
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 100
> > > > > > > > > > > > > > bytes
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > average message size. Direct memory allocation
> > > (value
> > > > > read from
> > > > > > > > > > > > > > > MBean)
> > > > > > > > > > > > > > kept
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > going up, even though it wouldn't need more DM
> to
> > > > > store these
> > > > > > > many
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > messages. DM allocated persisted at 99% for
> > about 3
> > > > > and half
> > > > > > > hours
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > before
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > crashing.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Today, on the same broker we have 3 million
> > > messages
> > > > > (same
> > > > > > > message
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > size)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > and DM allocated is only at 8%. This seems like
> > > there
> > > > > is some
> > > > > > > > > > > > > > > issue
> > > > > > > > > > > > > > with
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > de-allocation or a leak.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I have uploaded the memory utilization graph
> > here:
> > > > > > > > > > > > > > > https://drive.google.com/file/d/
> > > > > 0Bwi0MEV3srPRVHFEbDlIYUpLaUE/
> > > > > > > > > > > > > > > view?usp=sharing
> > > > > > > > > > > > > > > Blue line is DM allocated, Yellow is DM Used
> (sum
> > > of
> > > > > queue
> > > > > > > > > > > > > > > payload)
> > > > > > > > > > > > > > and Red
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > is heap usage.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > Ramayan
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 4:10 AM, Oleksandr
> Rudyy
> > > > > > > > > > > > > > > <or...@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi Ramayan,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Could please share with us the details of
> > > messaging
> > > > > use
> > > > > > > case(s)
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > > ended
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > up in OOM on broker side?
> > > > > > > > > > > > > > > > I would like to reproduce the issue on my
> local
> > > > > broker in
> > > > > > > order
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > fix
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > it.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I would appreciate if you could provide as
> much
> > > > > details as
> > > > > > > > > > > > > > > > possible,
> > > > > > > > > > > > > > > > including, messaging topology, message
> > > persistence
> > > > > type,
> > > > > > > message
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > sizes,volumes, etc.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Qpid Broker 6.0.x uses direct memory for
> > keeping
> > > > > message
> > > > > > > content
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > > receiving/sending data. Each plain connection
> > > > > utilizes 512K of
> > > > > > > > > > > > > > > > direct
> > > > > > > > > > > > > > > > memory. Each SSL connection uses 1M of direct
> > > > > memory. Your
> > > > > > > > > > > > > > > > memory
> > > > > > > > > > > > > > > settings
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > look Ok to me.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Kind Regards,
> > > > > > > > > > > > > > > > Alex
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On 18 April 2017 at 23:39, Ramayan Tiwari
> > > > > > > > > > > > > > > > <ra...@gmail.com>
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > We are using Java broker 6.0.5, with patch
> to
> > > use
> > > > > > > > > > > > > > MultiQueueConsumer
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > feature. We just finished deploying to
> > > production
> > > > > and saw
> > > > > > > > > > > > > > > > > couple of
> > > > > > > > > > > > > > > > > instances of broker OOM due to running out
> of
> > > > > DirectMemory
> > > > > > > > > > > > > > > > > buffer
> > > > > > > > > > > > > > > > > (exceptions at the end of this email).
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Here is our setup:
> > > > > > > > > > > > > > > > > 1. Max heap 12g, max direct memory 4g (this
> > is
> > > > > opposite of
> > > > > > > > > > > > > > > > > what the
> > > > > > > > > > > > > > > > > recommendation is, however, for our use
> cause
> > > > > message
> > > > > > > payload
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > > really
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > small ~400bytes and is way less than the
> per
> > > > > message
> > > > > > > overhead
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > > 1KB).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > In
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > perf testing, we were able to put 2 million
> > > > > messages without
> > > > > > > > > > > > > > > > > any
> > > > > > > > > > > > > > > issues.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 2. ~400 connections to broker.
> > > > > > > > > > > > > > > > > 3. Each connection has 20 sessions and
> there
> > is
> > > > > one multi
> > > > > > > > > > > > > > > > > queue
> > > > > > > > > > > > > > > consumer
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > attached to each session, listening to
> around
> > > > 1000
> > > > > queues.
> > > > > > > > > > > > > > > > > 4. We are still using 0.16 client (I know).
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > With the above setup, the baseline
> > utilization
> > > > > (without any
> > > > > > > > > > > > > > messages)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > for
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > direct memory was around 230mb (with 410
> > > > > connection each
> > > > > > > > > > > > > > > > > taking
> > > > > > > > > > > > > > 500KB).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Based on our understanding of broker memory
> > > > > allocation,
> > > > > > > > > > > > > > > > > message
> > > > > > > > > > > > > > payload
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > should be the only thing adding to direct
> > > memory
> > > > > utilization
> > > > > > > > > > > > > > > > > (on
> > > > > > > > > > > > > > top of
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > baseline), however, we are experiencing
> > > something
> > > > > completely
> > > > > > > > > > > > > > different.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > In
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > our last broker crash, we see that broker
> is
> > > > > constantly
> > > > > > > > > > > > > > > > > running
> > > > > > > > > > > > > > with
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > 90%+
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > direct memory allocated, even when message
> > > > payload
> > > > > sum from
> > > > > > > > > > > > > > > > > all the
> > > > > > > > > > > > > > > > queues
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > is only 6-8% (these % are against available
> > DM
> > > of
> > > > > 4gb).
> > > > > > > During
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > these
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > high
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > DM usage period, heap usage was around 60%
> > (of
> > > > > 12gb).
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > We would like some help in understanding
> what
> > > > > could be the
> > > > > > > > > > > > > > > > > reason
> > > > > > > > > > > > > > of
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > these
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > high DM allocations. Are there things other
> > > than
> > > > > message
> > > > > > > > > > > > > > > > > payload
> > > > > > > > > > > > > > and
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > AMQP
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > connection, which use DM and could be
> > > > contributing
> > > > > to these
> > > > > > > > > > > > > > > > > high
> > > > > > > > > > > > > > usage?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Another thing where we are puzzled is the
> > > > > de-allocation of
> > > > > > > DM
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > byte
> > > > > > > > > > > > > > > > buffers.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > From log mining of heap and DM utilization,
> > > > > de-allocation of
> > > > > > > > > > > > > > > > > DM
> > > > > > > > > > > > > > doesn't
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > correlate with heap GC. If anyone has seen
> > any
> > > > > documentation
> > > > > > > > > > > > > > related to
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > this, it would be very helpful if you could
> > > share
> > > > > that.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > > > Ramayan
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > *Exceptions*
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct buffer
> > > memory
> > > > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(
> > Bits.java:658)
> > > > > > > ~[na:1.8.0_40]
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<
> > > > > init>(DirectByteBuffer.java:
> > > > > > > 123)
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > at java.nio.ByteBuffer.
> > > > allocateDirect(ByteBuffer.
> > > > > java:311)
> > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > org.apache.qpid.bytebuffer.
> > > > > QpidByteBuffer.allocateDirect(
> > > > > > > > > > > > > > > > > QpidByteBuffer.java:474)
> > > > > > > > > > > > > > > > > ~[qpid-common-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > > NonBlockingConnectionPlainD
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > elegate.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > restoreApplicationBufferForWrite(
> > > > > > > NonBlockingConnectionPlainDele
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > gate.java:93)
> > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > > NonBlockingConnectionPlainDele
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > gate.processData(
> > > NonBlockingConnectionPlainDele
> > > > > > > gate.java:60)
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > > NonBlockingConnection.doRead(
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > NonBlockingConnection.java:506)
> > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > > NonBlockingConnection.doWork(
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > NonBlockingConnection.java:285)
> > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > > NetworkConnectionScheduler.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > processConnection(
> > NetworkConnectionScheduler.
> > > > > java:124)
> > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > org.apache.qpid.server.
> > > transport.SelectorThread$
> > > > > > > ConnectionPr
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > ocessor.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > processConnection(SelectorThread.java:504)
> > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > org.apache.qpid.server.
> > > transport.SelectorThread$
> > > > > > > > > > > > > > > > > SelectionTask.performSelect(
> > > > > SelectorThread.java:337)
> > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > org.apache.qpid.server.
> > > transport.SelectorThread$
> > > > > > > SelectionTask.run(
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > SelectorThread.java:87)
> > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > org.apache.qpid.server.
> > > > > transport.SelectorThread.run(
> > > > > > > > > > > > > > > > > SelectorThread.java:462)
> > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > java.util.concurrent.
> > > > ThreadPoolExecutor.runWorker(
> > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:1142)
> > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > java.util.concurrent.
> > > > > ThreadPoolExecutor$Worker.run(
> > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:617)
> > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > at java.lang.Thread.run(Thread.java:745)
> > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > *Second exception*
> > > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct buffer
> > > memory
> > > > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(
> > Bits.java:658)
> > > > > > > ~[na:1.8.0_40]
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<
> > > > > init>(DirectByteBuffer.java:
> > > > > > > 123)
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > at java.nio.ByteBuffer.
> > > > allocateDirect(ByteBuffer.
> > > > > java:311)
> > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > org.apache.qpid.bytebuffer.
> > > > > QpidByteBuffer.allocateDirect(
> > > > > > > > > > > > > > > > > QpidByteBuffer.java:474)
> > > > > > > > > > > > > > > > > ~[qpid-common-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > > NonBlockingConnectionPlainDele
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > gate.<init>(NonBlockingConnectionPlainDele
> > > > > gate.java:45)
> > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > NonBlockingConnection.
> > > > > > > > > > > > > > > > > setTransportEncryption(
> > > > NonBlockingConnection.java:
> > > > > 625)
> > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > > NonBlockingConnection.<init>(
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > NonBlockingConnection.java:117)
> > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > > NonBlockingNetworkTransport.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > acceptSocketChannel(
> > > NonBlockingNetworkTransport.
> > > > > java:158)
> > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > org.apache.qpid.server.
> > > transport.SelectorThread$
> > > > > > > SelectionTas
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > k$1.run(
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > SelectorThread.java:191)
> > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > org.apache.qpid.server.
> > > > > transport.SelectorThread.run(
> > > > > > > > > > > > > > > > > SelectorThread.java:462)
> > > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > java.util.concurrent.
> > > > ThreadPoolExecutor.runWorker(
> > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:1142)
> > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > > java.util.concurrent.
> > > > > ThreadPoolExecutor$Worker.run(
> > > > > > > > > > > > > > > > > ThreadPoolExecutor.java:617)
> > > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > > at java.lang.Thread.run(Thread.java:745)
> > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > ------------------------------
> > ------------------------------
> > > > > ---------
> > > > > > > > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.
> org
> > > > > > > > > > For additional commands, e-mail:
> > users-help@qpid.apache.org
> > > > > > > > > >
> > > > >
> > > > >
> > > > > ------------------------------------------------------------
> > ---------
> > > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > > > For additional commands, e-mail: users-help@qpid.apache.org
> > > > >
> > > >
> > >
> >
>

Re: Java broker OOM due to DirectMemory

Posted by Ramayan Tiwari <ra...@gmail.com>.
Hi Alex,

Thanks for providing the patch. I verified the fix with same perf test, and
it does prevent broker from going OOM, however. DM utilization doesn't get
any better after hitting the threshold (where flow to disk is activated
based on total used % across broker - graph in the link below).

After hitting the final threshold, flow to disk activates and deactivates
pretty frequently across all the queues. The reason seems to be because
there is only one threshold currently to trigger flow to disk. Would it
make sense to break this down to high and low threshold - so that once flow
to disk is active after hitting high threshold, it will be active until the
queue utilization (or broker DM allocation) reaches the low threshold.

Graph and flow to disk logs are here:
https://docs.google.com/document/d/1Wc1e-id-WlpI7FGU1Lx8XcKaV8sauRp82T5XZVU-RiM/edit#heading=h.6400pltvjhy7

Thanks
Ramayan

On Thu, May 4, 2017 at 2:44 AM, Oleksandr Rudyy <or...@gmail.com> wrote:

> Hi Ramayan,
>
> We attached to the QPID-7753 a patch with a work around for 6.0.x branch.
> It triggers flow to disk based on direct memory consumption rather than
> estimation of the space occupied by the message content. The flow to disk
> should evacuate message content preventing running out of direct memory. We
> already committed the changes into 6.0.x and 6.1.x branches. It will be
> included into upcoming 6.0.7 and 6.1.3 releases.
>
> Please try and test the patch in your environment.
>
> We are still working at finishing of the fix for trunk.
>
> Kind Regards,
> Alex
>
> On 30 April 2017 at 15:45, Lorenz Quack <qu...@gmail.com> wrote:
>
> > Hi Ramayan,
> >
> > The high-level plan is currently as follows:
> >  1) Periodically try to compact sparse direct memory buffers.
> >  2) Increase accuracy of messages' direct memory usage estimation to more
> > reliably trigger flow to disk.
> >  3) Add an additional flow to disk trigger based on the amount of
> allocated
> > direct memory.
> >
> > A little bit more details:
> >  1) We plan on periodically checking the amount of direct memory usage
> and
> > if it is above a
> >     threshold (50%) we compare the sum of all queue sizes with the amount
> > of allocated direct memory.
> >     If the ratio falls below a certain threshold we trigger a compaction
> > task which goes through all queues
> >     and copy's a certain amount of old message buffers into new ones
> > thereby freeing the old buffers so
> >     that they can be returned to the buffer pool and be reused.
> >
> >  2) Currently we trigger flow to disk based on an estimate of how much
> > memory the messages on the
> >     queues consume. We had to use estimates because we did not have
> > accurate size numbers for
> >     message headers. By having accurate size information for message
> > headers we can more reliably
> >     enforce queue memory limits.
> >
> >  3) The flow to disk trigger based on message size had another problem
> > which is more pertinent to the
> >     current issue. We only considered the size of the messages and not
> how
> > much memory we allocate
> >     to store those messages. In the FIFO use case those numbers will be
> > very close to each other but in
> >     use cases like yours we can end up with sparse buffers and the
> numbers
> > will diverge. Because of this
> >     divergence we do not trigger flow to disk in time and the broker can
> go
> > OOM.
> >     To fix the issue we want to add an additional flow to disk trigger
> > based on the amount of allocated direct
> >     memory. This should prevent the broker from going OOM even if the
> > compaction strategy outlined above
> >     should fail for some reason (e.g., the compaction task cannot keep up
> > with the arrival of new messages).
> >
> > Currently, there are patches for the above points but they suffer from
> some
> > thread-safety issues that need to be addressed.
> >
> > I hope this description helps. Any feedback is, as always, welcome.
> >
> > Kind regards,
> > Lorenz
> >
> >
> >
> > On Sat, Apr 29, 2017 at 12:00 AM, Ramayan Tiwari <
> ramayan.tiwari@gmail.com
> > >
> > wrote:
> >
> > > Hi Lorenz,
> > >
> > > Thanks so much for the patch. We have a perf test now to reproduce this
> > > issue, so we did test with 256KB, 64KB and 4KB network byte buffer.
> None
> > of
> > > these configurations help with the issue (or give any more breathing
> > room)
> > > for our use case. We would like to share the perf analysis with the
> > > community:
> > >
> > > https://docs.google.com/document/d/1Wc1e-id-
> > WlpI7FGU1Lx8XcKaV8sauRp82T5XZV
> > > U-RiM/edit?usp=sharing
> > >
> > > Feel free to comment on the doc if certain details are incorrect or if
> > > there are questions.
> > >
> > > Since the short term solution doesn't help us, we are very interested
> in
> > > getting some details on how the community plans to address this, a high
> > > level description of the approach will be very helpful for us in order
> to
> > > brainstorm our use cases along with this solution.
> > >
> > > - Ramayan
> > >
> > > On Fri, Apr 28, 2017 at 9:34 AM, Lorenz Quack <qu...@gmail.com>
> > > wrote:
> > >
> > > > Hello Ramayan,
> > > >
> > > > We are still working on a fix for this issue.
> > > > In the mean time we had an idea to potentially workaround the issue
> > until
> > > > a proper fix is released.
> > > >
> > > > The idea is to decrease the qpid network buffer size the broker uses.
> > > > While this still allows for sparsely populated buffers it would
> improve
> > > > the overall occupancy ratio.
> > > >
> > > > Here are the steps to follow:
> > > >  * ensure you are not using TLS
> > > >  * apply the attached patch
> > > >  * figure out the size of the largest messages you are sending
> > (including
> > > > header and some overhead)
> > > >  * set the context variable "qpid.broker.networkBufferSize" to that
> > > value
> > > > but not smaller than 4096
> > > >  * test
> > > >
> > > > Decreasing the qpid network buffer size automatically limits the
> > maximum
> > > > AMQP frame size.
> > > > Since you are using a very old client we are not sure how well it
> copes
> > > > with small frame sizes where it has to split a message across
> multiple
> > > > frames.
> > > > Therefore, to play it safe you should not set it smaller than the
> > largest
> > > > messages (+ header + overhead) you are sending.
> > > > I do not know what message sizes you are sending but AMQP imposes the
> > > > restriction that the framesize cannot be smaller than 4096 bytes.
> > > > In the qpid broker the default currently is 256 kB.
> > > >
> > > > In the current state the broker does not allow setting the network
> > buffer
> > > > to values smaller than 64 kB to allow TLS frames to fit into one
> > network
> > > > buffer.
> > > > I attached a patch to this mail that lowers that restriction to the
> > limit
> > > > imposed by AMQP (4096 Bytes).
> > > > Obviously, you should not use this when using TLS.
> > > >
> > > >
> > > > I hope this reduces the problems you are currently facing until we
> can
> > > > complete the proper fix.
> > > >
> > > > Kind regards,
> > > > Lorenz
> > > >
> > > >
> > > > On Fri, 2017-04-21 at 09:17 -0700, Ramayan Tiwari wrote:
> > > > > Thanks so much Keith and the team for finding the root cause. We
> are
> > so
> > > > > relieved that we fix the root cause shortly.
> > > > >
> > > > > Couple of things that I forgot to mention on the mitigation steps
> we
> > > took
> > > > > in the last incident:
> > > > > 1) We triggered GC from JMX bean multiple times, it did not help in
> > > > > reducing DM allocated.
> > > > > 2) We also killed all the AMQP connections to the broker when DM
> was
> > at
> > > > > 80%. This did not help either. The way we killed connections -
> using
> > > JMX
> > > > > got list of all the open AMQP connections and called close from JMX
> > > > mbean.
> > > > >
> > > > > I am hoping the above two are not related to root cause, but wanted
> > to
> > > > > bring it up in case this is relevant.
> > > > >
> > > > > Thanks
> > > > > Ramayan
> > > > >
> > > > > On Fri, Apr 21, 2017 at 8:29 AM, Keith W <ke...@gmail.com>
> > wrote:
> > > > >
> > > > > >
> > > > > > Hello Ramayan
> > > > > >
> > > > > > I believe I understand the root cause of the problem.  We have
> > > > > > identified a flaw in the direct memory buffer management employed
> > by
> > > > > > Qpid Broker J which for some messaging use-cases can lead to the
> > OOM
> > > > > > direct you describe.   For the issue to manifest the producing
> > > > > > application needs to use a single connection for the production
> of
> > > > > > messages some of which are short-lived (i.e. are consumed
> quickly)
> > > > > > whilst others remain on the queue for some time.  Priority
> queues,
> > > > > > sorted queues and consumers utilising selectors that result in
> some
> > > > > > messages being left of the queue could all produce this patten.
> > The
> > > > > > pattern leads to a sparsely occupied 256K net buffers which
> cannot
> > be
> > > > > > released or reused until every message that reference a 'chunk'
> of
> > it
> > > > > > is either consumed or flown to disk.   The problem was introduced
> > > with
> > > > > > Qpid v6.0 and exists in v6.1 and trunk too.
> > > > > >
> > > > > > The flow to disk feature is not helping us here because its
> > algorithm
> > > > > > considers only the size of live messages on the queues. If the
> > > > > > accumulative live size does not exceed the threshold, the
> messages
> > > > > > aren't flown to disk. I speculate that when you observed that
> > moving
> > > > > > messages cause direct message usage to drop earlier today, your
> > > > > > message movement cause a queue to go over threshold, cause
> message
> > to
> > > > > > be flown to disk and their direct memory references released.
> The
> > > > > > logs will confirm this is so.
> > > > > >
> > > > > > I have not identified an easy workaround at the moment.
> >  Decreasing
> > > > > > the flow to disk threshold and/or increasing available direct
> > memory
> > > > > > should alleviate and may be an acceptable short term workaround.
> > If
> > > > > > it were possible for publishing application to publish short
> lived
> > > and
> > > > > > long lived messages on two separate JMS connections this would
> > avoid
> > > > > > this defect.
> > > > > >
> > > > > > QPID-7753 tracks this issue and QPID-7754 is a related this
> > problem.
> > > > > > We intend to be working on these early next week and will be
> aiming
> > > > > > for a fix that is back-portable to 6.0.
> > > > > >
> > > > > > Apologies that you have run into this defect and thanks for
> > > reporting.
> > > > > >
> > > > > > Thanks, Keith
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On 21 April 2017 at 10:21, Ramayan Tiwari <
> > ramayan.tiwari@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > We have been monitoring the brokers everyday and today we found
> > one
> > > > > > instance
> > > > > > >
> > > > > > > where broker’s DM was constantly going up and was about to
> crash,
> > > so
> > > > we
> > > > > > > experimented some mitigations, one of which caused the DM to
> come
> > > > down.
> > > > > > > Following are the details, which might help us understanding
> the
> > > > issue:
> > > > > > >
> > > > > > > Traffic scenario:
> > > > > > >
> > > > > > > DM allocation had been constantly going up and was at 90%.
> There
> > > > were two
> > > > > > > queues which seemed to align with the theories that we had.
> Q1’s
> > > > size had
> > > > > > > been large right after the broker start and had slow
> consumption
> > of
> > > > > > > messages, queue size only reduced from 76MB to 75MB over a
> period
> > > of
> > > > > > 6hrs.
> > > > > > >
> > > > > > > Q2 on the other hand, started small and was gradually growing,
> > > queue
> > > > size
> > > > > > > went from 7MB to 10MB in 6hrs. There were other queues with
> > traffic
> > > > > > during
> > > > > > >
> > > > > > > this time.
> > > > > > >
> > > > > > > Action taken:
> > > > > > >
> > > > > > > Moved all the messages from Q2 (since this was our original
> > theory)
> > > > to Q3
> > > > > > > (already created but no messages in it). This did not help with
> > the
> > > > DM
> > > > > > > growing up.
> > > > > > > Moved all the messages from Q1 to Q4 (already created but no
> > > > messages in
> > > > > > > it). This reduced DM allocation from 93% to 31%.
> > > > > > >
> > > > > > > We have the heap dump and thread dump from when broker was 90%
> in
> > > DM
> > > > > > > allocation. We are going to analyze that to see if we can get
> > some
> > > > clue.
> > > > > > We
> > > > > > >
> > > > > > > wanted to share this new information which might help in
> > reasoning
> > > > about
> > > > > > the
> > > > > > >
> > > > > > > memory issue.
> > > > > > >
> > > > > > > - Ramayan
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Apr 20, 2017 at 11:20 AM, Ramayan Tiwari <
> > > > > > ramayan.tiwari@gmail.com>
> > > > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > Hi Keith,
> > > > > > > >
> > > > > > > > Thanks so much for your response and digging into the issue.
> > > Below
> > > > are
> > > > > > the
> > > > > > >
> > > > > > > >
> > > > > > > > answer to your questions:
> > > > > > > >
> > > > > > > > 1) Yeah we are using QPID-7462 with 6.0.5. We couldn't use
> 6.1
> > > > where it
> > > > > > > > was released because we need JMX support. Here is the
> > destination
> > > > > > format:
> > > > > > >
> > > > > > > >
> > > > > > > > ""%s ; {node : { type : queue }, link : { x-subscribes : {
> > > > arguments : {
> > > > > > > > x-multiqueue : [%s], x-pull-only : true }}}}";"
> > > > > > > >
> > > > > > > > 2) Our machines have 40 cores, which will make the number of
> > > > threads to
> > > > > > > > 80. This might not be an issue, because this will show up in
> > the
> > > > > > baseline DM
> > > > > > >
> > > > > > > >
> > > > > > > > allocated, which is only 6% (of 4GB) when we just bring up
> the
> > > > broker.
> > > > > > > >
> > > > > > > > 3) The only setting that we tuned WRT to DM is
> > > flowToDiskThreshold,
> > > > > > which
> > > > > > >
> > > > > > > >
> > > > > > > > is set at 80% now.
> > > > > > > >
> > > > > > > > 4) Only one virtual host in the broker.
> > > > > > > >
> > > > > > > > 5) Most of our queues (99%) are priority, we also have 8-10
> > > sorted
> > > > > > queues.
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > 6) Yeah we are using the standard 0.16 client and not AMQP
> 1.0
> > > > clients.
> > > > > > > > The connection log line looks like:
> > > > > > > > CON-1001 : Open : Destination : AMQP(IP:5672) : Protocol
> > Version
> > > :
> > > > 0-10
> > > > > > :
> > > > > > >
> > > > > > > >
> > > > > > > > Client ID : test : Client Version : 0.16 : Client Product :
> > qpid
> > > > > > > >
> > > > > > > > We had another broker crashed about an hour back, we do see
> the
> > > > same
> > > > > > > > patterns:
> > > > > > > > 1) There is a queue which is constantly growing, enqueue is
> > > faster
> > > > than
> > > > > > > > dequeue on that queue for a long period of time.
> > > > > > > > 2) Flow to disk didn't kick in at all.
> > > > > > > >
> > > > > > > > This graph shows memory growth (red line - heap, blue - DM
> > > > allocated,
> > > > > > > > yellow - DM used)
> > > > > > > >
> > > > > > > > https://drive.google.com/file/d/
> 0Bwi0MEV3srPRdVhXdTBncHJLY2c/
> > > > > > view?usp=sharing
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > The below graph shows growth on a single queue (there are
> 10-12
> > > > other
> > > > > > > > queues with traffic as well, something large size than this
> > > queue):
> > > > > > > >
> > > > > > > > https://drive.google.com/file/d/
> 0Bwi0MEV3srPRWmNGbDNGUkJhQ0U/
> > > > > > view?usp=sharing
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Couple of questions:
> > > > > > > > 1) Is there any developer level doc/design spec on how Qpid
> > uses
> > > > DM?
> > > > > > > > 2) We are not getting heap dumps automatically when broker
> > > crashes
> > > > due
> > > > > > to
> > > > > > >
> > > > > > > >
> > > > > > > > DM (HeapDumpOnOutOfMemoryError not respected). Has anyone
> > found a
> > > > way
> > > > > > to get
> > > > > > >
> > > > > > > >
> > > > > > > > around this problem?
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > > Ramayan
> > > > > > > >
> > > > > > > > On Thu, Apr 20, 2017 at 9:08 AM, Keith W <
> keith.wall@gmail.com
> > >
> > > > wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Hi Ramayan
> > > > > > > > >
> > > > > > > > > We have been discussing your problem here and have a couple
> > of
> > > > > > questions.
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I have been experimenting with use-cases based on your
> > > > descriptions
> > > > > > > > > above, but so far, have been unsuccessful in reproducing a
> > > > > > > > > "java.lang.OutOfMemoryError: Direct buffer memory"
> > condition.
> > > > The
> > > > > > > > > direct memory usage reflects the expected model: it levels
> > off
> > > > when
> > > > > > > > > the flow to disk threshold is reached and direct memory is
> > > > release as
> > > > > > > > > messages are consumed until the minimum size for caching of
> > > > direct is
> > > > > > > > > reached.
> > > > > > > > >
> > > > > > > > > 1] For clarity let me check: we believe when you say "patch
> > to
> > > > use
> > > > > > > > > MultiQueueConsumer" you are referring to the patch attached
> > to
> > > > > > > > > QPID-7462 "Add experimental "pull" consumers to the broker"
> > > and
> > > > you
> > > > > > > > > are using a combination of this "x-pull-only"  with the
> > > standard
> > > > > > > > > "x-multiqueue" feature.  Is this correct?
> > > > > > > > >
> > > > > > > > > 2] One idea we had here relates to the size of the
> > virtualhost
> > > IO
> > > > > > > > > pool.   As you know from the documentation, the Broker
> > > > caches/reuses
> > > > > > > > > direct memory internally but the documentation fails to
> > > mentions
> > > > that
> > > > > > > > > each pooled virtualhost IO thread also grabs a chunk (256K)
> > of
> > > > direct
> > > > > > > > > memory from this cache.  By default the virtual host IO
> pool
> > is
> > > > sized
> > > > > > > > > Math.max(Runtime.getRuntime().availableProcessors() * 2,
> > 64),
> > > > so if
> > > > > > > > > you have a machine with a very large number of cores, you
> may
> > > > have a
> > > > > > > > > surprising large amount of direct memory assigned to
> > > virtualhost
> > > > IO
> > > > > > > > > threads.   Check the value of connectionThreadPoolSize on
> the
> > > > > > > > > virtualhost
> > > > > > > > > (http://<server>:<port>/api/latest/virtualhost/<
> > > > virtualhostnodename>/<;
> > > > > > virtualhostname>)
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > to see what value is in force.  What is it?  It is possible
> > to
> > > > tune
> > > > > > > > > the pool size using context variable
> > > > > > > > > virtualhost.connectionThreadPool.size.
> > > > > > > > >
> > > > > > > > > 3] Tell me if you are tuning the Broker in way beyond the
> > > > direct/heap
> > > > > > > > > memory settings you have told us about already.  For
> instance
> > > > you are
> > > > > > > > > changing any of the direct memory pooling settings
> > > > > > > > > broker.directByteBufferPoolSize, default network buffer
> size
> > > > > > > > > qpid.broker.networkBufferSize or applying any other
> > > non-standard
> > > > > > > > > settings?
> > > > > > > > >
> > > > > > > > > 4] How many virtual hosts do you have on the Broker?
> > > > > > > > >
> > > > > > > > > 5] What is the consumption pattern of the messages?  Do
> > consume
> > > > in a
> > > > > > > > > strictly FIFO fashion or are you making use of message
> > > selectors
> > > > > > > > > or/and any of the out-of-order queue types (LVQs, priority
> > > queue
> > > > or
> > > > > > > > > sorted queues)?
> > > > > > > > >
> > > > > > > > > 6] Is it just the 0.16 client involved in the application?
> > >  Can
> > > > I
> > > > > > > > > check that you are not using any of the AMQP 1.0 clients
> > > > > > > > > (org,apache.qpid:qpid-jms-client or
> > > > > > > > > org.apache.qpid:qpid-amqp-1-0-client) in the software
> stack
> > > (as
> > > > either
> > > > > > > > > consumers or producers)
> > > > > > > > >
> > > > > > > > > Hopefully the answers to these questions will get us closer
> > to
> > > a
> > > > > > > > > reproduction.   If you are able to reliable reproduce it,
> > > please
> > > > share
> > > > > > > > > the steps with us.
> > > > > > > > >
> > > > > > > > > Kind regards, Keith.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On 20 April 2017 at 10:21, Ramayan Tiwari <
> > > > ramayan.tiwari@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > After a lot of log mining, we might have a way to explain
> > the
> > > > > > sustained
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > increased in DirectMemory allocation, the correlation
> seems
> > > to
> > > > be
> > > > > > with
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > the
> > > > > > > > > > growth in the size of a Queue that is getting consumed
> but
> > at
> > > > a much
> > > > > > > > > > slower
> > > > > > > > > > rate than producers putting messages on this queue.
> > > > > > > > > >
> > > > > > > > > > The pattern we see is that in each instance of broker
> > crash,
> > > > there is
> > > > > > > > > > at
> > > > > > > > > > least one queue (usually 1 queue) whose size kept growing
> > > > steadily.
> > > > > > > > > > It’d be
> > > > > > > > > > of significant size but not the largest queue -- usually
> > > there
> > > > are
> > > > > > > > > > multiple
> > > > > > > > > > larger queues -- but it was different from other queues
> in
> > > > that its
> > > > > > > > > > size
> > > > > > > > > > was growing steadily. The queue would also be moving, but
> > its
> > > > > > > > > > processing
> > > > > > > > > > rate was not keeping up with the enqueue rate.
> > > > > > > > > >
> > > > > > > > > > Our theory that might be totally wrong: If a queue is
> > moving
> > > > the
> > > > > > entire
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > time, maybe then the broker would keep reusing the same
> > > buffer
> > > > in
> > > > > > > > > > direct
> > > > > > > > > > memory for the queue, and keep on adding onto it at the
> end
> > > to
> > > > > > > > > > accommodate
> > > > > > > > > > new messages. But because it’s active all the time and
> > we’re
> > > > pointing
> > > > > > > > > > to
> > > > > > > > > > the same buffer, space allocated for messages at the head
> > of
> > > > the
> > > > > > > > > > queue/buffer doesn’t get reclaimed, even long after those
> > > > messages
> > > > > > have
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > been processed. Just a theory.
> > > > > > > > > >
> > > > > > > > > > We are also trying to reproduce this using some perf
> tests
> > to
> > > > enqueue
> > > > > > > > > > with
> > > > > > > > > > same pattern, will update with the findings.
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > > Ramayan
> > > > > > > > > >
> > > > > > > > > > On Wed, Apr 19, 2017 at 6:52 PM, Ramayan Tiwari
> > > > > > > > > > <ra...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Another issue that we noticed is when broker goes OOM
> due
> > > to
> > > > direct
> > > > > > > > > > > memory, it doesn't create heap dump (specified by
> "-XX:+
> > > > > > > > > > > HeapDumpOnOutOfMemoryError"), even when the OOM error
> is
> > > > same as
> > > > > > what
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > is
> > > > > > > > > > > mentioned in the oracle JVM docs
> > > > ("java.lang.OutOfMemoryError").
> > > > > > > > > > >
> > > > > > > > > > > Has anyone been able to find a way to get to heap dump
> > for
> > > > DM OOM?
> > > > > > > > > > >
> > > > > > > > > > > - Ramayan
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Apr 19, 2017 at 11:21 AM, Ramayan Tiwari
> > > > > > > > > > > <ramayan.tiwari@gmail.com
> > > > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Alex,
> > > > > > > > > > > >
> > > > > > > > > > > > Below are the flow to disk logs from broker having
> > > > 3million+
> > > > > > messages
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > at
> > > > > > > > > > > > this time. We only have one virtual host. Time is in
> > GMT.
> > > > Looks
> > > > > > like
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > flow
> > > > > > > > > > > > to disk is active on the whole virtual host and not a
> > > > queue level.
> > > > > > > > > > > >
> > > > > > > > > > > > When the same broker went OOM yesterday, I did not
> see
> > > any
> > > > flow to
> > > > > > > > > > > > disk
> > > > > > > > > > > > logs from when it was started until it crashed
> (crashed
> > > > twice
> > > > > > within
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 4hrs).
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > 4/19/17 4:17:43.509 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > memory
> > > > use
> > > > > > > > > > > > 3356539KB
> > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > 4/19/17 2:31:13.502 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> > memory
> > > > use
> > > > > > > > > > > > 3354866KB
> > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > 4/19/17 2:28:43.511 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > memory
> > > > use
> > > > > > > > > > > > 3358509KB
> > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > 4/19/17 2:20:13.500 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> > memory
> > > > use
> > > > > > > > > > > > 3353501KB
> > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > 4/19/17 2:18:13.500 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > memory
> > > > use
> > > > > > > > > > > > 3357544KB
> > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > 4/19/17 2:08:43.501 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> > memory
> > > > use
> > > > > > > > > > > > 3353236KB
> > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > 4/19/17 2:08:13.501 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > memory
> > > > use
> > > > > > > > > > > > 3356704KB
> > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > 4/19/17 2:00:43.500 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> > memory
> > > > use
> > > > > > > > > > > > 3353511KB
> > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > 4/19/17 2:00:13.504 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > memory
> > > > use
> > > > > > > > > > > > 3357948KB
> > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > 4/19/17 1:50:43.501 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> > memory
> > > > use
> > > > > > > > > > > > 3355310KB
> > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > 4/19/17 1:47:43.501 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > memory
> > > > use
> > > > > > > > > > > > 3365624KB
> > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > > 4/19/17 1:43:43.501 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> > memory
> > > > use
> > > > > > > > > > > > 3355136KB
> > > > > > > > > > > > within threshold 3355443KB
> > > > > > > > > > > > 4/19/17 1:31:43.509 AM INFO  [Housekeeping[test]] -
> > > > > > > > > > > > [Housekeeping[test]]
> > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > memory
> > > > use
> > > > > > > > > > > > 3358683KB
> > > > > > > > > > > > exceeds threshold 3355443KB
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > After production release (2days back), we have seen 4
> > > > crashes in 3
> > > > > > > > > > > > different brokers, this is the most pressing concern
> > for
> > > > us in
> > > > > > > > > > > > decision if
> > > > > > > > > > > > we should roll back to 0.32. Any help is greatly
> > > > appreciated.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks
> > > > > > > > > > > > Ramayan
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Apr 19, 2017 at 9:36 AM, Oleksandr Rudyy <
> > > > orudyy@gmail.com
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > Ramayan,
> > > > > > > > > > > > > Thanks for the details. I would like to clarify
> > whether
> > > > flow to
> > > > > > disk
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > was
> > > > > > > > > > > > > triggered today for 3 million messages?
> > > > > > > > > > > > >
> > > > > > > > > > > > > The following logs are issued for flow to disk:
> > > > > > > > > > > > > BRK-1014 : Message flow to disk active :  Message
> > > memory
> > > > use
> > > > > > > > > > > > > {0,number,#}KB
> > > > > > > > > > > > > exceeds threshold {1,number,#.##}KB
> > > > > > > > > > > > > BRK-1015 : Message flow to disk inactive : Message
> > > > memory use
> > > > > > > > > > > > > {0,number,#}KB within threshold {1,number,#.##}KB
> > > > > > > > > > > > >
> > > > > > > > > > > > > Kind Regards,
> > > > > > > > > > > > > Alex
> > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > On 19 April 2017 at 17:10, Ramayan Tiwari <
> > > > > > ramayan.tiwari@gmail.com>
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Alex,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks for your response, here are the details:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > We use "direct" exchange, without persistence (we
> > > > specify
> > > > > > > > > > > > > NON_PERSISTENT
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > that while sending from client) and use BDB
> store.
> > We
> > > > use JSON
> > > > > > > > > > > > > > virtual
> > > > > > > > > > > > > host
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > type. We are not using SSL.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > When the broker went OOM, we had around 1.3
> million
> > > > messages
> > > > > > with
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 100
> > > > > > > > > > > > > bytes
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > average message size. Direct memory allocation
> > (value
> > > > read from
> > > > > > > > > > > > > > MBean)
> > > > > > > > > > > > > kept
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > going up, even though it wouldn't need more DM to
> > > > store these
> > > > > > many
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > messages. DM allocated persisted at 99% for
> about 3
> > > > and half
> > > > > > hours
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > before
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > crashing.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Today, on the same broker we have 3 million
> > messages
> > > > (same
> > > > > > message
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > size)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > and DM allocated is only at 8%. This seems like
> > there
> > > > is some
> > > > > > > > > > > > > > issue
> > > > > > > > > > > > > with
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > de-allocation or a leak.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I have uploaded the memory utilization graph
> here:
> > > > > > > > > > > > > > https://drive.google.com/file/d/
> > > > 0Bwi0MEV3srPRVHFEbDlIYUpLaUE/
> > > > > > > > > > > > > > view?usp=sharing
> > > > > > > > > > > > > > Blue line is DM allocated, Yellow is DM Used (sum
> > of
> > > > queue
> > > > > > > > > > > > > > payload)
> > > > > > > > > > > > > and Red
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > is heap usage.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > Ramayan
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Wed, Apr 19, 2017 at 4:10 AM, Oleksandr Rudyy
> > > > > > > > > > > > > > <or...@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Hi Ramayan,
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Could please share with us the details of
> > messaging
> > > > use
> > > > > > case(s)
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > which
> > > > > > > > > > > > > > ended
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > up in OOM on broker side?
> > > > > > > > > > > > > > > I would like to reproduce the issue on my local
> > > > broker in
> > > > > > order
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > to
> > > > > > > > > > > > > fix
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > it.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > I would appreciate if you could provide as much
> > > > details as
> > > > > > > > > > > > > > > possible,
> > > > > > > > > > > > > > > including, messaging topology, message
> > persistence
> > > > type,
> > > > > > message
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > sizes,volumes, etc.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Qpid Broker 6.0.x uses direct memory for
> keeping
> > > > message
> > > > > > content
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > and
> > > > > > > > > > > > > > > receiving/sending data. Each plain connection
> > > > utilizes 512K of
> > > > > > > > > > > > > > > direct
> > > > > > > > > > > > > > > memory. Each SSL connection uses 1M of direct
> > > > memory. Your
> > > > > > > > > > > > > > > memory
> > > > > > > > > > > > > > settings
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > look Ok to me.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Kind Regards,
> > > > > > > > > > > > > > > Alex
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On 18 April 2017 at 23:39, Ramayan Tiwari
> > > > > > > > > > > > > > > <ra...@gmail.com>
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > We are using Java broker 6.0.5, with patch to
> > use
> > > > > > > > > > > > > MultiQueueConsumer
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > feature. We just finished deploying to
> > production
> > > > and saw
> > > > > > > > > > > > > > > > couple of
> > > > > > > > > > > > > > > > instances of broker OOM due to running out of
> > > > DirectMemory
> > > > > > > > > > > > > > > > buffer
> > > > > > > > > > > > > > > > (exceptions at the end of this email).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Here is our setup:
> > > > > > > > > > > > > > > > 1. Max heap 12g, max direct memory 4g (this
> is
> > > > opposite of
> > > > > > > > > > > > > > > > what the
> > > > > > > > > > > > > > > > recommendation is, however, for our use cause
> > > > message
> > > > > > payload
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > is
> > > > > > > > > > > > > really
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > small ~400bytes and is way less than the per
> > > > message
> > > > > > overhead
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > of
> > > > > > > > > > > > > 1KB).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > In
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > perf testing, we were able to put 2 million
> > > > messages without
> > > > > > > > > > > > > > > > any
> > > > > > > > > > > > > > issues.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > 2. ~400 connections to broker.
> > > > > > > > > > > > > > > > 3. Each connection has 20 sessions and there
> is
> > > > one multi
> > > > > > > > > > > > > > > > queue
> > > > > > > > > > > > > > consumer
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > attached to each session, listening to around
> > > 1000
> > > > queues.
> > > > > > > > > > > > > > > > 4. We are still using 0.16 client (I know).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > With the above setup, the baseline
> utilization
> > > > (without any
> > > > > > > > > > > > > messages)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > for
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > direct memory was around 230mb (with 410
> > > > connection each
> > > > > > > > > > > > > > > > taking
> > > > > > > > > > > > > 500KB).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Based on our understanding of broker memory
> > > > allocation,
> > > > > > > > > > > > > > > > message
> > > > > > > > > > > > > payload
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > should be the only thing adding to direct
> > memory
> > > > utilization
> > > > > > > > > > > > > > > > (on
> > > > > > > > > > > > > top of
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > baseline), however, we are experiencing
> > something
> > > > completely
> > > > > > > > > > > > > different.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > In
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > our last broker crash, we see that broker is
> > > > constantly
> > > > > > > > > > > > > > > > running
> > > > > > > > > > > > > with
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > 90%+
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > direct memory allocated, even when message
> > > payload
> > > > sum from
> > > > > > > > > > > > > > > > all the
> > > > > > > > > > > > > > > queues
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > is only 6-8% (these % are against available
> DM
> > of
> > > > 4gb).
> > > > > > During
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > these
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > high
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > DM usage period, heap usage was around 60%
> (of
> > > > 12gb).
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > We would like some help in understanding what
> > > > could be the
> > > > > > > > > > > > > > > > reason
> > > > > > > > > > > > > of
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > these
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > high DM allocations. Are there things other
> > than
> > > > message
> > > > > > > > > > > > > > > > payload
> > > > > > > > > > > > > and
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > AMQP
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > connection, which use DM and could be
> > > contributing
> > > > to these
> > > > > > > > > > > > > > > > high
> > > > > > > > > > > > > usage?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Another thing where we are puzzled is the
> > > > de-allocation of
> > > > > > DM
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > byte
> > > > > > > > > > > > > > > buffers.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > From log mining of heap and DM utilization,
> > > > de-allocation of
> > > > > > > > > > > > > > > > DM
> > > > > > > > > > > > > doesn't
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > correlate with heap GC. If anyone has seen
> any
> > > > documentation
> > > > > > > > > > > > > related to
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > this, it would be very helpful if you could
> > share
> > > > that.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > > Ramayan
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > *Exceptions*
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct buffer
> > memory
> > > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(
> Bits.java:658)
> > > > > > ~[na:1.8.0_40]
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<
> > > > init>(DirectByteBuffer.java:
> > > > > > 123)
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > at java.nio.ByteBuffer.
> > > allocateDirect(ByteBuffer.
> > > > java:311)
> > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > org.apache.qpid.bytebuffer.
> > > > QpidByteBuffer.allocateDirect(
> > > > > > > > > > > > > > > > QpidByteBuffer.java:474)
> > > > > > > > > > > > > > > > ~[qpid-common-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > NonBlockingConnectionPlainD
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > elegate.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > restoreApplicationBufferForWrite(
> > > > > > NonBlockingConnectionPlainDele
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > gate.java:93)
> > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > NonBlockingConnectionPlainDele
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > gate.processData(
> > NonBlockingConnectionPlainDele
> > > > > > gate.java:60)
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > NonBlockingConnection.doRead(
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > NonBlockingConnection.java:506)
> > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > NonBlockingConnection.doWork(
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > NonBlockingConnection.java:285)
> > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > NetworkConnectionScheduler.
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > processConnection(
> NetworkConnectionScheduler.
> > > > java:124)
> > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > org.apache.qpid.server.
> > transport.SelectorThread$
> > > > > > ConnectionPr
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > ocessor.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > processConnection(SelectorThread.java:504)
> > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > org.apache.qpid.server.
> > transport.SelectorThread$
> > > > > > > > > > > > > > > > SelectionTask.performSelect(
> > > > SelectorThread.java:337)
> > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > org.apache.qpid.server.
> > transport.SelectorThread$
> > > > > > SelectionTask.run(
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > SelectorThread.java:87)
> > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > org.apache.qpid.server.
> > > > transport.SelectorThread.run(
> > > > > > > > > > > > > > > > SelectorThread.java:462)
> > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > java.util.concurrent.
> > > ThreadPoolExecutor.runWorker(
> > > > > > > > > > > > > > > > ThreadPoolExecutor.java:1142)
> > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > java.util.concurrent.
> > > > ThreadPoolExecutor$Worker.run(
> > > > > > > > > > > > > > > > ThreadPoolExecutor.java:617)
> > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > at java.lang.Thread.run(Thread.java:745)
> > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > *Second exception*
> > > > > > > > > > > > > > > > java.lang.OutOfMemoryError: Direct buffer
> > memory
> > > > > > > > > > > > > > > > at java.nio.Bits.reserveMemory(
> Bits.java:658)
> > > > > > ~[na:1.8.0_40]
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > at java.nio.DirectByteBuffer.<
> > > > init>(DirectByteBuffer.java:
> > > > > > 123)
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > at java.nio.ByteBuffer.
> > > allocateDirect(ByteBuffer.
> > > > java:311)
> > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > org.apache.qpid.bytebuffer.
> > > > QpidByteBuffer.allocateDirect(
> > > > > > > > > > > > > > > > QpidByteBuffer.java:474)
> > > > > > > > > > > > > > > > ~[qpid-common-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > NonBlockingConnectionPlainDele
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > gate.<init>(NonBlockingConnectionPlainDele
> > > > gate.java:45)
> > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > NonBlockingConnection.
> > > > > > > > > > > > > > > > setTransportEncryption(
> > > NonBlockingConnection.java:
> > > > 625)
> > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > NonBlockingConnection.<init>(
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > NonBlockingConnection.java:117)
> > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > org.apache.qpid.server.transport.
> > > > > > NonBlockingNetworkTransport.
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > acceptSocketChannel(
> > NonBlockingNetworkTransport.
> > > > java:158)
> > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > org.apache.qpid.server.
> > transport.SelectorThread$
> > > > > > SelectionTas
> > > > > > >
> > > > > > > >
> > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > k$1.run(
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > SelectorThread.java:191)
> > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > org.apache.qpid.server.
> > > > transport.SelectorThread.run(
> > > > > > > > > > > > > > > > SelectorThread.java:462)
> > > > > > > > > > > > > > > > ~[qpid-broker-core-6.0.5.jar:6.0.5]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > java.util.concurrent.
> > > ThreadPoolExecutor.runWorker(
> > > > > > > > > > > > > > > > ThreadPoolExecutor.java:1142)
> > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > at
> > > > > > > > > > > > > > > > java.util.concurrent.
> > > > ThreadPoolExecutor$Worker.run(
> > > > > > > > > > > > > > > > ThreadPoolExecutor.java:617)
> > > > > > > > > > > > > > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > > at java.lang.Thread.run(Thread.java:745)
> > > > ~[na:1.8.0_40]
> > > > > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > ------------------------------
> ------------------------------
> > > > ---------
> > > > > > > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > > > > > > > For additional commands, e-mail:
> users-help@qpid.apache.org
> > > > > > > > >
> > > >
> > > >
> > > > ------------------------------------------------------------
> ---------
> > > > To unsubscribe, e-mail: users-unsubscribe@qpid.apache.org
> > > > For additional commands, e-mail: users-help@qpid.apache.org
> > > >
> > >
> >
>