Posted to log4j-user@logging.apache.org by Leon Finker <le...@gmail.com> on 2021/03/14 05:32:48 UTC

Deadlock observed

Hi,

Using Log4j 2.13.1 with Disruptor 3.4.2 on CentOS Linux, JRE 11.0.5.
Using the following JVM args:
-DAsyncLogger.RingBufferSize=32768
-DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector
-Dlog4j.configurationFile=...

The disruptor queue has filled up, and we've observed a deadlock with
the interesting threads stuck as shown below. The disruptor processing
thread appears to be waiting for events to process, while all the
other threads are blocked trying to publish events.

"Log4j2-TF-1-AsyncLogger[AsyncContext@5cb0d902]-1" #23 daemon prio=5
os_prio=0 cpu=40099128.73ms elapsed=44102.95s tid=0x00007f4c2ca34000
nid=0x611a runnable  [0x00007f4bfdcfc000]
   java.lang.Thread.State: RUNNABLE
        at
com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
        at
com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:159)
        at
com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
        at java.lang.Thread.run(java.base@11.0.5/Thread.java:834)

   Locked ownable synchronizers:
        - None


"xxx" #268 prio=5 os_prio=0 cpu=7051742.32ms elapsed=43967.04s
tid=0x00007f4c2e6ea800 nid=0x649d runnable  [0x00007f49f77dd000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@11.0.5/Native Method)
        at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.5
/LockSupport.java:357)
        at
com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
        at
com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
        at com.lmax.disruptor.RingBuffer.publishEvent(RingBuffer.java:465)
        at com.lmax.disruptor.dsl.Disruptor.publishEvent(Disruptor.java:331)
        at
org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageWhenQueueFull(AsyncLoggerDisruptor.java:230)
        - locked <0x0000000701017288> (a java.lang.Object)
        at
org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:246)
        at
org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:230)
        at
org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:225)
        at
org.apache.logging.log4j.core.async.AsyncLogger.access$000(AsyncLogger.java:67)
        at
org.apache.logging.log4j.core.async.AsyncLogger$1.log(AsyncLogger.java:152)
        at
org.apache.logging.log4j.core.async.AsyncLogger.log(AsyncLogger.java:136)
        at
org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2198)
        at
org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2152)
        at
org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2135)
        at
org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2016)
        at
org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1875)
        at org.apache.logging.slf4j.Log4jLogger.info(Log4jLogger.java:179)
        ...
        at java.lang.Thread.run(java.base@11.0.5/Thread.java:834)

   Locked ownable synchronizers:
        - None

and all other threads are stuck like this:
"xx-nioW-16" #273 prio=5 os_prio=0 cpu=7439.15ms elapsed=43966.67s
tid=0x00007f4c2e4c7000 nid=0x64a9 waiting for monitor entry
[0x00007f49f6ad2000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at
org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageWhenQueueFull(AsyncLoggerDisruptor.java:230)
        - waiting to lock <0x0000000701017288> (a java.lang.Object)
        at
org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:246)
        at
org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:230)
        at
org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:225)
        at
org.apache.logging.log4j.core.async.AsyncLogger.access$000(AsyncLogger.java:67)
        at
org.apache.logging.log4j.core.async.AsyncLogger$1.log(AsyncLogger.java:152)
        at
org.apache.logging.log4j.core.async.AsyncLogger.log(AsyncLogger.java:136)
        at
org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2198)
        at
org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2152)
        at
org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2135)
        at
org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2022)
        at
org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1891)
        at org.apache.logging.slf4j.Log4jLogger.info(Log4jLogger.java:194)
...

Does this ring any bells?

Thank you in advance!

Re: Deadlock observed

Posted by Leon Finker <le...@gmail.com>.
Yes, it looks the same. In our case it's also under JRE 11, with G1
and string deduplication enabled (I doubt it matters :) but I'll
mention it), and we're running on virtual machines.
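
(For the archives: a rough sketch of the queue-full-policy switch
Carter describes below, and of Ralph's "custom AsyncQueueFullPolicy"
option. This assumes the log4j-core 2.x AsyncQueueFullPolicy/EventRoute
contract; the class and package names are made up for illustration,
not our production code.)

import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.async.AsyncQueueFullPolicy;
import org.apache.logging.log4j.core.async.EventRoute;

// Illustrative only: when the ring buffer is full, log in the calling
// thread instead of parking until a slot frees up (trades strict
// ordering for liveness).
public class SynchronousOnFullPolicy implements AsyncQueueFullPolicy {
    @Override
    public EventRoute getRoute(final long backgroundThreadId, final Level level) {
        return EventRoute.SYNCHRONOUS;
    }
}

As I understand it, such a policy would be selected with something like
-Dlog4j2.AsyncQueueFullPolicy=com.example.SynchronousOnFullPolicy
(please check the exact property name against the docs for your
version).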

On Mon, Mar 15, 2021 at 8:50 AM Carter Kozak <ck...@ckozak.net> wrote:
>
> This sounds a lot like an issue I discussed with the lmax-disruptor folks here: https://github.com/LMAX-Exchange/disruptor/issues/307
> I ended up switching my services to log synchronously rather than enqueue when the buffer is full, which
> can produce events out of order, however we log into a structured pipeline that indexes based on event timestamp. The funny thing is that I haven't seen any occurrences of this behavior since switching the queue-full policy to synchronous rather than enqueue, so I'm inclined to believe there's a disruptor bug somewhere, but I could be wrong.
>
> On Sun, Mar 14, 2021, at 20:50, Leon Finker wrote:
> > Yes unfortunately, all logging stopped and nothing was seen in the log
> > file for about 2 hrs till service was restarted. But there is nothing
> > pointing to a deadlock in the threads dump we git so far :( However,
> > the threads dump captured was part of our standard kill -3 if the
> > process isn't exiting normally. So it may not show the correct picture
> > during shutdown.
> >
> > On Sun, Mar 14, 2021 at 3:32 AM Ralph Goers <ra...@dslextreme.com> wrote:
> > >
> > > One other thing. My expectation is that once the buffer is full you should continue to log but at the rate your underlying system can handle. This wouldn’t be a deadlock but if the rate at which you are logging is higher than the system can write the result will be that the application slows down.
> > >
> > > If you are truly seeing a deadlock then it is possible there is a problem in the disruptor but I have very little experience with that and would need Remko to weigh in.
> > >
> > > Ralph
> > >
> > > > On Mar 14, 2021, at 12:26 AM, Ralph Goers <ra...@dslextreme.com> wrote:
> > > >
> > > > Yes, it rings bells and really should have a slot on our FAQ page (although the question really isn’t asked all that frequently).
> > > >
> > > > First it means that you are logging faster than your system can handle. The proper response is to figure out why you are logging so much and try to reduce it or use an alternative destination that is faster. For example, if you are logging to a file you may need to improve the hardware or increase the buffer size.
> > > >
> > > > As you observed your buffer has filled up. The default behavior is for threads to wait until there is an empty slot.
> > > >
> > > > Remko is the real expert on this but I will take a stab at answering your question. The threads below show that they are in handleRingBufferFull. That method has 3 options to choose from based on how you have configured the asyncQueueFullPolicy. You basically have 3 choices:
> > > > Wait for an open slot (the default)
> > > > Resort to logging synchronously.
> > > > Discard log events.
> > > >
> > > > You should look at https://logging.apache.org/log4j/2.x/log4j-core/apidocs/org/apache/logging/log4j/core/async/AsyncQueueFullPolicy.html <https://logging.apache.org/log4j/2.x/log4j-core/apidocs/org/apache/logging/log4j/core/async/AsyncQueueFullPolicy.html>. Note that option 2 above (synchronous) is only used by Log4j in certain circumstances and there isn’t an option to specify that, so the only real options are the default behavior, discarding, or create a custom AsyncQueueFullPolicy.
> > > >
> > > > Ralph
> > > >
> > > >
> > > >
> > > >> On Mar 13, 2021, at 10:32 PM, Leon Finker <le...@gmail.com> wrote:
> > > >>
> > > >> HI,
> > > >>
> > > >> Using log4j2 1.13.1 with disruptor 3.4.2. Linux CentOS. JRE 11.0.5
> > > >> Using the following JRE args:
> > > >> -DAsyncLogger.RingBufferSize=32768
> > > >> -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector
> > > >> -Dlog4j.configurationFile=...
> > > >>
> > > >> The disruptor queue has filled up. And we've observed deadlock with
> > > >> the interesting threads stuck like the following. Disruptor processing
> > > >> thread seems to be waiting to process events and all the other threads
> > > >> locked up on trying to publish the events.
> > > >>
> > > >> "Log4j2-TF-1-AsyncLogger[AsyncContext@5cb0d902]-1" #23 daemon prio=5
> > > >> os_prio=0 cpu=40099128.73ms elapsed=44102.95s tid=0x00007f4c2ca34000
> > > >> nid=0x611a runnable  [0x00007f4bfdcfc000]
> > > >>  java.lang.Thread.State: RUNNABLE
> > > >>       at
> > > >> com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
> > > >>       at
> > > >> com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:159)
> > > >>       at
> > > >> com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
> > > >>       at java.lang.Thread.run(java.base@11.0.5/Thread.java:834)
> > > >>
> > > >>  Locked ownable synchronizers:
> > > >>       - None
> > > >>
> > > >>
> > > >> "xxx" #268 prio=5 os_prio=0 cpu=7051742.32ms elapsed=43967.04s
> > > >> tid=0x00007f4c2e6ea800 nid=0x649d runnable  [0x00007f49f77dd000]
> > > >>  java.lang.Thread.State: TIMED_WAITING (parking)
> > > >>       at jdk.internal.misc.Unsafe.park(java.base@11.0.5/Native Method)
> > > >>       at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.5
> > > >> /LockSupport.java:357)
> > > >>       at
> > > >> com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
> > > >>       at
> > > >> com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
> > > >>       at com.lmax.disruptor.RingBuffer.publishEvent(RingBuffer.java:465)
> > > >>       at com.lmax.disruptor.dsl.Disruptor.publishEvent(Disruptor.java:331)
> > > >>       at
> > > >> org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageWhenQueueFull(AsyncLoggerDisruptor.java:230)
> > > >>       - locked <0x0000000701017288> (a java.lang.Object)
> > > >>       at
> > > >> org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:246)
> > > >>       at
> > > >> org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:230)
> > > >>       at
> > > >> org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:225)
> > > >>       at
> > > >> org.apache.logging.log4j.core.async.AsyncLogger.access$000(AsyncLogger.java:67)
> > > >>       at
> > > >> org.apache.logging.log4j.core.async.AsyncLogger$1.log(AsyncLogger.java:152)
> > > >>       at
> > > >> org.apache.logging.log4j.core.async.AsyncLogger.log(AsyncLogger.java:136)
> > > >>       at
> > > >> org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2198)
> > > >>       at
> > > >> org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2152)
> > > >>       at
> > > >> org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2135)
> > > >>       at
> > > >> org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2016)
> > > >>       at
> > > >> org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1875)
> > > >>       at org.apache.logging.slf4j.Log4jLogger.info(Log4jLogger.java:179)
> > > >>       ...
> > > >>       at java.lang.Thread.run(java.base@11.0.5/Thread.java:834)
> > > >>
> > > >>  Locked ownable synchronizers:
> > > >>       - None
> > > >>
> > > >> and all other threads are stuck like this:
> > > >> "xx-nioW-16" #273 prio=5 os_prio=0 cpu=7439.15ms elapsed=43966.67s
> > > >> tid=0x00007f4c2e4c7000 nid=0x64a9 waiting for monitor entry
> > > >> [0x00007f49f6ad2000]
> > > >>  java.lang.Thread.State: BLOCKED (on object monitor)
> > > >>       at
> > > >> org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageWhenQueueFull(AsyncLoggerDisruptor.java:230)
> > > >>       - waiting to lock <0x0000000701017288> (a java.lang.Object)
> > > >>       at
> > > >> org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:246)
> > > >>       at
> > > >> org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:230)
> > > >>       at
> > > >> org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:225)
> > > >>       at
> > > >> org.apache.logging.log4j.core.async.AsyncLogger.access$000(AsyncLogger.java:67)
> > > >>       at
> > > >> org.apache.logging.log4j.core.async.AsyncLogger$1.log(AsyncLogger.java:152)
> > > >>       at
> > > >> org.apache.logging.log4j.core.async.AsyncLogger.log(AsyncLogger.java:136)
> > > >>       at
> > > >> org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2198)
> > > >>       at
> > > >> org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2152)
> > > >>       at
> > > >> org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2135)
> > > >>       at
> > > >> org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2022)
> > > >>       at
> > > >> org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1891)
> > > >>       at org.apache.logging.slf4j.Log4jLogger.info(Log4jLogger.java:194)
> > > >> ...
> > > >>
> > > >> Does this ring any bells?
> > > >>
> > > >> Thank you in advance!
> > > >
> > >
> > >
> > >


Re: Deadlock observed

Posted by Leon Finker <le...@gmail.com>.
No, unfortunately I only have the thread dump taken at JVM shutdown :(

On Mon, Mar 15, 2021, 6:29 PM Remko Popma <re...@gmail.com> wrote:

>   Leon, can you definitely rule out the possibility that there are any
> objects being logged which themselves call Log4j from their toString
> method?
>
> On Tue, Mar 16, 2021 at 7:28 AM Leon Finker <le...@gmail.com> wrote:
>
> > We are using default TimeoutBlockingWaitStrategy. I doubt it will
> > reproduce again. We have been using 1.13.1 since October 2020 and this
> > never happened till now (been using log4j2 since the beginning on
> > various versions). We run about 200 of these specific service
> > instances and never observed it before. But it would require the async
> > queue to be filled up, which almost never happens. The only other time
> > I've seen a somewhat similar deadlock with the disruptor consumer
> > thread was https://issues.apache.org/jira/browse/LOG4J2-1518, but that
> > was a clear issue with toString logging as well on the producer side.
> >
> > What I don't understand is that default async queue full policy is
> > supposed to be synchronous. So not sure why it would block on full
> > queue. Also, I see a bit more changes in the default one since my
> > 13.1.1 version (but doesn't look like it would help):
> >
> >
> https://github.com/apache/logging-log4j2/blame/master/log4j-core/src/main/java/org/apache/logging/log4j/core/async/DefaultAsyncQueueFullPolicy.java
> > https://issues.apache.org/jira/browse/LOG4J2-2895
> >
> > I don't think my thread is either the background thread or
> > Log4J2Thread...So I can't understand the blocking on full queue in my
> > case.
> >
> > On Mon, Mar 15, 2021 at 1:39 PM Anthony Maire <ma...@gmail.com>
> > wrote:
> > >
> > > My bad, I read things a little too quickly and I havent seen the
> > separation
> > > between applicative threads thread and background logging thread in the
> > > original message. There are 2 issues there :
> > > - the one from Leon : the application thread cannot make progress
> because
> > > the disruptor buffer is full, but the background locking thread is
> doing
> > > nothing, waiting to be signaled from one producers. Can you tell what
> > > waitStrategy you are using for disruptor ? If it is the blocking wait
> > > strategy, there were some bugs in it and another strategy should be
> > > preferred, cf https://issues.apache.org/jira/browse/LOG4J2-1221.
> > However in
> > > this case, I expect the background thread to be in WAITING state. If it
> > is
> > > still RUNNABLE but doing nothing, it would suggest that you are using a
> > > spinning wait strategy, and it would be the sign of a pretty serious
> > > disruptor bug. If you are able to reproduce it and do a thread dump
> while
> > > the system is stuck and not during your shutdown procedure, it can
> help a
> > > lot.
> > >
> > > - the one from Carter where we sometimes need to log from the
> > > background thread (according to the disruptor github issue) : with
> > default
> > > strategy that wait for a slot in the disruptor to be empty, this just
> > > cannot work by design since the only thread that can free some slot is
> > > blocked waiting for an empty slot.
> > >      - change the queue full strategy to either be synchronous or drop
> > > messages so that the background thread can always make progress but it
> > has
> > > some drawbacks (messages lost or out of orders)
> > >      - don't log asynchronously from the background thread : if
> > > parameter.toString() calls log4j, I think that it is a pretty bad
> design
> > > but if you don't have the choice then you need to have some workaround
> > like
> > > mixing async and sync logging in your log4j config so that these
> > toString()
> > > related logs are always done synchronously.
> > >
> > >
> > >
> > > Le lun. 15 mars 2021 à 14:21, Carter Kozak <ck...@ckozak.net> a
> écrit :
> > >
> > > > > The issue here is that you are trying to log from the thread.
> > > >
> > > > That doesn't match my understanding of the problem, but it's entirely
> > > > possible I've missed something. The
> > > > "Log4j2-TF-1-AsyncLogger[AsyncContext@5cb0d902]-1" thread in the
> > example
> > > > above is waiting for events to become available on the disruptor
> > ringbuffer
> > > > and is not in the process of logging anything itself as far as I can
> > tell.
> > > >
> > > > > The main question here is : why do you need to log from the logging
> > > > thread ?
> > > >
> > > > Sometimes the background thread must log, although in many cases we
> > > > attempt to avoid it. Consider message parameters which must be
> > extracted,
> > > > calling Object.toString may result in a logger invocation. We have
> > code in
> > > > place to prevent recursive logging which can avoid some of these
> > calls, as
> > > > well as code log synchronously in some of these cases to avoid
> > potentially
> > > > queueing between different ringbuffers on the same system, although
> in
> > many
> > > > cases we want to allow these calls to log because it provides
> valuable
> > and
> > > > expected information.
> > > >
> > > > Best,
> > > > -Carter
> > > >
> > > > On Mon, Mar 15, 2021, at 09:08, Anthony Maire wrote:
> > > > > Hello
> > > > >
> > > > > The issue here is that you are trying to log from the thread. Since
> > this
> > > > > thread is the only one that consume event from the disruptor, is
> the
> > > > buffer
> > > > > is full, it deadlocks itself.
> > > > > It is not really a disruptor bug, it is an inappropriate usage of
> it
> > > > which
> > > > > leads to this behavior.
> > > > >
> > > > > In this thread, only log4j-related processing should occurs
> (loggers,
> > > > > appenders, filters and so on), so using a standard log4j logger for
> > this
> > > > is
> > > > > pretty weird. Using the statusLogger is more appropriate most of
> the
> > time
> > > > > (and this logger as far as I know doesn't go through the disruptor,
> > > > please
> > > > > correctly if I'm wrong)
> > > > >
> > > > > The main question here is : why do you need to log from the logging
> > > > thread ?
> > > > >
> > > > > Best regards,
> > > > > Anthony
> > > > >
> > > > >
> > > > > Le lun. 15 mars 2021 à 13:50, Carter Kozak <ck...@ckozak.net> a
> > écrit :
> > > > >
> > > > > > This sounds a lot like an issue I discussed with the
> lmax-disruptor
> > > > folks
> > > > > > here: https://github.com/LMAX-Exchange/disruptor/issues/307
> > > > > > I ended up switching my services to log synchronously rather than
> > > > enqueue
> > > > > > when the buffer is full, which
> > > > > > can produce events out of order, however we log into a structured
> > > > pipeline
> > > > > > that indexes based on event timestamp. The funny thing is that I
> > > > haven't
> > > > > > seen any occurrences of this behavior since switching the
> > queue-full
> > > > policy
> > > > > > to synchronous rather than enqueue, so I'm inclined to believe
> > there's
> > > > a
> > > > > > disruptor bug somewhere, but I could be wrong.
> > > > > >
> > > > > > On Sun, Mar 14, 2021, at 20:50, Leon Finker wrote:
> > > > > > > Yes unfortunately, all logging stopped and nothing was seen in
> > the
> > > > log
> > > > > > > file for about 2 hrs till service was restarted. But there is
> > nothing
> > > > > > > pointing to a deadlock in the threads dump we git so far :(
> > However,
> > > > > > > the threads dump captured was part of our standard kill -3 if
> the
> > > > > > > process isn't exiting normally. So it may not show the correct
> > > > picture
> > > > > > > during shutdown.
> > > > > > >
> > > > > > > On Sun, Mar 14, 2021 at 3:32 AM Ralph Goers <
> > > > ralph.goers@dslextreme.com>
> > > > > > wrote:
> > > > > > > >
> > > > > > > > One other thing. My expectation is that once the buffer is
> > full you
> > > > > > should continue to log but at the rate your underlying system can
> > > > handle.
> > > > > > This wouldn’t be a deadlock but if the rate at which you are
> > logging is
> > > > > > higher than the system can write the result will be that the
> > > > application
> > > > > > slows down.
> > > > > > > >
> > > > > > > > If you are truly seeing a deadlock then it is possible there
> > is a
> > > > > > problem in the disruptor but I have very little experience with
> > that
> > > > and
> > > > > > would need Remko to weigh in.
> > > > > > > >
> > > > > > > > Ralph
> > > > > > > >
> > > > > > > > > On Mar 14, 2021, at 12:26 AM, Ralph Goers <
> > > > > > ralph.goers@dslextreme.com> wrote:
> > > > > > > > >
> > > > > > > > > Yes, it rings bells and really should have a slot on our
> FAQ
> > page
> > > > > > (although the question really isn’t asked all that frequently).
> > > > > > > > >
> > > > > > > > > First it means that you are logging faster than your system
> > can
> > > > > > handle. The proper response is to figure out why you are logging
> so
> > > > much
> > > > > > and try to reduce it or use an alternative destination that is
> > faster.
> > > > For
> > > > > > example, if you are logging to a file you may need to improve the
> > > > hardware
> > > > > > or increase the buffer size.
> > > > > > > > >
> > > > > > > > > As you observed your buffer has filled up. The default
> > behavior
> > > > is
> > > > > > for threads to wait until there is an empty slot.
> > > > > > > > >
> > > > > > > > > Remko is the real expert on this but I will take a stab at
> > > > answering
> > > > > > your question. The threads below show that they are in
> > > > > > handleRingBufferFull. That method has 3 options to choose from
> > based
> > > > on how
> > > > > > you have configured the asyncQueueFullPolicy. You basically have
> 3
> > > > choices:
> > > > > > > > > Wait for an open slot (the default)
> > > > > > > > > Resort to logging synchronously.
> > > > > > > > > Discard log events.
> > > > > > > > >
> > > > > > > > > You should look at
> > > > > >
> > > >
> >
> https://logging.apache.org/log4j/2.x/log4j-core/apidocs/org/apache/logging/log4j/core/async/AsyncQueueFullPolicy.html
> > > > > > <
> > > > > >
> > > >
> >
> https://logging.apache.org/log4j/2.x/log4j-core/apidocs/org/apache/logging/log4j/core/async/AsyncQueueFullPolicy.html
> > > > >.
> > > > > > Note that option 2 above (synchronous) is only used by Log4j in
> > certain
> > > > > > circumstances and there isn’t an option to specify that, so the
> > only
> > > > real
> > > > > > options are the default behavior, discarding, or create a custom
> > > > > > AsyncQueueFullPolicy.
> > > > > > > > >
> > > > > > > > > Ralph
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >> On Mar 13, 2021, at 10:32 PM, Leon Finker <
> > leonfin@gmail.com>
> > > > > > wrote:
> > > > > > > > >>
> > > > > > > > >> HI,
> > > > > > > > >>
> > > > > > > > >> Using log4j2 1.13.1 with disruptor 3.4.2. Linux CentOS.
> JRE
> > > > 11.0.5
> > > > > > > > >> Using the following JRE args:
> > > > > > > > >> -DAsyncLogger.RingBufferSize=32768
> > > > > > > > >>
> > > > > >
> > > >
> >
> -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector
> > > > > > > > >> -Dlog4j.configurationFile=...
> > > > > > > > >>
> > > > > > > > >> The disruptor queue has filled up. And we've observed
> > deadlock
> > > > with
> > > > > > > > >> the interesting threads stuck like the following.
> Disruptor
> > > > > > processing
> > > > > > > > >> thread seems to be waiting to process events and all the
> > other
> > > > > > threads
> > > > > > > > >> locked up on trying to publish the events.
> > > > > > > > >>
> > > > > > > > >> "Log4j2-TF-1-AsyncLogger[AsyncContext@5cb0d902]-1" #23
> > daemon
> > > > > > prio=5
> > > > > > > > >> os_prio=0 cpu=40099128.73ms elapsed=44102.95s
> > > > tid=0x00007f4c2ca34000
> > > > > > > > >> nid=0x611a runnable  [0x00007f4bfdcfc000]
> > > > > > > > >>  java.lang.Thread.State: RUNNABLE
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:159)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> > com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
> > > > > > > > >>       at java.lang.Thread.run(java.base@11.0.5
> > /Thread.java:834)
> > > > > > > > >>
> > > > > > > > >>  Locked ownable synchronizers:
> > > > > > > > >>       - None
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> "xxx" #268 prio=5 os_prio=0 cpu=7051742.32ms
> > elapsed=43967.04s
> > > > > > > > >> tid=0x00007f4c2e6ea800 nid=0x649d runnable
> > [0x00007f49f77dd000]
> > > > > > > > >>  java.lang.Thread.State: TIMED_WAITING (parking)
> > > > > > > > >>       at jdk.internal.misc.Unsafe.park(java.base@11.0.5
> > /Native
> > > > > > Method)
> > > > > > > > >>       at
> > > > > > java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.5
> > > > > > > > >> /LockSupport.java:357)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
> > > > > > > > >>       at
> > > > > > com.lmax.disruptor.RingBuffer.publishEvent(RingBuffer.java:465)
> > > > > > > > >>       at
> > > > > > com.lmax.disruptor.dsl.Disruptor.publishEvent(Disruptor.java:331)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageWhenQueueFull(AsyncLoggerDisruptor.java:230)
> > > > > > > > >>       - locked <0x0000000701017288> (a java.lang.Object)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:246)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:230)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:225)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.core.async.AsyncLogger.access$000(AsyncLogger.java:67)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.core.async.AsyncLogger$1.log(AsyncLogger.java:152)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> > org.apache.logging.log4j.core.async.AsyncLogger.log(AsyncLogger.java:136)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2198)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2152)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2135)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2016)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1875)
> > > > > > > > >>       at org.apache.logging.slf4j.Log4jLogger.info
> > > > > > (Log4jLogger.java:179)
> > > > > > > > >>       ...
> > > > > > > > >>       at java.lang.Thread.run(java.base@11.0.5
> > /Thread.java:834)
> > > > > > > > >>
> > > > > > > > >>  Locked ownable synchronizers:
> > > > > > > > >>       - None
> > > > > > > > >>
> > > > > > > > >> and all other threads are stuck like this:
> > > > > > > > >> "xx-nioW-16" #273 prio=5 os_prio=0 cpu=7439.15ms
> > > > elapsed=43966.67s
> > > > > > > > >> tid=0x00007f4c2e4c7000 nid=0x64a9 waiting for monitor
> entry
> > > > > > > > >> [0x00007f49f6ad2000]
> > > > > > > > >>  java.lang.Thread.State: BLOCKED (on object monitor)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageWhenQueueFull(AsyncLoggerDisruptor.java:230)
> > > > > > > > >>       - waiting to lock <0x0000000701017288> (a
> > > > java.lang.Object)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:246)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:230)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:225)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.core.async.AsyncLogger.access$000(AsyncLogger.java:67)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.core.async.AsyncLogger$1.log(AsyncLogger.java:152)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> > org.apache.logging.log4j.core.async.AsyncLogger.log(AsyncLogger.java:136)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2198)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2152)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2135)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2022)
> > > > > > > > >>       at
> > > > > > > > >>
> > > > > >
> > > >
> >
> org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1891)
> > > > > > > > >>       at org.apache.logging.slf4j.Log4jLogger.info
> > > > > > (Log4jLogger.java:194)
> > > > > > > > >> ...
> > > > > > > > >>
> > > > > > > > >> Does this ring any bells?
> > > > > > > > >>
> > > > > > > > >> Thank you in advance!
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >

Re: Deadlock observed

Posted by Remko Popma <re...@gmail.com>.
  Leon, can you definitely rule out the possibility that there are any
objects being logged which themselves call Log4j from their toString
method?
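
(Concretely, I mean a pattern along these lines, where rendering a
logged parameter re-enters the logger; the class below is purely
hypothetical:)

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

// Hypothetical parameter object whose toString() itself logs.
public class Order {
    private static final Logger LOG = LogManager.getLogger(Order.class);
    private final String id;

    public Order(final String id) {
        this.id = id;
    }

    @Override
    public String toString() {
        LOG.info("rendering order {}", id); // re-enters Log4j while the event is rendered
        return "Order{" + id + "}";
    }
}

A call like log.info("processed {}", order) can then re-enter Log4j
while the original event is still being handled, which is the kind of
recursion behind earlier ring-buffer deadlocks such as LOG4J2-1518.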

On Tue, Mar 16, 2021 at 7:28 AM Leon Finker <le...@gmail.com> wrote:

> We are using default TimeoutBlockingWaitStrategy. I doubt it will
> reproduce again. We have been using 1.13.1 since October 2020 and this
> never happened till now (been using log4j2 since the beginning on
> various versions). We run about 200 of these specific service
> instances and never observed it before. But it would require the async
> queue to be filled up, which almost never happens. The only other time
> I've seen a somewhat similar deadlock with the disruptor consumer
> thread was https://issues.apache.org/jira/browse/LOG4J2-1518, but that
> was a clear issue with toString logging as well on the producer side.
>
> What I don't understand is that default async queue full policy is
> supposed to be synchronous. So not sure why it would block on full
> queue. Also, I see a bit more changes in the default one since my
> 13.1.1 version (but doesn't look like it would help):
>
> https://github.com/apache/logging-log4j2/blame/master/log4j-core/src/main/java/org/apache/logging/log4j/core/async/DefaultAsyncQueueFullPolicy.java
> https://issues.apache.org/jira/browse/LOG4J2-2895
>
> I don't think my thread is either the background thread or
> Log4J2Thread...So I can't understand the blocking on full queue in my
> case.
>
> On Mon, Mar 15, 2021 at 1:39 PM Anthony Maire <ma...@gmail.com>
> wrote:
> >
> > My bad, I read things a little too quickly and I havent seen the
> separation
> > between applicative threads thread and background logging thread in the
> > original message. There are 2 issues there :
> > - the one from Leon : the application thread cannot make progress because
> > the disruptor buffer is full, but the background locking thread is doing
> > nothing, waiting to be signaled from one producers. Can you tell what
> > waitStrategy you are using for disruptor ? If it is the blocking wait
> > strategy, there were some bugs in it and another strategy should be
> > preferred, cf https://issues.apache.org/jira/browse/LOG4J2-1221.
> However in
> > this case, I expect the background thread to be in WAITING state. If it
> is
> > still RUNNABLE but doing nothing, it would suggest that you are using a
> > spinning wait strategy, and it would be the sign of a pretty serious
> > disruptor bug. If you are able to reproduce it and do a thread dump while
> > the system is stuck and not during your shutdown procedure, it can help a
> > lot.
> >
> > - the one from Carter where we sometimes need to log from the
> > background thread (according to the disruptor github issue) : with
> default
> > strategy that wait for a slot in the disruptor to be empty, this just
> > cannot work by design since the only thread that can free some slot is
> > blocked waiting for an empty slot.
> >      - change the queue full strategy to either be synchronous or drop
> > messages so that the background thread can always make progress but it
> has
> > some drawbacks (messages lost or out of orders)
> >      - don't log asynchronously from the background thread : if
> > parameter.toString() calls log4j, I think that it is a pretty bad design
> > but if you don't have the choice then you need to have some workaround
> like
> > mixing async and sync logging in your log4j config so that these
> toString()
> > related logs are always done synchronously.
> >
> >
> >
> > Le lun. 15 mars 2021 à 14:21, Carter Kozak <ck...@ckozak.net> a écrit :
> >
> > > > The issue here is that you are trying to log from the thread.
> > >
> > > That doesn't match my understanding of the problem, but it's entirely
> > > possible I've missed something. The
> > > "Log4j2-TF-1-AsyncLogger[AsyncContext@5cb0d902]-1" thread in the
> example
> > > above is waiting for events to become available on the disruptor
> ringbuffer
> > > and is not in the process of logging anything itself as far as I can
> tell.
> > >
> > > > The main question here is : why do you need to log from the logging
> > > thread ?
> > >
> > > Sometimes the background thread must log, although in many cases we
> > > attempt to avoid it. Consider message parameters which must be
> extracted,
> > > calling Object.toString may result in a logger invocation. We have
> code in
> > > place to prevent recursive logging which can avoid some of these
> calls, as
> > > well as code log synchronously in some of these cases to avoid
> potentially
> > > queueing between different ringbuffers on the same system, although in
> many
> > > cases we want to allow these calls to log because it provides valuable
> and
> > > expected information.
> > >
> > > Best,
> > > -Carter
> > >
> > > On Mon, Mar 15, 2021, at 09:08, Anthony Maire wrote:
> > > > Hello
> > > >
> > > > The issue here is that you are trying to log from the thread. Since
> this
> > > > thread is the only one that consume event from the disruptor, is the
> > > buffer
> > > > is full, it deadlocks itself.
> > > > It is not really a disruptor bug, it is an inappropriate usage of it
> > > which
> > > > leads to this behavior.
> > > >
> > > > In this thread, only log4j-related processing should occurs (loggers,
> > > > appenders, filters and so on), so using a standard log4j logger for
> this
> > > is
> > > > pretty weird. Using the statusLogger is more appropriate most of the
> time
> > > > (and this logger as far as I know doesn't go through the disruptor,
> > > please
> > > > correctly if I'm wrong)
> > > >
> > > > The main question here is : why do you need to log from the logging
> > > thread ?
> > > >
> > > > Best regards,
> > > > Anthony
> > > >
> > > >
> > > > Le lun. 15 mars 2021 à 13:50, Carter Kozak <ck...@ckozak.net> a
> écrit :
> > > >
> > > > > This sounds a lot like an issue I discussed with the lmax-disruptor
> > > folks
> > > > > here: https://github.com/LMAX-Exchange/disruptor/issues/307
> > > > > I ended up switching my services to log synchronously rather than
> > > enqueue
> > > > > when the buffer is full, which
> > > > > can produce events out of order, however we log into a structured
> > > pipeline
> > > > > that indexes based on event timestamp. The funny thing is that I
> > > haven't
> > > > > seen any occurrences of this behavior since switching the
> queue-full
> > > policy
> > > > > to synchronous rather than enqueue, so I'm inclined to believe
> there's
> > > a
> > > > > disruptor bug somewhere, but I could be wrong.
> > > > >
> > > > > On Sun, Mar 14, 2021, at 20:50, Leon Finker wrote:
> > > > > > Yes unfortunately, all logging stopped and nothing was seen in
> the
> > > log
> > > > > > file for about 2 hrs till service was restarted. But there is
> nothing
> > > > > > pointing to a deadlock in the threads dump we git so far :(
> However,
> > > > > > the threads dump captured was part of our standard kill -3 if the
> > > > > > process isn't exiting normally. So it may not show the correct
> > > picture
> > > > > > during shutdown.
> > > > > >
> > > > > > On Sun, Mar 14, 2021 at 3:32 AM Ralph Goers <
> > > ralph.goers@dslextreme.com>
> > > > > wrote:
> > > > > > >
> > > > > > > One other thing. My expectation is that once the buffer is
> full you
> > > > > should continue to log but at the rate your underlying system can
> > > handle.
> > > > > This wouldn’t be a deadlock but if the rate at which you are
> logging is
> > > > > higher than the system can write the result will be that the
> > > application
> > > > > slows down.
> > > > > > >
> > > > > > > If you are truly seeing a deadlock then it is possible there
> is a
> > > > > problem in the disruptor but I have very little experience with
> that
> > > and
> > > > > would need Remko to weigh in.
> > > > > > >
> > > > > > > Ralph
> > > > > > >
> > > > > > > > On Mar 14, 2021, at 12:26 AM, Ralph Goers <
> > > > > ralph.goers@dslextreme.com> wrote:
> > > > > > > >
> > > > > > > > Yes, it rings bells and really should have a slot on our FAQ
> page
> > > > > (although the question really isn’t asked all that frequently).
> > > > > > > >
> > > > > > > > First it means that you are logging faster than your system
> can
> > > > > handle. The proper response is to figure out why you are logging so
> > > much
> > > > > and try to reduce it or use an alternative destination that is
> faster.
> > > For
> > > > > example, if you are logging to a file you may need to improve the
> > > hardware
> > > > > or increase the buffer size.
> > > > > > > >
> > > > > > > > As you observed your buffer has filled up. The default
> behavior
> > > is
> > > > > for threads to wait until there is an empty slot.
> > > > > > > >
> > > > > > > > Remko is the real expert on this but I will take a stab at
> > > answering
> > > > > your question. The threads below show that they are in
> > > > > handleRingBufferFull. That method has 3 options to choose from
> based
> > > on how
> > > > > you have configured the asyncQueueFullPolicy. You basically have 3
> > > choices:
> > > > > > > > Wait for an open slot (the default)
> > > > > > > > Resort to logging synchronously.
> > > > > > > > Discard log events.
> > > > > > > >
> > > > > > > > You should look at
> > > > >
> > >
> https://logging.apache.org/log4j/2.x/log4j-core/apidocs/org/apache/logging/log4j/core/async/AsyncQueueFullPolicy.html
> > > > > <
> > > > >
> > >
> https://logging.apache.org/log4j/2.x/log4j-core/apidocs/org/apache/logging/log4j/core/async/AsyncQueueFullPolicy.html
> > > >.
> > > > > Note that option 2 above (synchronous) is only used by Log4j in
> certain
> > > > > circumstances and there isn’t an option to specify that, so the
> only
> > > real
> > > > > options are the default behavior, discarding, or create a custom
> > > > > AsyncQueueFullPolicy.
> > > > > > > >
> > > > > > > > Ralph
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >> On Mar 13, 2021, at 10:32 PM, Leon Finker <
> leonfin@gmail.com>
> > > > > wrote:
> > > > > > > >>
> > > > > > > >> HI,
> > > > > > > >>
> > > > > > > >> Using log4j2 1.13.1 with disruptor 3.4.2. Linux CentOS. JRE
> > > 11.0.5
> > > > > > > >> Using the following JRE args:
> > > > > > > >> -DAsyncLogger.RingBufferSize=32768
> > > > > > > >>
> > > > >
> > >
> -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector
> > > > > > > >> -Dlog4j.configurationFile=...
> > > > > > > >>
> > > > > > > >> The disruptor queue has filled up. And we've observed
> deadlock
> > > with
> > > > > > > >> the interesting threads stuck like the following. Disruptor
> > > > > processing
> > > > > > > >> thread seems to be waiting to process events and all the
> other
> > > > > threads
> > > > > > > >> locked up on trying to publish the events.
> > > > > > > >>
> > > > > > > >> "Log4j2-TF-1-AsyncLogger[AsyncContext@5cb0d902]-1" #23
> daemon
> > > > > prio=5
> > > > > > > >> os_prio=0 cpu=40099128.73ms elapsed=44102.95s
> > > tid=0x00007f4c2ca34000
> > > > > > > >> nid=0x611a runnable  [0x00007f4bfdcfc000]
> > > > > > > >>  java.lang.Thread.State: RUNNABLE
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:159)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
> > > > > > > >>       at java.lang.Thread.run(java.base@11.0.5
> /Thread.java:834)
> > > > > > > >>
> > > > > > > >>  Locked ownable synchronizers:
> > > > > > > >>       - None
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> "xxx" #268 prio=5 os_prio=0 cpu=7051742.32ms
> elapsed=43967.04s
> > > > > > > >> tid=0x00007f4c2e6ea800 nid=0x649d runnable
> [0x00007f49f77dd000]
> > > > > > > >>  java.lang.Thread.State: TIMED_WAITING (parking)
> > > > > > > >>       at jdk.internal.misc.Unsafe.park(java.base@11.0.5
> /Native
> > > > > Method)
> > > > > > > >>       at
> > > > > java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.5
> > > > > > > >> /LockSupport.java:357)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
> > > > > > > >>       at
> > > > > com.lmax.disruptor.RingBuffer.publishEvent(RingBuffer.java:465)
> > > > > > > >>       at
> > > > > com.lmax.disruptor.dsl.Disruptor.publishEvent(Disruptor.java:331)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageWhenQueueFull(AsyncLoggerDisruptor.java:230)
> > > > > > > >>       - locked <0x0000000701017288> (a java.lang.Object)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:246)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:230)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:225)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.core.async.AsyncLogger.access$000(AsyncLogger.java:67)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.core.async.AsyncLogger$1.log(AsyncLogger.java:152)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.core.async.AsyncLogger.log(AsyncLogger.java:136)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2198)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2152)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2135)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2016)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1875)
> > > > > > > >>       at org.apache.logging.slf4j.Log4jLogger.info
> > > > > (Log4jLogger.java:179)
> > > > > > > >>       ...
> > > > > > > >>       at java.lang.Thread.run(java.base@11.0.5
> /Thread.java:834)
> > > > > > > >>
> > > > > > > >>  Locked ownable synchronizers:
> > > > > > > >>       - None
> > > > > > > >>
> > > > > > > >> and all other threads are stuck like this:
> > > > > > > >> "xx-nioW-16" #273 prio=5 os_prio=0 cpu=7439.15ms
> > > elapsed=43966.67s
> > > > > > > >> tid=0x00007f4c2e4c7000 nid=0x64a9 waiting for monitor entry
> > > > > > > >> [0x00007f49f6ad2000]
> > > > > > > >>  java.lang.Thread.State: BLOCKED (on object monitor)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageWhenQueueFull(AsyncLoggerDisruptor.java:230)
> > > > > > > >>       - waiting to lock <0x0000000701017288> (a
> > > java.lang.Object)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:246)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:230)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:225)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.core.async.AsyncLogger.access$000(AsyncLogger.java:67)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.core.async.AsyncLogger$1.log(AsyncLogger.java:152)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.core.async.AsyncLogger.log(AsyncLogger.java:136)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2198)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2152)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2135)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2022)
> > > > > > > >>       at
> > > > > > > >>
> > > > >
> > >
> org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1891)
> > > > > > > >>       at org.apache.logging.slf4j.Log4jLogger.info
> > > > > (Log4jLogger.java:194)
> > > > > > > >> ...
> > > > > > > >>
> > > > > > > >> Does this ring any bells?
> > > > > > > >>
> > > > > > > >> Thank you in advance!
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >

Re: Deadlock observed

Posted by Leon Finker <le...@gmail.com>.
We are using the default TimeoutBlockingWaitStrategy. I doubt it will
reproduce again. We have been using 2.13.1 since October 2020 and this
never happened until now (we've been using Log4j 2 since the beginning,
on various versions). We run about 200 of these specific service
instances and have never observed it before. But it would require the
async queue to fill up, which almost never happens. The only other time
I've seen a somewhat similar deadlock involving the disruptor consumer
thread was https://issues.apache.org/jira/browse/LOG4J2-1518, but that
was clearly an issue with toString logging on the producer side.

What I don't understand is that the default async-queue-full policy is
supposed to fall back to synchronous logging, so I'm not sure why it
would block on a full queue. Also, I see a few more changes in the
default policy since my 2.13.1 version (but it doesn't look like they
would help):
https://github.com/apache/logging-log4j2/blame/master/log4j-core/src/main/java/org/apache/logging/log4j/core/async/DefaultAsyncQueueFullPolicy.java
https://issues.apache.org/jira/browse/LOG4J2-2895
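
For reference, the routing in the default policy looks roughly like
the following in my version (paraphrased from the 2.13.x source, so
treat it as a sketch rather than a verbatim copy):

import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.async.AsyncQueueFullPolicy;
import org.apache.logging.log4j.core.async.EventRoute;

// Sketch of my reading of the default policy, not the real class:
public class DefaultPolicySketch implements AsyncQueueFullPolicy {
    @Override
    public EventRoute getRoute(final long backgroundThreadId, final Level level) {
        // Only the logging background thread is routed to SYNCHRONOUS;
        // application threads get ENQUEUE, i.e. they wait for a free slot.
        if (Thread.currentThread().getId() == backgroundThreadId) {
            return EventRoute.SYNCHRONOUS;
        }
        return EventRoute.ENQUEUE;
    }
}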

I don't think my thread is either the background thread or a
Log4J2Thread... so I can't understand the blocking on a full queue in
my case.



Re: Deadlock observed

Posted by Anthony Maire <ma...@gmail.com>.
My bad, I read things a little too quickly and I didn't see the separation
between the application threads and the background logging thread in the
original message. There are two issues here:
- The one from Leon: the application thread cannot make progress because
the disruptor buffer is full, but the background logging thread is doing
nothing, waiting to be signaled by one of the producers. Can you tell us
what wait strategy you are using for the disruptor? If it is the blocking
wait strategy, there were some bugs in it and another strategy should be
preferred, cf. https://issues.apache.org/jira/browse/LOG4J2-1221. However,
in this case I would expect the background thread to be in the WAITING
state. If it is still RUNNABLE but doing nothing, that would suggest you
are using a spinning wait strategy, and it would be the sign of a pretty
serious disruptor bug. If you are able to reproduce it and take a thread
dump while the system is stuck, and not during your shutdown procedure,
that would help a lot.

- The one from Carter, where we sometimes need to log from the background
thread (according to the disruptor GitHub issue): with the default
strategy, which waits for a slot in the disruptor to become empty, this
just cannot work by design, since the only thread that can free a slot is
itself blocked waiting for an empty slot. Two possible workarounds (see
the sketch after this list):
     - Change the queue-full strategy to either log synchronously or drop
messages, so that the background thread can always make progress; this has
some drawbacks (messages lost or out of order).
     - Don't log asynchronously from the background thread: if
parameter.toString() calls log4j, I think that is a pretty bad design, but
if you don't have the choice then you need a workaround such as mixing
async and sync logging in your log4j config so that these
toString()-related logs are always handled synchronously.
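
[Editor's note] A minimal sketch of the first workaround, assuming the same
AsyncQueueFullPolicy/EventRoute API; the class name and the WARN threshold are
illustrative only. The policy never asks a producer to wait: it either logs in place
or drops lower-priority events.

import org.apache.logging.log4j.Level;
import org.apache.logging.log4j.core.async.AsyncQueueFullPolicy;
import org.apache.logging.log4j.core.async.EventRoute;

// Hypothetical policy: when the ring buffer is full, never block.
// WARN and above are written synchronously on the calling thread,
// INFO and below are dropped, so no thread can wait on a full buffer.
public class NeverBlockQueueFullPolicy implements AsyncQueueFullPolicy {
    @Override
    public EventRoute getRoute(final long backgroundThreadId, final Level level) {
        return level.isMoreSpecificThan(Level.WARN)
                ? EventRoute.SYNCHRONOUS
                : EventRoute.DISCARD;
    }
}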



>

Re: Deadlock observed

Posted by Carter Kozak <ck...@ckozak.net>.
> The issue here is that you are trying to log from the logging thread.

That doesn't match my understanding of the problem, but it's entirely possible I've missed something. The "Log4j2-TF-1-AsyncLogger[AsyncContext@5cb0d902]-1" thread in the example above is waiting for events to become available on the disruptor ring buffer and is not in the process of logging anything itself, as far as I can tell.

> The main question here is: why do you need to log from the logging thread?

Sometimes the background thread must log, although in many cases we attempt to avoid it. Consider message parameters that must be extracted: calling Object.toString() may result in a logger invocation. We have code in place to prevent recursive logging, which can avoid some of these calls, as well as code to log synchronously in some of these cases to avoid potentially queueing between different ring buffers on the same system. In many cases, though, we want to allow these calls to log because they provide valuable and expected information.

Best,
-Carter
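
[Editor's note] To make the toString() hazard concrete, a hypothetical example (the
Order class and its logger are illustrative, not taken from this thread) of a message
parameter whose toString() itself logs. Whichever thread ends up formatting the
message, possibly the background consumer thread, then re-enters the logging subsystem:

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

// Hypothetical parameter type: rendering it for a log message triggers
// another logging call, which must not wait on an already-full ring buffer.
public class Order {
    private static final Logger LOG = LogManager.getLogger(Order.class);
    private final String id;

    public Order(final String id) {
        this.id = id;
    }

    @Override
    public String toString() {
        LOG.debug("rendering order {}", id); // re-enters the logging subsystem
        return "Order[" + id + "]";
    }
}

// Usage elsewhere: log.info("received {}", order);
// the toString() above runs when the message is formatted.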



Re: Deadlock observed

Posted by Anthony Maire <ma...@gmail.com>.
Hello

The issue here is that you are trying to log from the logging thread. Since this
thread is the only one that consumes events from the disruptor, if the buffer
is full, it deadlocks itself.
It is not really a disruptor bug; it is an inappropriate usage of it which
leads to this behavior.

In this thread, only log4j-related processing should occur (loggers,
appenders, filters and so on), so using a standard log4j logger for this is
pretty weird. Using the StatusLogger is more appropriate most of the time
(and that logger, as far as I know, doesn't go through the disruptor; please
correct me if I'm wrong).

The main question here is: why do you need to log from the logging thread?

Best regards,
Anthony
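
[Editor's note] Regarding the StatusLogger suggestion above, a minimal sketch assuming
log4j's org.apache.logging.log4j.status.StatusLogger API; status messages are handled
by log4j's internal status listeners and, as noted above, should not travel through
the async ring buffer. The class and method names are illustrative:

import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.status.StatusLogger;

// Sketch of a custom log4j component reporting a problem via the internal
// StatusLogger instead of a regular Logger, so the message does not go
// through the application's async appenders.
public class RingBufferDiagnostics {
    private static final Logger STATUS = StatusLogger.getLogger();

    public void reportQueueFull(final int ringBufferSize) {
        STATUS.warn("Async ring buffer full (size={}); producers are waiting", ringBufferSize);
    }
}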


>

Re: Deadlock observed

Posted by Carter Kozak <ck...@ckozak.net>.
This sounds a lot like an issue I discussed with the lmax-disruptor folks here: https://github.com/LMAX-Exchange/disruptor/issues/307
I ended up switching my services to log synchronously rather than enqueue when the buffer is full. That can produce events out of order, but we log into a structured pipeline that indexes on event timestamp. The funny thing is that I haven't seen any occurrence of this behavior since switching the queue-full policy to synchronous rather than enqueue, so I'm inclined to believe there's a disruptor bug somewhere, but I could be wrong.
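
[Editor's note] A sketch of how such a policy switch might be selected at startup.
The property names below are my assumption based on AsyncQueueFullPolicyFactory and
should be verified against the log4j-core version in use:

// Sketch (not a verified configuration): choosing a queue-full policy via
// system properties before log4j initializes.
public class LoggingBootstrap {
    public static void main(final String[] args) {
        // Built-in alternative: discard INFO and below when the ring buffer is full.
        System.setProperty("log4j2.asyncQueueFullPolicy", "Discard");
        System.setProperty("log4j2.discardThreshold", "INFO");

        // Or point the same property at a custom AsyncQueueFullPolicy implementation,
        // e.g. a hypothetical class that always returns EventRoute.SYNCHRONOUS:
        // System.setProperty("log4j2.asyncQueueFullPolicy", "com.example.NeverBlockQueueFullPolicy");

        // ... start the rest of the application; loggers initialized after this
        // point will pick up the selected policy.
    }
}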

On Sun, Mar 14, 2021, at 20:50, Leon Finker wrote:
> Yes, unfortunately all logging stopped and nothing was seen in the log
> file for about 2 hrs until the service was restarted. But there is nothing
> pointing to a deadlock in the thread dump we got so far :( However,
> the thread dump captured was part of our standard kill -3 if the
> process isn't exiting normally. So it may not show the correct picture
> during shutdown.
> 
> On Sun, Mar 14, 2021 at 3:32 AM Ralph Goers <ra...@dslextreme.com> wrote:
> >
> > One other thing. My expectation is that once the buffer is full you should continue to log, but at the rate your underlying system can handle. This wouldn’t be a deadlock, but if the rate at which you are logging is higher than the system can write, the result will be that the application slows down.
> >
> > If you are truly seeing a deadlock then it is possible there is a problem in the disruptor but I have very little experience with that and would need Remko to weigh in.
> >
> > Ralph
> >
> > > On Mar 14, 2021, at 12:26 AM, Ralph Goers <ra...@dslextreme.com> wrote:
> > >
> > > Yes, it rings bells and really should have a slot on our FAQ page (although the question really isn’t asked all that frequently).
> > >
> > > First it means that you are logging faster than your system can handle. The proper response is to figure out why you are logging so much and try to reduce it or use an alternative destination that is faster. For example, if you are logging to a file you may need to improve the hardware or increase the buffer size.
> > >
> > > As you observed your buffer has filled up. The default behavior is for threads to wait until there is an empty slot.
> > >
> > > Remko is the real expert on this but I will take a stab at answering your question. The threads below show that they are in handleRingBufferFull. That method has 3 options to choose from based on how you have configured the asyncQueueFullPolicy. You basically have 3 choices:
> > > Wait for an open slot (the default)
> > > Resort to logging synchronously.
> > > Discard log events.
> > >
> > > You should look at https://logging.apache.org/log4j/2.x/log4j-core/apidocs/org/apache/logging/log4j/core/async/AsyncQueueFullPolicy.html. Note that option 2 above (synchronous) is only used by Log4j in certain circumstances and there isn’t an option to specify that, so the only real options are the default behavior, discarding, or creating a custom AsyncQueueFullPolicy.
> > >
> > > Ralph
> > >
> > >
> > >
> > >> "xx-nioW-16" #273 prio=5 os_prio=0 cpu=7439.15ms elapsed=43966.67s
> > >> tid=0x00007f4c2e4c7000 nid=0x64a9 waiting for monitor entry
> > >> [0x00007f49f6ad2000]
> > >>  java.lang.Thread.State: BLOCKED (on object monitor)
> > >>       at
> > >> org.apache.logging.log4j.core.async.AsyncLoggerDisruptor.enqueueLogMessageWhenQueueFull(AsyncLoggerDisruptor.java:230)
> > >>       - waiting to lock <0x0000000701017288> (a java.lang.Object)
> > >>       at
> > >> org.apache.logging.log4j.core.async.AsyncLogger.handleRingBufferFull(AsyncLogger.java:246)
> > >>       at
> > >> org.apache.logging.log4j.core.async.AsyncLogger.publish(AsyncLogger.java:230)
> > >>       at
> > >> org.apache.logging.log4j.core.async.AsyncLogger.logWithThreadLocalTranslator(AsyncLogger.java:225)
> > >>       at
> > >> org.apache.logging.log4j.core.async.AsyncLogger.access$000(AsyncLogger.java:67)
> > >>       at
> > >> org.apache.logging.log4j.core.async.AsyncLogger$1.log(AsyncLogger.java:152)
> > >>       at
> > >> org.apache.logging.log4j.core.async.AsyncLogger.log(AsyncLogger.java:136)
> > >>       at
> > >> org.apache.logging.log4j.spi.AbstractLogger.tryLogMessage(AbstractLogger.java:2198)
> > >>       at
> > >> org.apache.logging.log4j.spi.AbstractLogger.logMessageTrackRecursion(AbstractLogger.java:2152)
> > >>       at
> > >> org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2135)
> > >>       at
> > >> org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2022)
> > >>       at
> > >> org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1891)
> > >>       at org.apache.logging.slf4j.Log4jLogger.info(Log4jLogger.java:194)
> > >> ...
> > >>
> > >> Does this ring any bells?
> > >>
> > >> Thank you in advance!
> > >
> >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: log4j-user-unsubscribe@logging.apache.org
> > For additional commands, e-mail: log4j-user-help@logging.apache.org
> >
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: log4j-user-unsubscribe@logging.apache.org
> For additional commands, e-mail: log4j-user-help@logging.apache.org
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: log4j-user-unsubscribe@logging.apache.org
For additional commands, e-mail: log4j-user-help@logging.apache.org


Re: Deadlock observed

Posted by Leon Finker <le...@gmail.com>.
Yes, unfortunately all logging stopped and nothing was written to the log
file for about 2 hours until the service was restarted. But there is nothing
pointing to a deadlock in the thread dumps we have got so far :( However,
the thread dump we captured came from our standard kill -3 that we issue
when a process isn't exiting normally, so it may not show the correct
picture during shutdown.
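A dump taken while the hang is still in progress would of course give a clearer picture than one captured on the shutdown path; with a stock JDK 11, either of the following works against the hung process id:

    jstack -l <pid> > threads.txt
    jcmd <pid> Thread.print > threads.txt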



Re: Deadlock observed

Posted by Ralph Goers <ra...@dslextreme.com>.
One other thing. My expectation is that once the buffer is full you should continue to log, but only at the rate your underlying system can handle. That isn't a deadlock: if the rate at which you are logging is higher than the rate at which the system can write, the application simply slows down.

If you are truly seeing a deadlock, then it is possible there is a problem in the Disruptor, but I have very little experience with that and would need Remko to weigh in.

Ralph



Re: Deadlock observed

Posted by Ralph Goers <ra...@dslextreme.com>.
Yes, it rings a bell and it really should have a slot on our FAQ page (although the question isn't actually asked all that frequently).

First it means that you are logging faster than your system can handle. The proper response is to figure out why you are logging so much and try to reduce it or use an alternative destination that is faster. For example, if you are logging to a file you may need to improve the hardware or increase the buffer size. 

As you observed your buffer has filled up. The default behavior is for threads to wait until there is an empty slot. 

Remko is the real expert on this, but I will take a stab at answering your question. The threads in your dump show that they are in handleRingBufferFull. That method chooses between three routes based on how the asyncQueueFullPolicy is configured:
1. Wait for an open slot in the ring buffer (the default).
2. Log the event synchronously on the calling thread.
3. Discard the log event.

You should look at https://logging.apache.org/log4j/2.x/log4j-core/apidocs/org/apache/logging/log4j/core/async/AsyncQueueFullPolicy.html. Note that option 2 above (synchronous) is only used by Log4j itself in certain circumstances and there isn't a configuration option to select it directly, so the only real options are the default behavior, discarding, or creating a custom AsyncQueueFullPolicy.
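For completeness, the discard route is enabled with a pair of system properties roughly like the following (property names as documented for Log4j 2.x; worth double-checking against the version in use). As I understand it, while the queue is full this drops events at the threshold level or less severe and still enqueues the more severe ones:

    -Dlog4j2.asyncQueueFullPolicy=Discard
    -Dlog4j2.discardThreshold=INFO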

Ralph


