You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tomcat.apache.org by bu...@apache.org on 2019/12/17 07:16:44 UTC

[Bug 64007] New: a deadlock with Poller run and cancelledKey

https://bz.apache.org/bugzilla/show_bug.cgi?id=64007

            Bug ID: 64007
           Summary: a deadlock with Poller run and cancelledKey
           Product: Tomcat 9
           Version: 9.0.26
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Catalina
          Assignee: dev@tomcat.apache.org
          Reporter: langren_qing@163.com
  Target Milestone: -----

when i press test my application with spring boot 2.1.9.RELEASE and tomcat
9.0.26 i meet a deadlock. It's maybe a bug with tomcat. when i use tomcat
8.5.49 instead the bug is not appear. The follow is the stack i dump from my
application.

Found one Java-level deadlock:
=============================
"http-nio-8080-exec-455":
  waiting to lock monitor 0x00007f59c40475e8 (object 0x00000006c0b4fe28, a
java.util.HashSet),
  which is held by "http-nio-8080-ClientPoller"
"http-nio-8080-ClientPoller":
  waiting to lock monitor 0x00007f59680af4d8 (object 0x00000006de810f10, a
java.lang.Object),
  which is held by "http-nio-8080-exec-215"
"http-nio-8080-exec-215":
  waiting to lock monitor 0x00007f59c4057a98 (object 0x00000006c0b27028, a
java.util.HashSet),
  which is held by "http-nio-8080-BlockPoller"
"http-nio-8080-BlockPoller":
  waiting to lock monitor 0x00007f5998946158 (object 0x00000006de8588c0, a
java.lang.Object),
  which is held by "http-nio-8080-exec-378"
"http-nio-8080-exec-378":
  waiting to lock monitor 0x00007f59c40475e8 (object 0x00000006c0b4fe28, a
java.util.HashSet),
  which is held by "http-nio-8080-ClientPoller"

Java stack information for the threads listed above:
===================================================
"http-nio-8080-exec-455":
    at java.nio.channels.spi.AbstractSelector.cancel(AbstractSelector.java:91)
    - waiting to lock <0x00000006c0b4fe28> (a java.util.HashSet)
    at
java.nio.channels.spi.AbstractSelectionKey.cancel(AbstractSelectionKey.java:73)
    - locked <0x00000006de870a68> (a sun.nio.ch.SelectionKeyImpl)
    at
java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:240)
    - locked <0x00000006de870a90> (a java.lang.Object)
    at
java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:115)
    - locked <0x00000006de870a28> (a java.lang.Object)
    at org.apache.tomcat.util.net.NioChannel.close(NioChannel.java:104)
    at org.apache.tomcat.util.net.NioChannel.close(NioChannel.java:116)
    at
org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doClose(NioEndpoint.java:1179)
    - locked <0x00000006c3ca9be8> (a org.apache.tomcat.util.net.NioChannel)
    at
org.apache.tomcat.util.net.SocketWrapperBase.close(SocketWrapperBase.java:394)
    at
org.apache.tomcat.util.net.NioEndpoint$Poller.cancelledKey(NioEndpoint.java:667)
    at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1592)
    at
org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
    - locked <0x00000006de825cc0> (a
org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:748)
"http-nio-8080-ClientPoller":
    at
java.nio.channels.spi.AbstractSelectableChannel.removeKey(AbstractSelectableChannel.java:130)
    - waiting to lock <0x00000006de810f10> (a java.lang.Object)
    at
java.nio.channels.spi.AbstractSelector.deregister(AbstractSelector.java:188)
    at sun.nio.ch.EPollSelectorImpl.implDereg(EPollSelectorImpl.java:191)
    at sun.nio.ch.SelectorImpl.processDeregisterQueue(SelectorImpl.java:149)
    - locked <0x00000006c0b4fe28> (a java.util.HashSet)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:97)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000006c0b50068> (a sun.nio.ch.Util$3)
    - locked <0x00000006c0b50058> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000006c0b4fdd0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:708)
    at java.lang.Thread.run(Thread.java:748)
"http-nio-8080-exec-215":
    at java.nio.channels.spi.AbstractSelector.cancel(AbstractSelector.java:91)
    - waiting to lock <0x00000006c0b27028> (a java.util.HashSet)
    at
java.nio.channels.spi.AbstractSelectionKey.cancel(AbstractSelectionKey.java:73)
    - locked <0x00000006de810c70> (a sun.nio.ch.SelectionKeyImpl)
    at
java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:240)
    - locked <0x00000006de810f10> (a java.lang.Object)
    at
java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:115)
    - locked <0x00000006de810c30> (a java.lang.Object)
    at org.apache.tomcat.util.net.NioChannel.close(NioChannel.java:104)
    at org.apache.tomcat.util.net.NioChannel.close(NioChannel.java:116)
    at
org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doClose(NioEndpoint.java:1179)
    - locked <0x00000006c2d7b8a0> (a org.apache.tomcat.util.net.NioChannel)
    at
org.apache.tomcat.util.net.SocketWrapperBase.close(SocketWrapperBase.java:394)
    at
org.apache.tomcat.util.net.NioEndpoint$Poller.cancelledKey(NioEndpoint.java:667)
    at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1592)
    at
org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
    - locked <0x00000006de810c98> (a
org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:748)
"http-nio-8080-BlockPoller":
    at
java.nio.channels.spi.AbstractSelectableChannel.removeKey(AbstractSelectableChannel.java:130)
    - waiting to lock <0x00000006de8588c0> (a java.lang.Object)
    at
java.nio.channels.spi.AbstractSelector.deregister(AbstractSelector.java:188)
    at sun.nio.ch.EPollSelectorImpl.implDereg(EPollSelectorImpl.java:191)
    at sun.nio.ch.SelectorImpl.processDeregisterQueue(SelectorImpl.java:149)
    - locked <0x00000006c0b27028> (a java.util.HashSet)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:90)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000006c0b27108> (a sun.nio.ch.Util$3)
    - locked <0x00000006c0b270f8> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000006c0b26fc0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:105)
    at
org.apache.tomcat.util.net.NioBlockingSelector$BlockPoller.run(NioBlockingSelector.java:310)
"http-nio-8080-exec-378":
    at java.nio.channels.spi.AbstractSelector.cancel(AbstractSelector.java:91)
    - waiting to lock <0x00000006c0b4fe28> (a java.util.HashSet)
    at
java.nio.channels.spi.AbstractSelectionKey.cancel(AbstractSelectionKey.java:73)
    - locked <0x00000006de858648> (a sun.nio.ch.SelectionKeyImpl)
    at
java.nio.channels.spi.AbstractSelectableChannel.implCloseChannel(AbstractSelectableChannel.java:240)
    - locked <0x00000006de8588c0> (a java.lang.Object)
    at
java.nio.channels.spi.AbstractInterruptibleChannel.close(AbstractInterruptibleChannel.java:115)
    - locked <0x00000006de8585e0> (a java.lang.Object)
    at org.apache.tomcat.util.net.NioChannel.close(NioChannel.java:104)
    at org.apache.tomcat.util.net.NioChannel.close(NioChannel.java:116)
    at
org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper.doClose(NioEndpoint.java:1179)
    - locked <0x00000006c3360628> (a org.apache.tomcat.util.net.NioChannel)
    at
org.apache.tomcat.util.net.SocketWrapperBase.close(SocketWrapperBase.java:394)
    at
org.apache.tomcat.util.net.NioEndpoint$Poller.cancelledKey(NioEndpoint.java:667)
    at
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1592)
    at
org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
    - locked <0x00000006de858670> (a
org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
    at java.lang.Thread.run(Thread.java:748)

Found 1 deadlock.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64007] a deadlock with Poller run and cancelledKey

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64007

--- Comment #3 from Remy Maucherat <re...@apache.org> ---
Ok, so then cancelledKey should cancel the key first, and this would resolve
the "problem" as the channel close locks everything before cancelling that key
itself. This is what was done like that in 8.5 but this didn't look important.

Can you test it or give me your opinion on it ?
https://github.com/apache/tomcat/commit/9b1a8b67bffe462fc745b19e15ed59c37e2e1dcf

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64007] a deadlock with Poller run and cancelledKey

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64007

--- Comment #5 from Remy Maucherat <re...@apache.org> ---
Ok. If you run again into a deadlock, please add the updated deadlock traces as
an attachment to the BZ (otherwise it's harder to read).

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64007] a deadlock with Poller run and cancelledKey

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64007

Remy Maucherat <re...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #1 from Remy Maucherat <re...@apache.org> ---
NIO deadlocks internally in your trace while trying to close multiple
independent channels.

Some of the structures that might help the deadlock, in particular the
BlockPoller, are no longer used in trunk.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64007] a deadlock with Poller run and cancelledKey

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64007

Remy Maucherat <re...@apache.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEEDINFO                    |RESOLVED

--- Comment #10 from Remy Maucherat <re...@apache.org> ---
Ok, so let's say the regression is fixed. The fix will be in Tomcat 9.0.31+,
thanks for the help resolving the issue.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64007] a deadlock with Poller run and cancelledKey

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64007

--- Comment #4 from echo <la...@163.com> ---
I will test first. If there is progress i will recommend agin

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64007] a deadlock with Poller run and cancelledKey

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64007

--- Comment #6 from Remy Maucherat <re...@apache.org> ---
Looking at the most recent JDK sources, the root issue likely has been fixed by
these commits which remove all the unwanted internal NIO syncing:
https://github.com/openjdk/jdk/commit/33b921f25d8b5e6a50003daa6db31176437d29c0
https://github.com/openjdk/jdk/commit/38bdacafbcf3c96ac40e30098c0990f88a9edf8e

I'm ok with having a workaround fix, as it seems the first commit for this NIO
bug will be in 11, and the second one in 14.
Not 100% sure if the first one alone is enough to avoid the deadlock, but it
might well be. In the trace, if http-nio-8080-exec-215 does not hold the lock
in the AbstractSelectableChannel, then http-nio-8080-ClientPoller will be able
to get the sync on it.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64007] a deadlock with Poller run and cancelledKey

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64007

--- Comment #9 from echo <la...@163.com> ---
with your fixed code i package a new jar and then replace tomcat-embed-core.jar
in 9.0.26.and then i pressure my application 3 hours and all is fine. But with
the tomcat 9.0.26 press 1 hour the deadlock appear.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64007] a deadlock with Poller run and cancelledKey

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64007

--- Comment #2 from echo <la...@163.com> ---
but, in my opinion. just the NioEndpoint$Poller.cancelledKey and
NioEndpoint$Poller.run may cause the deadlock.  Poller.run finally call
AbstractSelector.cancel and Poller.cancelledKey finally call
AbstractSelector.deregister. The two method all need the conditions keyLock and
cancelledKeys.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64007] a deadlock with Poller run and cancelledKey

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64007

--- Comment #7 from echo <la...@163.com> ---
yes. thank you for your patient answer.I don't have a correct way to test. Just
pressure test and it's hard to reproduce the bug fixed or not.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


[Bug 64007] a deadlock with Poller run and cancelledKey

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=64007

--- Comment #8 from Remy Maucherat <re...@apache.org> ---
The issue would have been introduced in 9.0.20, and it's not surprising it's
hard to reproduce.

It's interesting that with the new Java dev model they actually do this sort of
refactorings, cleanups and improvements on NIO, which is a very old piece of
code.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org