You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Mikhail Khludnev (JIRA)" <ji...@apache.org> on 2018/02/24 21:08:00 UTC

[jira] [Comment Edited] (SOLR-12027) ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.search.join.BlockJoinFacetDistribTest

    [ https://issues.apache.org/jira/browse/SOLR-12027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16375772#comment-16375772 ] 

Mikhail Khludnev edited comment on SOLR-12027 at 2/24/18 9:07 PM:
------------------------------------------------------------------

The problem looks like is caused by [ReservedThreadExecutor|https://github.com/eclipse/jetty.project/blob/e87775eb0d5ed0723a633ceda62deccde5143cf3/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java#L139] which has idle time for inactive thread 1 minute by default and I've found it's hard to adjust.
It doesn't sound weird but, that what you might see during test finishing: 
{code}
2018-02-24 10:34:04
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.111-b14 mixed mode):

"Attach Listener" #279 daemon prio=9 os_prio=31 tid=0x00007fa86d8ea000 nid=0x6407 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"qtp860938026-32" #32 prio=5 os_prio=31 tid=0x00007fa86d465800 nid=0x8003 waiting on condition [0x000070000290e000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007be6fd818> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.reservedWait(ReservedThreadExecutor.java:308)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:373)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
        at java.lang.Thread.run(Thread.java:745)

"SUITE-BlockJoinFacetDistribTest-seed#[A8545A49DBAAAC5B]" #12 prio=5 os_prio=31 tid=0x00007fa86a9da800 nid=0x5c0f waiting on condition [0x00007000016d8000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.checkThreadLeaks(ThreadLeakControl.java:582)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$2.evaluate(ThreadLeakControl.java:439)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:705)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$200(RandomizedRunner.java:139)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:626)

"JUnit4-serializer-daemon" #10 daemon prio=5 os_prio=31 tid=0x00007fa86a9c1800 nid=0x5603 waiting on condition [0x00007000015d5000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at com.carrotsearch.ant.tasks.junit4.events.Serializer$1.run(Serializer.java:50)
...
{code}  
The only explanation is that {{interrupt()}} can not unpark that Jetty thread. A guy in a pub told me that [it might happen|https://bugs.openjdk.java.net/browse/JDK-8074773].
So, if I set {{@ThreadLeakLingering(linger = 80000)}} it keeps {{beast}} happy thousand times on this test.

My proposal is to set *lingering time above 1 min* for all Solr tests and for Lucene ones which hit the same issue. Opinions?     


was (Author: mkhludnev):
The problem looks like is caused by [ReservedThreadExecutor|https://github.com/eclipse/jetty.project/blob/e87775eb0d5ed0723a633ceda62deccde5143cf3/jetty-util/src/main/java/org/eclipse/jetty/util/thread/QueuedThreadPool.java#L139] which has idle time for inactive thread 1 minute by default and I've found it's hard to adjust.
It doesn't sound weird but, that what you might see during test finishing: 
{code}
2018-02-24 10:34:04
Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.111-b14 mixed mode):

"Attach Listener" #279 daemon prio=9 os_prio=31 tid=0x00007fa86d8ea000 nid=0x6407 waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"qtp860938026-32" #32 prio=5 os_prio=31 tid=0x00007fa86d465800 nid=0x8003 waiting on condition [0x000070000290e000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000007be6fd818> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.reservedWait(ReservedThreadExecutor.java:308)
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:373)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
        at java.lang.Thread.run(Thread.java:745)

"SUITE-BlockJoinFacetDistribTest-seed#[A8545A49DBAAAC5B]" #12 prio=5 os_prio=31 tid=0x00007fa86a9da800 nid=0x5c0f waiting on condition [0x00007000016d8000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl.checkThreadLeaks(ThreadLeakControl.java:582)
        at com.carrotsearch.randomizedtesting.ThreadLeakControl$2.evaluate(ThreadLeakControl.java:439)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:705)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$200(RandomizedRunner.java:139)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:626)

"JUnit4-serializer-daemon" #10 daemon prio=5 os_prio=31 tid=0x00007fa86a9c1800 nid=0x5603 waiting on condition [0x00007000015d5000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at com.carrotsearch.ant.tasks.junit4.events.Serializer$1.run(Serializer.java:50)
...
{code}  
The only explanation is that {{interrupt()}} can not unpark that Jetty thread. A guy in a pub told me that [it might happen|https://bugs.openjdk.java.net/browse/JDK-8074773].
So, if I set {{@ThreadLeakLingering(linger = 80000)}} it keeps {{beast}} happy thousand times on this test.

My proposal set *lingering time above 1 min* for all Solr tests and for Lucene ones which hit the same issue. Opinions?     

> ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.search.join.BlockJoinFacetDistribTest
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-12027
>                 URL: https://issues.apache.org/jira/browse/SOLR-12027
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Tests
>            Reporter: Mikhail Khludnev
>            Priority: Major
>
> I tried to look into the sub. The symptoms looks like. 
> {code}
> WARN  (jetty-closer-2-thread-2) [    ] o.e.j.u.t.QueuedThreadPool QueuedThreadPool@qtp860938026{STOPPING,8<=9<=10000,i=0,q=1} Couldn't stop Thread[qtp860938
> {code}
> The thread successfully handled one request before. Then we have:
> {code}
>  2> Feb 23, 2018 11:20:41 PM com.carrotsearch.randomizedtesting.ThreadLeakControl checkThreadLeaks
>   2> SEVERE: 1 thread leaked from SUITE scope at org.apache.solr.search.join.BlockJoinFacetDistribTest: 
>   2>    1) Thread[id=76, name=qtp860938026-76, state=TIMED_WAITING, group=TGRP-BlockJoinFacetDistribTest]
>   2>         at sun.misc.Unsafe.park(Native Method)
>   2>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>   2>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
>   2>         at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.reservedWait(ReservedThreadExecutor.java:308)
>   2>         at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:373)
>   2>         at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:708)
>   2>         at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:626)
>   2>         at java.lang.Thread.run(Thread.java:745)
> {code}
> and then
> {code}
>   2> SEVERE: There are still zombie threads that couldn't be terminated:
>   2>    1) Thread[id=76, name=qtp860938026-76, state=TIMED_WAITING, group=TGRP-BlockJoinFacetDistribTest]
>   2>         at sun.misc.Unsafe.park(Native Method)
>   2>         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> ...
> [23:19:41.186] ERROR   0.00s | BlockJoinFacetDistribTest (suite) <<<
>    > Throwable #1: com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.search.join.BlockJoinFacetDistribTest: 
>    >    1) Thread[id=76, name=qtp860938026-76, state=TIMED_WAITING, group=TGRP-BlockJoinFacetDistribTest]
>    >         at sun.misc.Unsafe.park(Native Method)
>    >         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>    >         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2163)
>    >         at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.reservedWait(ReservedThreadExecutor.java:308)
> {code} 
> This also happen to other tests as well, not deterministic, but more or less is reproduced with {{ant beast}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org