Posted to dev@twill.apache.org by "Terence Yim (JIRA)" <ji...@apache.org> on 2014/12/01 22:13:12 UTC

[jira] [Commented] (TWILL-110) Deadlock when shutting down runnable container

    [ https://issues.apache.org/jira/browse/TWILL-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230463#comment-14230463 ] 

Terence Yim commented on TWILL-110:
-----------------------------------

This bug will surface when running with OpenJDK or JDK7, but not with Oracle JDK6. It is caused by the differences in the {{ThreadPoolExecutor}} implementation between Oracle JDK6 and OpenJDK/JDK7.

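In short, the change turns the {{shutdown()}} method from an atomic operation (lock->check state->terminate->unlock) into two parts (check state->lock->terminate->unlock). As a result, the termination step, {{tryTerminate()}}, can be called concurrently by the thread calling {{shutdown()}} and by an exiting worker thread (as seen in the jstack quoted below). There, Thread-5 holds the service lock taken in {{AbstractService.stop()}} and waits for the executor's lock inside {{tryTerminate()}}. Meanwhile zk-client-EventThread holds the executor's lock and, from the {{terminated()}} callback, waits for the service lock in {{notifyStopped()}}. Each thread waits for a lock the other holds, which is the deadlock.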
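To make the lock-order inversion concrete, here is a simplified Java model. It is not Twill, Guava, or JDK code. {{serviceLock}} and {{mainLock}} are hypothetical stand-ins for the service's lock and the executor's internal lock. The thread names "stopper" and "worker" stand in for Thread-5 and zk-client-EventThread. A timed {{tryLock()}} replaces {{lock()}} so the sketch reports the cycle instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderInversion {
    // Hypothetical stand-ins, not the real Guava/JDK fields:
    // serviceLock ~ the service lock held while stop() runs doStop()
    // mainLock    ~ the executor lock held while tryTerminate() runs terminated()
    private static final ReentrantLock serviceLock = new ReentrantLock();
    private static final ReentrantLock mainLock = new ReentrantLock();

    /** Runs both lock orders concurrently; returns how many threads could not get their second lock. */
    public static int countBlockedThreads() throws InterruptedException {
        CountDownLatch firstLocksHeld = new CountDownLatch(2);
        CountDownLatch attemptsDone = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        // "stopper": service lock first, then executor lock (like shutdown() -> tryTerminate()).
        Thread stopper = new Thread(
                () -> lockInOrder(serviceLock, mainLock, firstLocksHeld, attemptsDone, blocked), "stopper");
        // "worker": executor lock first, then service lock (like terminated() -> notifyStopped()).
        Thread worker = new Thread(
                () -> lockInOrder(mainLock, serviceLock, firstLocksHeld, attemptsDone, blocked), "worker");
        stopper.start();
        worker.start();
        stopper.join();
        worker.join();
        return blocked.get();
    }

    private static void lockInOrder(ReentrantLock first, ReentrantLock second,
                                    CountDownLatch firstLocksHeld, CountDownLatch attemptsDone,
                                    AtomicInteger blocked) {
        first.lock();
        try {
            firstLocksHeld.countDown();
            firstLocksHeld.await();  // both threads now hold their first lock
            // A plain lock() here would park forever, as in the jstack.
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet();
            }
            attemptsDone.countDown();
            attemptsDone.await();  // keep holding `first` until both attempts have finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("threads blocked on their second lock: " + countBlockedThreads());
    }
}
```

Running {{main}} prints {{threads blocked on their second lock: 2}}. Each thread holds one lock while the other thread holds the lock it needs next, which is the same cycle shown in the jstack.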

> Deadlock when shutting down runnable container
> ----------------------------------------------
>
>                 Key: TWILL-110
>                 URL: https://issues.apache.org/jira/browse/TWILL-110
>             Project: Apache Twill
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.4.0-incubating
>            Reporter: Terence Yim
>            Assignee: Terence Yim
>            Priority: Blocker
>
> Deadlock was observed when a TwillRunnable container was shutting down, causing the process to hang.
> {noformat}
> Found one Java-level deadlock:
> =============================
> "Thread-5":
>   waiting for ownable synchronizer 0x00000000eb1a6ec0, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
>   which is held by "zk-client-EventThread"
> "zk-client-EventThread":
>   waiting for ownable synchronizer 0x00000000eb18c698, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
>   which is held by "Thread-5"
> Java stack information for the threads listed above:
> ===================================================
> "Thread-5":
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x00000000eb1a6ec0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
> at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
> at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
> at java.util.concurrent.ThreadPoolExecutor.tryTerminate(ThreadPoolExecutor.java:700)
> at java.util.concurrent.ThreadPoolExecutor.shutdown(ThreadPoolExecutor.java:1397)
> at org.apache.twill.internal.zookeeper.DefaultZKClientService$ServiceDelegate.doStop(DefaultZKClientService.java:402)
> at com.google.common.util.concurrent.AbstractService.stop(AbstractService.java:198)
> at org.apache.twill.internal.zookeeper.DefaultZKClientService.stop(DefaultZKClientService.java:310)
> at org.apache.twill.zookeeper.ForwardingZKClientService.stop(ForwardingZKClientService.java:66)
> at org.apache.twill.common.Services$2.run(Services.java:131)
> at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:262)
> at com.google.common.util.concurrent.ExecutionList$RunnableExecutorPair.execute(ExecutionList.java:149)
> at com.google.common.util.concurrent.ExecutionList.add(ExecutionList.java:105)
> at com.google.common.util.concurrent.AbstractFuture.addListener(AbstractFuture.java:160)
> at org.apache.twill.common.Services.doChain(Services.java:106)
> at org.apache.twill.common.Services.chainStop(Services.java:61)
> at org.apache.twill.internal.ServiceMain$1.run(ServiceMain.java:69)
> "zk-client-EventThread":
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x00000000eb18c698> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
> at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
> at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
> at com.google.common.util.concurrent.AbstractService.notifyStopped(AbstractService.java:267)
> at org.apache.twill.internal.zookeeper.DefaultZKClientService$ServiceDelegate.access$600(DefaultZKClientService.java:372)
> at org.apache.twill.internal.zookeeper.DefaultZKClientService$ServiceDelegate$1.terminated(DefaultZKClientService.java:382)
> at java.util.concurrent.ThreadPoolExecutor.tryTerminate(ThreadPoolExecutor.java:704)
> at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1006)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1163)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Found 1 deadlock.
> {noformat}
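The "Found one Java-level deadlock" report above comes from jstack. The same check is available in-process through {{ThreadMXBean.findDeadlockedThreads()}}, which covers ownable synchronizers such as {{ReentrantLock}} as well as object monitors. Below is a minimal sketch. The {{DeadlockProbe}} class, its lock variables, and the thread names are hypothetical. It creates a real deadlock on two daemon threads and returns the names the JVM reports as deadlocked.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockProbe {
    /** Deadlocks two daemon threads on ReentrantLocks, then asks the JVM which threads are deadlocked. */
    public static String[] detectDeadlock() throws InterruptedException {
        ReentrantLock serviceLock = new ReentrantLock();  // hypothetical stand-in
        ReentrantLock mainLock = new ReentrantLock();     // hypothetical stand-in
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        startDaemon("stopper", serviceLock, mainLock, bothHoldFirst);
        startDaemon("worker", mainLock, serviceLock, bothHoldFirst);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            // Returns null until a deadlock cycle exists.
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                ThreadInfo[] infos = mx.getThreadInfo(ids);
                String[] names = new String[infos.length];
                for (int i = 0; i < infos.length; i++) {
                    names[i] = infos[i].getThreadName();
                }
                return names;
            }
            Thread.sleep(50);  // give both threads time to park in lock()
        }
        return new String[0];  // no deadlock observed within about 5 seconds
    }

    private static void startDaemon(String name, ReentrantLock first, ReentrantLock second,
                                    CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            first.lock();
            bothHoldFirst.countDown();
            try {
                bothHoldFirst.await();  // wait until the other thread holds its first lock
            } catch (InterruptedException e) {
                first.unlock();
                return;
            }
            second.lock();  // parks forever: the other thread holds `second` and waits for `first`
        }, name);
        t.setDaemon(true);  // daemon threads do not keep the JVM alive after main returns
        t.start();
    }

    public static void main(String[] args) throws InterruptedException {
        for (String name : detectDeadlock()) {
            System.out.println("deadlocked: " + name);
        }
    }
}
```

The detection only observes the deadlock. It does not break it, so the sketch marks the threads as daemons to let the JVM exit afterwards.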



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)