Posted to issues@tez.apache.org by "Jeff Zhang (JIRA)" <ji...@apache.org> on 2015/04/23 14:26:39 UTC

[jira] [Updated] (TEZ-2359) Deadlock in DAGAppMaster

     [ https://issues.apache.org/jira/browse/TEZ-2359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated TEZ-2359:
----------------------------
    Priority: Critical  (was: Major)

> Deadlock in DAGAppMaster
> ------------------------
>
>                 Key: TEZ-2359
>                 URL: https://issues.apache.org/jira/browse/TEZ-2359
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Priority: Critical
>
> {code}
> Found one Java-level deadlock:
> =============================
> "Timer-1":
>   waiting for ownable synchronizer 0x00000007cd0f8a30, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "Dispatcher thread: Central"
> "Dispatcher thread: Central":
>   waiting to lock monitor 0x00007fb829866d18 (object 0x00000007cd5ab958, a org.apache.tez.dag.app.rm.YarnTaskSchedulerService),
>   which is held by "DelayedContainerManager"
> "DelayedContainerManager":
>   waiting for ownable synchronizer 0x00000007cd0f8a30, (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync),
>   which is held by "Dispatcher thread: Central"
> Java stack information for the threads listed above:
> ===================================================
> "Timer-1":
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000007cd0f8a30> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
> 	at org.apache.tez.dag.app.DAGAppMaster.checkAndHandleSessionTimeout(DAGAppMaster.java:2015)
> 	- locked <0x00000007cd0f2ff0> (a org.apache.tez.dag.app.DAGAppMaster)
> 	at org.apache.tez.dag.app.DAGAppMaster$3.run(DAGAppMaster.java:1825)
> 	at java.util.TimerThread.mainLoop(Timer.java:555)
> 	at java.util.TimerThread.run(Timer.java:505)
> "Dispatcher thread: Central":
> 	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.dagComplete(YarnTaskSchedulerService.java:842)
> 	- waiting to lock <0x00000007cd5ab958> (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
> 	at org.apache.tez.dag.app.rm.TaskSchedulerEventHandler.dagCompleted(TaskSchedulerEventHandler.java:566)
> 	at org.apache.tez.dag.app.DAGAppMaster.checkForCompletion(DAGAppMaster.java:832)
> 	at org.apache.tez.dag.app.DAGAppMaster.access$4800(DAGAppMaster.java:201)
> 	at org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2362)
> 	at org.apache.tez.dag.app.DAGAppMaster$DAGFinishedTransition.transition(DAGAppMaster.java:2356)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
> 	at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
> 	- locked <0x00000007cd1d0208> (a org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine)
> 	at org.apache.tez.dag.app.DAGAppMaster.handle(DAGAppMaster.java:510)
> 	at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:879)
> 	at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterEventHandler.handle(DAGAppMaster.java:868)
> 	at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:182)
> 	at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:113)
> 	at java.lang.Thread.run(Thread.java:745)
> "DelayedContainerManager":
> 	at sun.misc.Unsafe.park(Native Method)
> 	- parking to wait for  <0x00000007cd0f8a30> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
> 	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
> 	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
> 	at org.apache.tez.dag.app.DAGAppMaster.getState(DAGAppMaster.java:531)
> 	at org.apache.tez.dag.app.DAGAppMaster$RunningAppContext.getAMState(DAGAppMaster.java:1522)
> 	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.assignDelayedContainer(YarnTaskSchedulerService.java:585)
> 	- locked <0x00000007cd5ab958> (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
> 	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService.access$600(YarnTaskSchedulerService.java:82)
> 	at org.apache.tez.dag.app.rm.YarnTaskSchedulerService$DelayedContainerManager.run(YarnTaskSchedulerService.java:1877)
> 	- locked <0x00000007cd5ab958> (a org.apache.tez.dag.app.rm.YarnTaskSchedulerService)
> Found 1 deadlock.
> {code}
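
Reading the dump above, the cycle is between two threads: "Dispatcher thread: Central" owns the ReentrantReadWriteLock in exclusive (write) mode and wants the YarnTaskSchedulerService monitor, while "DelayedContainerManager" owns that monitor and wants the read lock. "Timer-1" is only queued behind the same write lock. The following is a minimal sketch of that lock-order inversion, not Tez code: the class name, thread names, and the two lock objects are hypothetical stand-ins, and where exactly the dispatcher acquires the write lock is an assumption, since the dump does not list it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-in for the two threads in the cycle; not Tez code.
class LockOrderSketch {

    /** Builds the inverted lock order and returns how many of the two sketch threads the JVM reports as deadlocked. */
    public static int reproduce() {
        // Stand-in for the ReentrantReadWriteLock reported at 0x00000007cd0f8a30.
        final ReentrantReadWriteLock amStateLock = new ReentrantReadWriteLock();
        // Stand-in for the YarnTaskSchedulerService monitor.
        final Object schedulerMonitor = new Object();
        // Opens once each thread holds its first lock, so the inversion is deterministic.
        final CountDownLatch firstLocksHeld = new CountDownLatch(2);

        // Order: write lock first, then the scheduler monitor (as in dagComplete).
        Thread dispatcher = new Thread(() -> {
            amStateLock.writeLock().lock();
            try {
                firstLocksHeld.countDown();
                awaitQuietly(firstLocksHeld);
                synchronized (schedulerMonitor) {
                    // Not reached while the other thread keeps the monitor.
                }
            } finally {
                amStateLock.writeLock().unlock();
            }
        }, "dispatcher-sketch");

        // Order: scheduler monitor first, then the read lock (as in assignDelayedContainer -> getState).
        Thread delayed = new Thread(() -> {
            synchronized (schedulerMonitor) {
                firstLocksHeld.countDown();
                awaitQuietly(firstLocksHeld);
                amStateLock.readLock().lock();
                amStateLock.readLock().unlock();
            }
        }, "delayed-container-sketch");

        // Daemon threads, so the stuck pair does not keep the JVM alive.
        dispatcher.setDaemon(true);
        delayed.setDaemon(true);
        dispatcher.start();
        delayed.start();

        // Poll the JVM's own deadlock detector for up to roughly five seconds.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = mx.findDeadlockedThreads();
            int found = 0;
            if (ids != null) {
                for (long id : ids) {
                    if (id == dispatcher.getId() || id == delayed.getId()) {
                        found++;
                    }
                }
            }
            if (found == 2) {
                return found;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return found;
            }
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked sketch threads: " + reproduce());
    }
}
```

The latch is only there to force the interleaving; in the real AM the same ordering would depend on timing between the dispatcher and the scheduler's delayed-container thread.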



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)