Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2016/02/13 01:48:18 UTC

[jira] [Created] (TEZ-3117) Deadlock in Edge and Vertex code

Bikas Saha created TEZ-3117:
-------------------------------

             Summary: Deadlock in Edge and Vertex code
                 Key: TEZ-3117
                 URL: https://issues.apache.org/jira/browse/TEZ-3117
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Yesha Vora
            Assignee: Bikas Saha


{code}
Java-level deadlocks detected
 
This means that some threads are blocked waiting to enter a synchronization block or
waiting to reenter a synchronization block after an Object.wait() call, where each thread
owns one monitor while trying to obtain another monitor already held by another thread.
 
Deadlock:


App Shared Pool - #1 is waiting to lock java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@18a7c819 which is held by Dispatcher thread {Central}
Dispatcher thread {Central} is waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db which is held by App Shared Pool - #1


 
Deadlock:


Dispatcher thread {Central} is waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db which is held by App Shared Pool - #1
App Shared Pool - #1 is waiting to lock java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@18a7c819 which is held by Dispatcher thread {Central}




Thread stacks


App Shared Pool - #1 [WAITING]
 sun.misc.Unsafe.park(native method)
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
 java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
 java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
 org.apache.tez.dag.app.dag.impl.VertexImpl.getTotalTasks(VertexImpl.java:1098)
 org.apache.tez.dag.app.dag.impl.Edge$EdgeManagerPluginContextImpl.getDestinationVertexNumTasks(Edge.java:99)
 org.apache.tez.dag.app.dag.impl.Edge.routingToBegin(Edge.java:214)
 org.apache.tez.dag.app.dag.impl.VertexImpl.setupEdgeRouting(VertexImpl.java:1447)
 org.apache.tez.dag.app.dag.impl.VertexImpl.unsetTasksNotYetScheduled(VertexImpl.java:1453)
 org.apache.tez.dag.app.dag.impl.VertexImpl.scheduleTasks(VertexImpl.java:1496)
 org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerPluginContextImpl.scheduleTasks(VertexManager.java:216)
 org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.handleSourceTaskFinished(InputReadyVertexManager.java:275)
 org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onSourceTaskCompleted(InputReadyVertexManager.java:196)
 org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.trySchedulingPendingCompletions(InputReadyVertexManager.java:146)
 org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onVertexStarted(InputReadyVertexManager.java:187)
 org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventOnVertexStarted.invoke(VertexManager.java:578)
 org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:647)
 org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:642)
 java.security.AccessController.doPrivileged(native method)
 javax.security.auth.Subject.doAs(Subject.java:422)
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
 org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:642)
 org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:631)
 java.util.concurrent.FutureTask.run(FutureTask.java:266)
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 java.lang.Thread.<null>(unknown source)


Dispatcher thread {Central} [BLOCKED; waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db]
 org.apache.tez.dag.app.dag.impl.Edge.getEdgeProperty(Edge.java:241)
 org.apache.tez.dag.app.dag.impl.VertexImpl.logVertexConfigurationDoneEvent(VertexImpl.java:1886)
 org.apache.tez.dag.app.dag.impl.VertexImpl.maybeSendConfiguredEvent(VertexImpl.java:3020)
 org.apache.tez.dag.app.dag.impl.VertexImpl.startVertex(VertexImpl.java:3055)
 org.apache.tez.dag.app.dag.impl.VertexImpl.access$4500(VertexImpl.java:204)
 org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:3007)
 org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:2996)
 org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
 org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
 org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
 org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
 org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
 org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1799)
 org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:203)
 org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2214)
 org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2200)
 org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
 org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
 java.lang.Thread.<null>(unknown source)


Frozen threads found (potential deadlock)
 
It seems that the following threads have not changed their stack for more than 10 seconds.
These threads are possibly (but not necessarily!) in a deadlock or hung.
 
client DomainSocketWatcher <--- Frozen for at least 20m 33 sec
org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(int, DomainSocketWatcher$FdSet) DomainSocketWatcher.java (native)
org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(int, DomainSocketWatcher$FdSet) DomainSocketWatcher.java:52
org.apache.hadoop.net.unix.DomainSocketWatcher$2.run() DomainSocketWatcher.java:511
java.lang.Thread.run() Thread.java:745




{code}
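
Judging by the stacks above, the cycle looks like a lock-order inversion. The pool thread appears to hold the Edge monitor (entered in Edge.routingToBegin) while waiting for the vertex read lock in VertexImpl.getTotalTasks. The dispatcher thread is reported as holding that same ReentrantReadWriteLock while waiting for the Edge monitor in Edge.getEdgeProperty. The following is a minimal sketch of that pattern, not the Tez code: the class, field, and method names are hypothetical stand-ins, and the read lock is taken with a timed tryLock (an assumption made only so the demo terminates instead of hanging).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the Edge/Vertex interaction; not the Tez implementation.
class LockOrderInversionSketch {

    // Stands in for the Edge instance whose monitor is contended in the dump.
    private static final Object edgeMonitor = new Object();
    // Stands in for the vertex's read/write lock.
    private static final ReentrantReadWriteLock vertexLock = new ReentrantReadWriteLock();

    /**
     * Runs the two threads in opposite lock order and reports whether the
     * pool thread managed to take the read lock while holding the monitor.
     */
    static boolean poolThreadGotReadLock() {
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        boolean[] acquired = {false};

        // Order: Edge monitor first, then the vertex read lock.
        Thread pool = new Thread(() -> {
            synchronized (edgeMonitor) {
                bothHoldFirstLock.countDown();
                try {
                    bothHoldFirstLock.await();
                    // An untimed lock() here would block forever, as in the dump.
                    if (vertexLock.readLock().tryLock(500, TimeUnit.MILLISECONDS)) {
                        acquired[0] = true;
                        vertexLock.readLock().unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "pool-thread");

        // Order: vertex write lock first, then the Edge monitor.
        Thread dispatcher = new Thread(() -> {
            vertexLock.writeLock().lock();
            try {
                bothHoldFirstLock.countDown();
                bothHoldFirstLock.await();
                synchronized (edgeMonitor) {
                    // Entered only after the pool thread gives up and leaves its block.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                vertexLock.writeLock().unlock();
            }
        }, "dispatcher-thread");

        pool.start();
        dispatcher.start();
        try {
            pool.join();
            dispatcher.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for demo threads", e);
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        // prints "read lock acquired: false"
        System.out.println("read lock acquired: " + poolThreadGotReadLock());
    }
}
```

The timed attempt always fails here because the dispatcher thread cannot release the write lock until it has entered the monitor that the pool thread still holds. Acquiring the two locks in one consistent order, or not calling into the vertex while holding the Edge monitor, would remove the cycle in this sketch. Whether either approach suits the actual Tez code is not established by the dump alone.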



