Posted to issues@tez.apache.org by "Bikas Saha (JIRA)" <ji...@apache.org> on 2016/02/13 02:17:18 UTC

[jira] [Commented] (TEZ-3117) Deadlock in Edge and Vertex code

    [ https://issues.apache.org/jira/browse/TEZ-3117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15145668#comment-15145668 ] 

Bikas Saha commented on TEZ-3117:
---------------------------------

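[~hitesh] please review. The Edge upcalls into the Vertex while holding its own lock, so it tries to take the Vertex lock after the Edge lock. The dispatcher thread takes the two locks in the opposite order (Vertex first, then Edge), hence the deadlock. This is the only place where the Edge upcalls into the Vertex, so this fix should be sufficient.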
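
To illustrate the lock ordering in question, here is a minimal sketch. It is not the contents of TEZ-3117.1.patch; SketchVertex, SketchEdge and their methods are hypothetical stand-ins that only mirror the shape of the stacks in the report (a read/write lock on the vertex, a monitor on the edge). The idea shown is to make the upcall before taking the edge monitor, so no thread ever waits for the vertex lock while holding the edge.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical, simplified stand-ins; these are NOT the Tez VertexImpl/Edge classes.
class LockOrderSketch {

    static class SketchVertex {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final int totalTasks;
        private SketchEdge edge;

        SketchVertex(int totalTasks) { this.totalTasks = totalTasks; }

        void setEdge(SketchEdge edge) { this.edge = edge; }

        // Upcall target: needs the vertex read lock.
        int getTotalTasks() {
            lock.readLock().lock();
            try {
                return totalTasks;
            } finally {
                lock.readLock().unlock();
            }
        }

        // Dispatcher-style path: vertex write lock first, then the edge monitor.
        String start() {
            lock.writeLock().lock();
            try {
                return edge.describe();
            } finally {
                lock.writeLock().unlock();
            }
        }
    }

    static class SketchEdge {
        private final SketchVertex destination;
        private int routedTasks = -1;

        SketchEdge(SketchVertex destination) { this.destination = destination; }

        synchronized String describe() {
            return "edge(routedTasks=" + routedTasks + ")";
        }

        // Problematic shape: edge monitor is held while waiting for the vertex
        // lock, the reverse of the order used by SketchVertex.start().
        synchronized void routingToBeginUnderLock() {
            routedTasks = destination.getTotalTasks();
        }

        // Safer shape: upcall with no edge monitor held, then lock only to publish.
        void routingToBegin() {
            int numTasks = destination.getTotalTasks();
            synchronized (this) {
                routedTasks = numTasks;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SketchVertex vertex = new SketchVertex(4);
        SketchEdge edge = new SketchEdge(vertex);
        vertex.setEdge(edge);
        // Two threads taking the locks from opposite sides, as in the report.
        Thread pool = new Thread(edge::routingToBegin, "pool-sketch");
        Thread dispatcher = new Thread(vertex::start, "dispatcher-sketch");
        pool.start();
        dispatcher.start();
        pool.join();
        dispatcher.join();
        System.out.println(vertex.start());
    }
}
```

With routingToBegin() the two threads never hold one lock while waiting for the other. Swapping in routingToBeginUnderLock() reintroduces the inverted ordering and can hang under the right interleaving.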

The larger, already existing work item is to stop vertex managers from making callbacks into the vertex synchronously.
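
As a rough illustration of what non-synchronous callbacks could look like, here is a sketch under my own assumptions. AsyncCallbackSketch, scheduleTasksAsync and drainAndGet are hypothetical names, not Tez APIs, and nothing here reflects how the work item is actually planned. The manager thread only enqueues the request; a single dispatcher thread applies it later, so the caller never runs vertex code while holding its own locks.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not a Tez class.
class AsyncCallbackSketch {
    // Single thread standing in for an event dispatcher.
    private final ExecutorService dispatcher = Executors.newSingleThreadExecutor();
    // Stand-in for vertex state that the callback would modify.
    private final AtomicInteger scheduled = new AtomicInteger();

    // Called from a manager thread: enqueue the work and return immediately.
    void scheduleTasksAsync(int count) {
        dispatcher.execute(() -> scheduled.addAndGet(count));
    }

    // Stop accepting work, wait for queued callbacks to run, return the total.
    int drainAndGet() {
        dispatcher.shutdown();
        try {
            if (!dispatcher.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("callbacks did not finish in time");
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", ex);
        }
        return scheduled.get();
    }
}
```

The trade-off is that the caller no longer sees the effect of the callback immediately, so any logic that relied on synchronous completion would need to be revisited.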

> Deadlock in Edge and Vertex code
> --------------------------------
>
>                 Key: TEZ-3117
>                 URL: https://issues.apache.org/jira/browse/TEZ-3117
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Yesha Vora
>            Assignee: Bikas Saha
>             Fix For: 0.7.1, 0.8.3
>
>         Attachments: TEZ-3117.1.patch
>
>
> {code}
> Java-level deadlocks detected
>  
> This means that some threads are blocked waiting to enter a synchronization block or
> waiting to reenter a synchronization block after an Object.wait() call, where each thread
> owns one monitor while trying to obtain another monitor already held by another thread.
>  
> Deadlock:
> App Shared Pool - #1 is waiting to lock java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@18a7c819 which is held by Dispatcher thread {Central}
> Dispatcher thread {Central} is waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db which is held by App Shared Pool - #1
>  
> Deadlock:
> Dispatcher thread {Central} is waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db which is held by App Shared Pool - #1
> App Shared Pool - #1 is waiting to lock java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@18a7c819 which is held by Dispatcher thread {Central}
> Thread stacks
> App Shared Pool - #1 [WAITING]
>  sun.misc.Unsafe.park(native method)
>  java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>  java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
>  java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
>  java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.getTotalTasks(VertexImpl.java:1098)
>  org.apache.tez.dag.app.dag.impl.Edge$EdgeManagerPluginContextImpl.getDestinationVertexNumTasks(Edge.java:99)
>  org.apache.tez.dag.app.dag.impl.Edge.routingToBegin(Edge.java:214)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.setupEdgeRouting(VertexImpl.java:1447)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.unsetTasksNotYetScheduled(VertexImpl.java:1453)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.scheduleTasks(VertexImpl.java:1496)
>  org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerPluginContextImpl.scheduleTasks(VertexManager.java:216)
>  org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.handleSourceTaskFinished(InputReadyVertexManager.java:275)
>  org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onSourceTaskCompleted(InputReadyVertexManager.java:196)
>  org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.trySchedulingPendingCompletions(InputReadyVertexManager.java:146)
>  org.apache.tez.dag.library.vertexmanager.InputReadyVertexManager.onVertexStarted(InputReadyVertexManager.java:187)
>  org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventOnVertexStarted.invoke(VertexManager.java:578)
>  org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:647)
>  org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:642)
>  java.security.AccessController.doPrivileged(native method)
>  javax.security.auth.Subject.doAs(Subject.java:422)
>  org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>  org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:642)
>  org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:631)
>  java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  java.lang.Thread.<null>(unknown source)
> Dispatcher thread {Central} [BLOCKED; waiting to lock org.apache.tez.dag.app.dag.impl.Edge@3e6ba2db]
>  org.apache.tez.dag.app.dag.impl.Edge.getEdgeProperty(Edge.java:241)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.logVertexConfigurationDoneEvent(VertexImpl.java:1886)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.maybeSendConfiguredEvent(VertexImpl.java:3020)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.startVertex(VertexImpl.java:3055)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.access$4500(VertexImpl.java:204)
>  org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:3007)
>  org.apache.tez.dag.app.dag.impl.VertexImpl$StartTransition.transition(VertexImpl.java:2996)
>  org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>  org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>  org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>  org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>  org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:59)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1799)
>  org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:203)
>  org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2214)
>  org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2200)
>  org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:183)
>  org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:114)
>  java.lang.Thread.<null>(unknown source)
> Frozen threads found (potential deadlock)
>  
> It seems that the following threads have not changed their stack for more than 10 seconds.
> These threads are possibly (but not necessarily!) in a deadlock or hung.
>  
> client DomainSocketWatcher <--- Frozen for at least 20m 33 sec
> org.apache.hadoop.net.unix.DomainSocketWatcher.doPoll0(int, DomainSocketWatcher$FdSet) DomainSocketWatcher.java (native)
> org.apache.hadoop.net.unix.DomainSocketWatcher.access$900(int, DomainSocketWatcher$FdSet) DomainSocketWatcher.java:52
> org.apache.hadoop.net.unix.DomainSocketWatcher$2.run() DomainSocketWatcher.java:511
> java.lang.Thread.run() Thread.java:745
> {code}


