Posted to issues@tez.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2015/04/12 00:04:12 UTC

[jira] [Updated] (TEZ-2310) AM Deadlock in VertexImpl

     [ https://issues.apache.org/jira/browse/TEZ-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated TEZ-2310:
----------------------------
    Attachment: TEZ-2310-0.patch

> AM Deadlock in VertexImpl
> -------------------------
>
>                 Key: TEZ-2310
>                 URL: https://issues.apache.org/jira/browse/TEZ-2310
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Daniel Dai
>             Fix For: 0.7.0
>
>         Attachments: TEZ-2310-0.patch
>
>
> I saw the following deadlock in testing:
> Thread #1:
> {code}
> Daemon Thread [App Shared Pool - #3] (Suspended)	
> 	owns: VertexManager$VertexManagerPluginContextImpl  (id=327)	
> 	owns: ShuffleVertexManager  (id=328)	
> 	owns: VertexManager  (id=329)	
> 	waiting for: VertexManager$VertexManagerPluginContextImpl  (id=326)	
> 	VertexManager$VertexManagerPluginContextImpl.onStateUpdated(VertexStateUpdate) line: 344	
> 	StateChangeNotifier$ListenerContainer.sendStateUpdate(VertexStateUpdate) line: 138	
> 	StateChangeNotifier$ListenerContainer.access$100(StateChangeNotifier$ListenerContainer, VertexStateUpdate) line: 122	
> 	StateChangeNotifier.sendStateUpdate(TezVertexID, VertexStateUpdate) line: 116	
> 	StateChangeNotifier.stateChanged(TezVertexID, VertexStateUpdate) line: 106	
> 	VertexImpl.maybeSendConfiguredEvent() line: 3385	
> 	VertexImpl.doneReconfiguringVertex() line: 1634	
> 	VertexManager$VertexManagerPluginContextImpl.doneReconfiguringVertex() line: 339	
> 	ShuffleVertexManager.schedulePendingTasks(int) line: 561	
> 	ShuffleVertexManager.schedulePendingTasks() line: 620	
> 	ShuffleVertexManager.handleVertexStateUpdate(VertexStateUpdate) line: 731	
> 	ShuffleVertexManager.onVertexStateUpdated(VertexStateUpdate) line: 744	
> 	VertexManager$VertexManagerEventOnVertexStateUpdate.invoke() line: 527	
> 	VertexManager$VertexManagerEvent$1.run() line: 612	
> 	VertexManager$VertexManagerEvent$1.run() line: 607	
> 	AccessController.doPrivileged(PrivilegedExceptionAction<T>, AccessControlContext) line: not available [native method]	
> 	Subject.doAs(Subject, PrivilegedExceptionAction<T>) line: 415	
> 	UserGroupInformation.doAs(PrivilegedExceptionAction<T>) line: 1548	
> 	VertexManager$VertexManagerEventOnVertexStateUpdate(VertexManager$VertexManagerEvent).call() line: 607	
> 	VertexManager$VertexManagerEventOnVertexStateUpdate(VertexManager$VertexManagerEvent).call() line: 596	
> 	ListenableFutureTask<V>(FutureTask<V>).run() line: 262	
> 	ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145	
> 	ThreadPoolExecutor$Worker.run() line: 615	
> 	Thread.run() line: 745	
> {code}
> Thread #2:
> {code}
> Daemon Thread [App Shared Pool - #2] (Suspended)	
> 	owns: VertexManager$VertexManagerPluginContextImpl  (id=326)	
> 	owns: PigGraceShuffleVertexManager  (id=344)	
> 	owns: VertexManager  (id=345)	
> 	Unsafe.park(boolean, long) line: not available [native method]	
> 	LockSupport.park(Object) line: 186	
> 	ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).parkAndCheckInterrupt() line: 834	
> 	ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).doAcquireShared(int) line: 964	
> 	ReentrantReadWriteLock$NonfairSync(AbstractQueuedSynchronizer).acquireShared(int) line: 1282	
> 	ReentrantReadWriteLock$ReadLock.lock() line: 731	
> 	VertexImpl.getTotalTasks() line: 952	
> 	VertexManager$VertexManagerPluginContextImpl.getVertexNumTasks(String) line: 162	
> 	PigGraceShuffleVertexManager(ShuffleVertexManager).updateSourceTaskCount() line: 435	
> 	PigGraceShuffleVertexManager(ShuffleVertexManager).onVertexStarted(Map<String,List<Integer>>) line: 353	
> 	VertexManager$VertexManagerEventOnVertexStarted.invoke() line: 541	
> 	VertexManager$VertexManagerEvent$1.run() line: 612	
> 	VertexManager$VertexManagerEvent$1.run() line: 607	
> 	AccessController.doPrivileged(PrivilegedExceptionAction<T>, AccessControlContext) line: not available [native method]	
> 	Subject.doAs(Subject, PrivilegedExceptionAction<T>) line: 415	
> 	UserGroupInformation.doAs(PrivilegedExceptionAction<T>) line: 1548	
> 	VertexManager$VertexManagerEventOnVertexStarted(VertexManager$VertexManagerEvent).call() line: 607	
> 	VertexManager$VertexManagerEventOnVertexStarted(VertexManager$VertexManagerEvent).call() line: 596	
> 	ListenableFutureTask<V>(FutureTask<V>).run() line: 262	
> 	ThreadPoolExecutor.runWorker(ThreadPoolExecutor$Worker) line: 1145	
> 	ThreadPoolExecutor$Worker.run() line: 615	
> 	Thread.run() line: 745	
> {code}
> Thread #1 holds a writeLock (VertexImpl:1628) and is trying to enter a synchronized block (ShuffleVertexManager.onVertexStateUpdated). Meanwhile, thread #2 is already inside a synchronized block (ShuffleVertexManager.onVertexStarted) and is trying to acquire a readLock (VertexImpl:952). Holding a lock and then entering a synchronized block can be dangerous.
> I have attached a patch that avoids this, and with it the deadlock goes away. I am not sure whether this is the right fix, or whether other code follows the same pattern.
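
The lock-ordering hazard described above can be reproduced in isolation. The following is a minimal sketch, not Tez code and not the contents of TEZ-2310-0.patch: the class name `LockOrderSketch`, the fields `vertexLock` and `pluginMonitor`, and the `run` method are all hypothetical stand-ins for the VertexImpl read/write lock and the synchronized plugin context. The reader uses a timed `tryLock` so that the demonstration terminates instead of hanging the way the real AM did.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockOrderSketch {

    /**
     * Runs a "writer" (write lock first, then monitor) against a "reader"
     * (monitor first, then read lock). Returns true if the reader obtained
     * the read lock within readTimeoutMs.
     */
    public static boolean run(boolean notifyWhileHoldingWriteLock, long readTimeoutMs) {
        // Hypothetical stand-ins for the vertex lock and the plugin context monitor.
        ReentrantReadWriteLock vertexLock = new ReentrantReadWriteLock();
        Object pluginMonitor = new Object();
        // Ensures each thread holds its first lock before either requests its second.
        CountDownLatch bothReady = new CountDownLatch(2);
        AtomicBoolean readAcquired = new AtomicBoolean(false);

        Thread writer = new Thread(() -> {
            vertexLock.writeLock().lock();
            boolean held = true;
            try {
                bothReady.countDown();
                bothReady.await();
                if (!notifyWhileHoldingWriteLock) {
                    // Safe ordering: finish the state change, drop the lock, then notify.
                    vertexLock.writeLock().unlock();
                    held = false;
                }
                synchronized (pluginMonitor) {
                    // A listener callback would run here.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                if (held) {
                    vertexLock.writeLock().unlock();
                }
            }
        });

        Thread reader = new Thread(() -> {
            synchronized (pluginMonitor) {
                try {
                    bothReady.countDown();
                    bothReady.await();
                    // Timed attempt so the unsafe case reports failure instead of hanging.
                    if (vertexLock.readLock().tryLock(readTimeoutMs, TimeUnit.MILLISECONDS)) {
                        readAcquired.set(true);
                        vertexLock.readLock().unlock();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });

        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for threads", e);
        }
        return readAcquired.get();
    }

    public static void main(String[] args) {
        System.out.println("notify under write lock, reader got lock: " + run(true, 200));
        System.out.println("notify after unlock, reader got lock: " + run(false, 5000));
    }
}
```

With `notifyWhileHoldingWriteLock` set to true, the writer cannot release the write lock until it enters the monitor, and the reader holds that monitor while waiting for the read lock, so the timed attempt fails; with an untimed `lock()` this would be the deadlock in the traces. Releasing the write lock before notifying is one way to break the cycle; whether it matches the attached patch is not something this sketch claims.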


