Posted to issues@tez.apache.org by "Zhiyuan Yang (JIRA)" <ji...@apache.org> on 2017/07/06 06:57:00 UTC
[jira] [Commented] (TEZ-3297) Deadlock scenario in AM during ShuffleVertexManager auto reduce
[ https://issues.apache.org/jira/browse/TEZ-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16076050#comment-16076050 ]
Zhiyuan Yang commented on TEZ-3297:
-----------------------------------
Added 0.8.4 and 0.9.0 to 'Fix Version' since the patch was also committed to these branches.
> Deadlock scenario in AM during ShuffleVertexManager auto reduce
> ---------------------------------------------------------------
>
> Key: TEZ-3297
> URL: https://issues.apache.org/jira/browse/TEZ-3297
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Zhiyuan Yang
> Assignee: Rajesh Balamohan
> Priority: Critical
> Fix For: 0.7.2, 0.9.0, 0.8.4
>
> Attachments: am_log, TEZ-3297.1.patch, TEZ-3297.2.branch-0.7.patch, TEZ-3297.2.patch, thread_dump
>
>
> Here is what's happening in the attached thread dump.
> App Pool thread #9 performs the auto reduce on V2 and initializes the new edge manager; it holds the V2 write lock and wants the read lock of source vertex V1.
> At the same time, another App Pool thread, #2, schedules a task of V1 and gets the output spec, so it holds the V1 read lock and wants the V2 read lock.
> Also, the dispatcher thread wants the V1 write lock to begin the state machine transition. Since the dispatcher thread is at the head of the V1 ReadWriteLock queue, thread #9 cannot get the V1 read lock even though thread #2 is holding the V1 read lock.
> This is a circular lock scenario: #2 blocks the dispatcher, the dispatcher blocks #9, and #9 blocks #2.
> The ReadWriteLock itself is behaving as designed in this case. Please see this Java bug report: http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6816565.
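The queueing behavior described above can be reproduced in isolation. The following is a minimal sketch, not Tez code: it assumes the vertex lock behaves like a default (non-fair) java.util.concurrent.locks.ReentrantReadWriteLock, and the class name, method name, and thread roles are hypothetical stand-ins for thread #2, the dispatcher, and thread #9. It models only the V1 lock and uses a timed tryLock so that the program terminates instead of deadlocking.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; none of these names come from Tez.
class ReadLockQueueDemo {

    /**
     * Returns true if a new reader timed out on the V1 read lock while
     * another thread held that read lock and a writer was queued.
     */
    static boolean newReaderBlockedBehindQueuedWriter() throws InterruptedException {
        // Assumption: default non-fair mode stands in for the vertex lock.
        ReentrantReadWriteLock v1 = new ReentrantReadWriteLock();
        CountDownLatch readHeld = new CountDownLatch(1);
        CountDownLatch releaseRead = new CountDownLatch(1);

        // Plays the role of App Pool thread #2: holds the V1 read lock.
        Thread holder = new Thread(() -> {
            v1.readLock().lock();
            try {
                readHeld.countDown();
                releaseRead.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                v1.readLock().unlock();
            }
        });

        // Plays the role of the dispatcher: queues up for the V1 write lock.
        Thread dispatcher = new Thread(() -> {
            v1.writeLock().lock();
            v1.writeLock().unlock();
        });

        holder.start();
        readHeld.await();
        dispatcher.start();

        // Wait until the writer is enqueued and parked behind the held read lock.
        while (!(v1.hasQueuedThread(dispatcher)
                && dispatcher.getState() == Thread.State.WAITING)) {
            Thread.sleep(10);
        }

        // Plays the role of App Pool thread #9: a new reader on V1.
        // The timed tryLock respects the wait queue, unlike the untimed tryLock().
        boolean acquired = v1.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired) {
            v1.readLock().unlock();
        }

        // Let the existing reader go so the writer can finish and the demo exits.
        releaseRead.countDown();
        holder.join();
        dispatcher.join();
        return !acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("new reader blocked behind queued writer: "
                + newReaderBlockedBehindQueuedWriter());
    }
}
```

In the real scenario thread #9 would use a blocking lock() while also holding the V2 write lock that thread #2 is waiting on, which is what closes the cycle; the timeout here only keeps the sketch from hanging.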
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)