Posted to issues@tez.apache.org by "Mike Liddell (JIRA)" <ji...@apache.org> on 2013/06/20 00:04:24 UTC

[jira] [Commented] (TEZ-141) DAG does not kill running vertices when going into failed state

    [ https://issues.apache.org/jira/browse/TEZ-141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13688520#comment-13688520 ] 

Mike Liddell commented on TEZ-141:
----------------------------------

Main fixes:
- When VertexImpl receives a task failure event, it synchronously transitions to KILL_WAIT and sends kill messages to all of its tasks.
  - After the last task has responded, the vertex transitions to the KILLED state.
- When DAGImpl receives a vertex failure event, it synchronously transitions to KILL_WAIT and sends kill messages to all of its vertices. This cascades down to kill the tasks.
  - After the last vertex has responded, the DAG transitions to the KILLED state.
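
The vertex-level flow above can be sketched as a small state machine. This is a minimal illustration and not the code in the patch: the Task and Vertex classes, their fields, and the method names onTaskFailed and onTaskKilled are hypothetical stand-ins, and only the state names KILL_WAIT and KILLED come from the description above. The DAG level would presumably follow the same pattern, with vertices in place of tasks.

```java
import java.util.ArrayList;
import java.util.List;

public class VertexKillSketch {
    enum State { RUNNING, KILL_WAIT, KILLED }

    // Hypothetical stand-in for a task; it only records what it was told.
    static class Task {
        boolean killRequested;
        boolean terminated;
    }

    // Hypothetical stand-in for VertexImpl, reduced to the kill path.
    static class Vertex {
        State state = State.RUNNING;
        final List<Task> tasks = new ArrayList<>();

        Vertex(int numTasks) {
            for (int i = 0; i < numTasks; i++) {
                tasks.add(new Task());
            }
        }

        // Task failure: move to KILL_WAIT before returning, then ask every
        // task that is still alive to stop.
        void onTaskFailed(Task failed) {
            failed.terminated = true;
            if (state != State.RUNNING) {
                return;
            }
            state = State.KILL_WAIT;
            for (Task t : tasks) {
                if (!t.terminated) {
                    t.killRequested = true;
                }
            }
            maybeFinish();
        }

        // A task has responded to the kill request.
        void onTaskKilled(Task killed) {
            killed.terminated = true;
            maybeFinish();
        }

        // Leave KILL_WAIT only once no task is still outstanding.
        private void maybeFinish() {
            if (state != State.KILL_WAIT) {
                return;
            }
            for (Task t : tasks) {
                if (!t.terminated) {
                    return;
                }
            }
            state = State.KILLED;
        }
    }

    public static void main(String[] args) {
        Vertex v = new Vertex(3);
        v.onTaskFailed(v.tasks.get(0));
        System.out.println(v.state); // KILL_WAIT: two tasks still outstanding
        v.onTaskKilled(v.tasks.get(1));
        v.onTaskKilled(v.tasks.get(2));
        System.out.println(v.state); // KILLED: last task has responded
    }
}
```

The point of the sketch is that the vertex stays in KILL_WAIT until every outstanding task has reported back, so KILLED is only ever reached after the last response.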

 - Tests for these scenarios enabled.
 - Added TaskEventHandlers to testDagImpl and testVertexImpl so that the tasks can respond to kill messages and drive the transitions from KILL_WAIT to KILLED that the test postconditions require.


Regarding the synchronous calls to handle():
When responding to a failed child item (e.g. a failed task in the case of VertexImpl), a synchronous transition to KILL_WAIT is necessary to ensure consistent behavior. I tried sending a V_KILL through the dispatcher instead, but that was not successful because of various races.
Rather than calling this.handle(.. xyz_KILL) synchronously, we could perform the necessary state changes directly. However, the synchronous call gives better code reuse, and I think it is the cleanest solution.
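
To make the timing difference concrete, here is a minimal sketch, assuming a simple FIFO queue as a stand-in for the dispatcher. The Vertex class, the Event names, and the synchronousKill flag are all hypothetical and are not taken from the patch. It does not reproduce the specific races mentioned above, which are not described here; it only shows the window in which a queued kill leaves the vertex in its old state.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class SyncKillSketch {
    enum State { RUNNING, KILL_WAIT }
    enum Event { TASK_FAILED, KILL } // hypothetical event names

    // Hypothetical vertex with a FIFO queue standing in for the dispatcher.
    static class Vertex {
        State state = State.RUNNING;
        final Queue<Event> dispatcher = new ArrayDeque<>();
        final boolean synchronousKill;

        Vertex(boolean synchronousKill) {
            this.synchronousKill = synchronousKill;
        }

        void handle(Event e) {
            switch (e) {
                case TASK_FAILED:
                    if (synchronousKill) {
                        // Re-enter the handler so the KILL transition is reused.
                        handle(Event.KILL);
                    } else {
                        // Takes effect only once the queue is drained.
                        dispatcher.add(Event.KILL);
                    }
                    break;
                case KILL:
                    if (state == State.RUNNING) {
                        state = State.KILL_WAIT;
                    }
                    break;
            }
        }

        // Process everything currently queued, in arrival order.
        void drain() {
            Event e;
            while ((e = dispatcher.poll()) != null) {
                handle(e);
            }
        }
    }

    public static void main(String[] args) {
        Vertex queued = new Vertex(false);
        queued.handle(Event.TASK_FAILED);
        System.out.println(queued.state); // RUNNING: the kill is still queued
        queued.drain();
        System.out.println(queued.state); // KILL_WAIT

        Vertex sync = new Vertex(true);
        sync.handle(Event.TASK_FAILED);
        System.out.println(sync.state);   // KILL_WAIT as soon as handle() returns
    }
}
```

In the queued variant, any event processed between the failure and the drained kill would still see the vertex as RUNNING; the synchronous variant closes that window while still going through the same KILL transition.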
                
> DAG does not kill running vertices when going into failed state 
> ----------------------------------------------------------------
>
>                 Key: TEZ-141
>                 URL: https://issues.apache.org/jira/browse/TEZ-141
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hitesh Shah
>            Assignee: Mike Liddell
>              Labels: TEZ-0.2.0, TEZ-1
>         Attachments: TEZ-141.1.patch, TEZ-141.2.patch
>
>

