Posted to issues@tez.apache.org by "Hitesh Shah (JIRA)" <ji...@apache.org> on 2015/05/04 19:55:07 UTC

[jira] [Comment Edited] (TEZ-2404) Handle DataMovementEvent before its TaskAttemptCompletedEvent

    [ https://issues.apache.org/jira/browse/TEZ-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526931#comment-14526931 ] 

Hitesh Shah edited comment on TEZ-2404 at 5/4/15 5:54 PM:
----------------------------------------------------------

Bumping up priority as this means recovery is potentially broken.

[~zjffdu] It looks like we need a recovery-related test to ensure that all data movement events are always stored before a task completion event.
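
As a rough illustration of what such a test could assert, here is a minimal sketch of an ordering check over a recorded event log. The names RecoveryOrderCheck, Logged, and Kind are hypothetical stand-ins, not Tez classes, and the real recovery log format is not modeled; the sketch only assumes the log can be read back as an ordered list of (event kind, attempt id) entries.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these types are part of the Tez API.
class RecoveryOrderCheck {
    enum Kind { DATA_MOVEMENT, TASK_ATTEMPT_COMPLETED }

    // One entry of an assumed, simplified recovery log.
    static final class Logged {
        final Kind kind;
        final String attemptId;

        Logged(Kind kind, String attemptId) {
            this.kind = kind;
            this.attemptId = attemptId;
        }
    }

    // Returns true if no data movement entry appears after the
    // completion entry of the same attempt.
    static boolean dataMovementBeforeCompletion(List<Logged> log) {
        Set<String> completed = new HashSet<>();
        for (Logged entry : log) {
            if (entry.kind == Kind.TASK_ATTEMPT_COMPLETED) {
                completed.add(entry.attemptId);
            } else if (completed.contains(entry.attemptId)) {
                // A data movement entry was stored after its attempt completed.
                return false;
            }
        }
        return true;
    }
}
```

A recovery test along these lines would run a DAG, read back whatever was persisted, and fail if the check returns false for any attempt.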


was (Author: hitesh):
Bumping up priority as this means recovery is potentially broken.

[~zjffdu] It looks like we need a recovery-related test to ensure that data movement events are always stored before a task completion event.

> Handle DataMovementEvent before its TaskAttemptCompletedEvent
> -------------------------------------------------------------
>
>                 Key: TEZ-2404
>                 URL: https://issues.apache.org/jira/browse/TEZ-2404
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: TEZ-2404-1.patch, TEZ-2404-2.patch
>
>
> TEZ-2325 routes TASK_ATTEMPT_COMPLETED_EVENT directly to the attempt, but this causes a recovery issue. Recovery requires that a DataMovementEvent be handled before its TaskAttemptCompletedEvent; otherwise the DataMovementEvent may be lost during recovery, causing its dependent tasks to hang.
> Two ways to fix this issue:
> 1. Keep routing TaskAttemptCompletedEvent in the Vertex.
> 2. Route DataMovementEvent before TaskAttemptCompletedEvent in TezTaskAttemptListener.
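
The second option can be sketched roughly as follows. This is only an illustration of the ordering idea, not the attached patch: EventOrderingSketch, Event, and Kind are hypothetical names, and the sketch assumes the listener receives the events reported by an attempt as one batch that it is free to reorder before routing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: these are not the real Tez event classes.
class EventOrderingSketch {
    enum Kind { DATA_MOVEMENT, TASK_ATTEMPT_COMPLETED, OTHER }

    static final class Event {
        final Kind kind;
        final String attemptId;

        Event(Kind kind, String attemptId) {
            this.kind = kind;
            this.attemptId = attemptId;
        }
    }

    // Returns a new list with completion events moved to the end.
    // The relative order of all other events is preserved, so every
    // data movement event in the batch is routed before any completion.
    static List<Event> completionLast(List<Event> batch) {
        List<Event> ordered = new ArrayList<>();
        List<Event> completions = new ArrayList<>();
        for (Event e : batch) {
            if (e.kind == Kind.TASK_ATTEMPT_COMPLETED) {
                completions.add(e);
            } else {
                ordered.add(e);
            }
        }
        ordered.addAll(completions);
        return ordered;
    }

    public static void main(String[] args) {
        List<Event> batch = Arrays.asList(
            new Event(Kind.TASK_ATTEMPT_COMPLETED, "attempt_1"),
            new Event(Kind.DATA_MOVEMENT, "attempt_1"));
        for (Event e : completionLast(batch)) {
            System.out.println(e.kind + " " + e.attemptId);
        }
    }
}
```

Reordering within a batch only helps if the data movement events and the completion event for an attempt arrive together; if they can arrive in separate batches, the first option (routing through the Vertex) may be the simpler way to keep the ordering.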


