Posted to issues@tez.apache.org by "Siddharth Seth (JIRA)" <ji...@apache.org> on 2015/05/07 20:01:59 UTC

[jira] [Commented] (TEZ-2426) Task input not complete before sending Task completed event

    [ https://issues.apache.org/jira/browse/TEZ-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533094#comment-14533094 ] 

Siddharth Seth commented on TEZ-2426:
-------------------------------------

[~bikassaha], do you have additional logs, specifically the entire AM log? There also seems to be a discrepancy between the AM and task log timestamps; I'm assuming the nodes' clocks are out of sync.

I can see how the exception happens during execution of the next task, since we don't join on the eventRouter thread.
However, I'm not sure how this would cause the FAILED message to go through for the previous attempt. It should have gone through for the currently running task. If it had gone through for the previous task, the AM should have thrown an error about an invalid taskAttemptId. That leads me to believe something else is broken at the same time.
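To illustrate why the missing join on the eventRouter thread matters, here is a minimal sketch. It assumes a hypothetical runner in which each task's inputs emit events through a dedicated router thread; it is not Tez code, and `EventRouterJoinSketch`, `runTask`, and `sentToAm` are illustrative names (the list stands in for the channel to the AM). Because the runner joins the router thread before reporting SUCCEEDED, every event from attempt A is emitted before A's completion and before task B starts in the same container. Without the `join()`, A's delayed event could be emitted while B is already running.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class EventRouterJoinSketch {
    // Hypothetical stand-in for the container-to-AM event channel.
    static final List<String> sentToAm = new CopyOnWriteArrayList<>();

    static void runTask(String attemptId) throws InterruptedException {
        // Hypothetical router thread forwarding events produced by the task's inputs.
        Thread routerThread = new Thread(() -> {
            try {
                Thread.sleep(50); // simulates an input that finishes its work late
                sentToAm.add(attemptId + ":INPUT_EVENT");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "eventRouter-" + attemptId);
        routerThread.start();

        // ... task processing would happen here ...

        // Wait for the router to finish before reporting completion, so no event
        // for this attempt can be emitted after its SUCCEEDED notification.
        routerThread.join();
        sentToAm.add(attemptId + ":SUCCEEDED");
    }

    public static void main(String[] args) throws InterruptedException {
        runTask("attempt_A");
        runTask("attempt_B"); // next task reusing the same container
        System.out.println(sentToAm);
    }
}
```

Running `main` prints `[attempt_A:INPUT_EVENT, attempt_A:SUCCEEDED, attempt_B:INPUT_EVENT, attempt_B:SUCCEEDED]`: all of attempt A's events precede any of attempt B's.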

> Task input not complete before sending Task completed event
> -----------------------------------------------------------
>
>                 Key: TEZ-2426
>                 URL: https://issues.apache.org/jira/browse/TEZ-2426
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Bikas Saha
>            Priority: Critical
>         Attachments: am.log, container.log
>
>
> Sequence of events
> 1) Task A starts in a container
> 2) Task A complete event comes to AM
> 3) Task B starts in the same container
> 4) Task A's input calls some method on its context. Crashes with NPE
> 5) The crash sends an input failed event for Task A to the AM
> 6) Task A state machine crashes saying cannot handle failed after success
> In some cases, a status update event may also be sent after completion, though it's not clear whether this is related to the failed event being sent.
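Steps 5 and 6 above can be illustrated with a toy AM-side attempt state machine. This is a sketch, not Tez's actual task attempt implementation; the class, enums, and messages are hypothetical. Once an attempt has left RUNNING it accepts no further events, so a late TA_FAILED for an attempt that already SUCCEEDED is an invalid transition (the crash in step 6). An event carrying a different attempt ID is rejected outright, which corresponds to the invalid-taskAttemptId error the AM would be expected to raise if the failure had been attributed to the wrong attempt.

```java
public class AttemptStateMachineSketch {
    enum State { RUNNING, SUCCEEDED, FAILED }
    enum Event { TA_DONE, TA_FAILED }

    private final String attemptId;
    private State state = State.RUNNING;

    AttemptStateMachineSketch(String attemptId) {
        this.attemptId = attemptId;
    }

    State handle(String eventAttemptId, Event event) {
        // Events tagged with another attempt's ID are rejected.
        if (!attemptId.equals(eventAttemptId)) {
            throw new IllegalArgumentException("Invalid taskAttemptId: " + eventAttemptId);
        }
        // Only RUNNING accepts events; SUCCEEDED and FAILED are terminal.
        if (state != State.RUNNING) {
            throw new IllegalStateException("Invalid event " + event + " in state " + state);
        }
        state = (event == Event.TA_DONE) ? State.SUCCEEDED : State.FAILED;
        return state;
    }

    public static void main(String[] args) {
        AttemptStateMachineSketch a = new AttemptStateMachineSketch("attempt_A");
        System.out.println(a.handle("attempt_A", Event.TA_DONE)); // step 2: A completes
        try {
            a.handle("attempt_A", Event.TA_FAILED); // step 5: late input failure for A
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // step 6: cannot handle FAILED after SUCCEEDED
        }
    }
}
```

Running `main` prints `SUCCEEDED` followed by `Invalid event TA_FAILED in state SUCCEEDED`.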



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)