Posted to issues@tez.apache.org by "Siddharth Seth (JIRA)" <ji...@apache.org> on 2015/05/07 20:01:59 UTC
[jira] [Commented] (TEZ-2426) Task input not complete before sending Task completed event
[ https://issues.apache.org/jira/browse/TEZ-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533094#comment-14533094 ]
Siddharth Seth commented on TEZ-2426:
-------------------------------------
[~bikassaha] - do you have additional logs, specifically the entire AM log? There also seems to be a discrepancy between the AM and task log times. I'm assuming the clocks on the nodes are out of sync.
I can see how the exception happens during execution of the next task, since we don't join on the eventRouter thread.
However, I'm not sure how the FAILED message would go through for the previous attempt as a result of this. It should have gone through for the currently running task. If it was sent for the previous task, the AM should have thrown an error about an invalid taskAttemptId. That leads me to believe something else is broken at the same time.
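To illustrate the point about not joining on the eventRouter thread, here is a minimal sketch in plain Java. It is not Tez code: the class `EventRouterJoinSketch`, the method `runTask`, and the log strings are all hypothetical, and a latch is used to force the late ordering deterministically. It only shows that, without a join, the router thread can still be doing work after the task has reported completion.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical illustration only; none of these names come from Tez.
public class EventRouterJoinSketch {

    /** Runs one pretend task and returns the order in which actions were recorded. */
    static List<String> runTask(boolean joinRouter) {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch completionReported = new CountDownLatch(1);

        Thread eventRouter = new Thread(() -> {
            try {
                if (!joinRouter) {
                    // Model the race: the router is still busy when the task reports completion.
                    completionReported.await();
                }
                log.add("router: last event handled");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "eventRouter");
        eventRouter.start();

        try {
            if (joinRouter) {
                // Wait for the router to drain before declaring the task done.
                eventRouter.join();
            }
            log.add("task: completion reported");
            completionReported.countDown();
            // Joined here only so the sketch returns a stable log.
            eventRouter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running sketch", e);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println("with join:    " + runTask(true));
        System.out.println("without join: " + runTask(false));
    }
}
```

With `joinRouter` set to true the router's last action is recorded before completion is reported. With it set to false the router's action lands after completion, which is the window in which a reused container could already be running the next task.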
> Task input not complete before sending Task completed event
> -----------------------------------------------------------
>
> Key: TEZ-2426
> URL: https://issues.apache.org/jira/browse/TEZ-2426
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Bikas Saha
> Priority: Critical
> Attachments: am.log, container.log
>
>
> Sequence of events
> 1) Task A starts in a container
> 2) Task A complete event comes to AM
> 3) Task B starts in the same container
> 4) Task A's input calls some method on its context. Crashes with NPE
> 5) The crash sends an input failed event for Task A to the AM
> 6) Task A state machine crashes saying cannot handle failed after success
> In some cases, a status update event may also be sent after completion, though it's not clear whether that is related to the failed event being sent.
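The last step of the sequence quoted above (a failed event arriving after success) can be sketched with a small transition table. This is an assumption-laden illustration, not the Tez state machine: the class `AttemptStateSketch`, the states, the event names `TA_DONE` and `TA_FAILED`, and the use of `IllegalStateException` are all hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical illustration only; this is not how Tez defines its state machine.
public class AttemptStateSketch {
    enum State { RUNNING, SUCCEEDED, FAILED }
    enum Event { TA_DONE, TA_FAILED }

    // Allowed transitions; terminal states have no outgoing entries.
    private static final Map<State, Map<Event, State>> TRANSITIONS = new EnumMap<>(State.class);
    static {
        Map<Event, State> fromRunning = new EnumMap<>(Event.class);
        fromRunning.put(Event.TA_DONE, State.SUCCEEDED);
        fromRunning.put(Event.TA_FAILED, State.FAILED);
        TRANSITIONS.put(State.RUNNING, fromRunning);
    }

    private State state = State.RUNNING;

    /** Applies an event, rejecting any event that has no transition from the current state. */
    State handle(Event event) {
        Map<Event, State> allowed = TRANSITIONS.get(state);
        State next = (allowed == null) ? null : allowed.get(event);
        if (next == null) {
            throw new IllegalStateException("Cannot handle " + event + " in state " + state);
        }
        state = next;
        return state;
    }

    public static void main(String[] args) {
        AttemptStateSketch attemptA = new AttemptStateSketch();
        attemptA.handle(Event.TA_DONE);       // step 2: Task A completes
        try {
            attemptA.handle(Event.TA_FAILED); // step 5: late failure from Task A's input
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // step 6: rejected after success
        }
    }
}
```

In this sketch the late failure is rejected because SUCCEEDED is treated as terminal. Whether a real implementation should reject or ignore such an event is a separate design question that the sketch does not settle.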