Posted to issues@tez.apache.org by "Siddharth Seth (JIRA)" <ji...@apache.org> on 2013/12/12 21:47:07 UTC

[jira] [Commented] (TEZ-675) Pre-empted taskAttempt gets marked as FAILED instead of KILLED

    [ https://issues.apache.org/jira/browse/TEZ-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13846719#comment-13846719 ] 

Siddharth Seth commented on TEZ-675:
------------------------------------

Bikas, AMContainer is set up not to send commands (TA_KILL) to the TaskAttempt. Instead, it just sends state updates (TERMINATED) and lets the TaskAttempt decide what needs to be done. Do you mind updating the patch to stick with this model? That could be either a new CONTAINER_PREEMPTED state, or just piggybacking the container_preemption status on the TERMINATING / TERMINATED messages.
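
To make the suggestion concrete, here is a rough sketch of the model, not actual Tez code. Every name in it (PreemptionSketch, ContainerUpdate, AttemptState, decide) is hypothetical and only illustrates the idea: the container reports what happened to it, and the attempt picks its own final state. It also assumes that a non-preempted termination of a running attempt counts as a failure.

```java
// Hypothetical sketch; none of these names are taken from the Tez code base.
final class PreemptionSketch {

    /** State updates a container could report. CONTAINER_PREEMPTED is the proposed new state. */
    enum ContainerUpdate { TERMINATING, TERMINATED, CONTAINER_PREEMPTED }

    /** Final states a task attempt can end up in. */
    enum AttemptState { FAILED, KILLED }

    /**
     * The attempt decides its own outcome from the update it receives.
     * The preempted flag models the "piggyback" option, where the
     * preemption status rides along on TERMINATING / TERMINATED.
     */
    static AttemptState decide(ContainerUpdate update, boolean preempted) {
        if (update == ContainerUpdate.CONTAINER_PREEMPTED || preempted) {
            // Pre-emption is not the task's fault, so the attempt is KILLED.
            return AttemptState.KILLED;
        }
        // Assumption: any other termination under a running attempt is a failure.
        return AttemptState.FAILED;
    }

    public static void main(String[] args) {
        System.out.println(decide(ContainerUpdate.CONTAINER_PREEMPTED, false));
        System.out.println(decide(ContainerUpdate.TERMINATED, true));
        System.out.println(decide(ContainerUpdate.TERMINATED, false));
    }
}
```

Either option keeps the KILLED vs. FAILED decision inside the attempt, so the container never has to issue a TA_KILL-style command.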

> Pre-empted taskAttempt gets marked as FAILED instead of KILLED
> --------------------------------------------------------------
>
>                 Key: TEZ-675
>                 URL: https://issues.apache.org/jira/browse/TEZ-675
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Yesha Vora
>            Assignee: Bikas Saha
>             Fix For: 0.3.0
>
>         Attachments: TEZ-675.1.patch
>
>
> Scenario: Run the GridMix v3 test suite.
> One of the NodeManagers reports a 'Container killed by the ApplicationMaster' error:
> ============
> INFO  nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:getNodeStatusAndUpdateContainersInContext(314)) - Sending out status for container: container_id { app_attempt_id { application_id { id: 561 cluster_timestamp: 1386198516843 } attemptId: 1 } id: 46 } state: C_COMPLETE diagnostics: "Container killed by the ApplicationMaster.\nException from container-launch: \norg.apache.hadoop.util.Shell$ExitCodeException: \n\tat org.apache.hadoop.util.Shell.runCommand(Shell.java:464)\n\tat org.apache.hadoop.util.Shell.run(Shell.java:379)\n\tat org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:589)\n\tat org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)\n\tat org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)\n\tat org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)\n\tat java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)\n\tat java.util.concurrent.FutureTask.run(FutureTask.java:138)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)\n\tat java.lang.Thread.run(Thread.java:662)\n\n\n" exit_status: 255
> ============
> The task running on container_1386198516843_0561_01_000046, the container for which this log message was seen, was pre-empted by the AM to make space for an upstream task that needed to run.
> Here, the taskAttempt gets marked as FAILED instead of KILLED.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)