Posted to yarn-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2015/07/30 17:07:05 UTC

[jira] [Commented] (YARN-3998) Add retry-times to let NM re-launch container when it fails to run

    [ https://issues.apache.org/jira/browse/YARN-3998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647755#comment-14647755 ] 

Jason Lowe commented on YARN-3998:
----------------------------------

Is this really a feature that YARN needs to provide?  To me this is basically a case of container re-use, which the application itself can control.  A primitive example would be an application that launches a container whose entry point is a wrapper shell script or Java program.  The wrapper spawns the real task and, if the real task fails, respawns it some number of times before failing the entire container.  I'm not sure YARN is the best place to put this functionality.
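
To make the wrapper idea concrete, here is a minimal sketch in Python (the comment mentions a shell script or Java program; any language would do). The name run_with_retries and its parameters are hypothetical and are not part of YARN or of this issue. The sketch assumes that a non-zero exit code means the real task failed.

```python
import subprocess
import sys


def run_with_retries(command, max_retries):
    """Run `command`, respawning it up to `max_retries` times on failure.

    Returns the exit code of the last attempt (0 if any attempt succeeded).
    Hypothetical helper for illustration; not a YARN API.
    """
    attempts = max_retries + 1  # the initial run plus the allowed retries
    exit_code = 1  # reported if no attempt runs at all (negative max_retries)
    for attempt in range(1, attempts + 1):
        # Every attempt runs in the same working directory, so files left
        # behind by an earlier attempt are still there for the next one.
        exit_code = subprocess.call(command)
        if exit_code == 0:
            return 0
        print(f"attempt {attempt}/{attempts} exited with code {exit_code}",
              file=sys.stderr)
    # All attempts failed: propagate the failure so the container fails too.
    return exit_code
```

The container's launch command would invoke the wrapper rather than the real task, and the wrapper would exit with the returned code. Because the container stays alive across attempts, nothing is re-localized or re-scheduled between them.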

> Add retry-times to let NM re-launch container when it fails to run
> ------------------------------------------------------------------
>
>                 Key: YARN-3998
>                 URL: https://issues.apache.org/jira/browse/YARN-3998
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Jun Gong
>            Assignee: Jun Gong
>
> I'd like to add a field (retry-times) to ContainerLaunchContext. When the AM launches containers, it can specify the value. The NM will then re-launch the container 'retry-times' times when it fails to run (e.g. the exit code is not 0).
> This will save a lot of time: it avoids container localization, the RM does not need to re-schedule the container, and local files in the container's working directory are left for re-use. (If the container has downloaded some big files, it does not need to re-download them when running again.)
> We find it is useful in systems like Storm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)