Posted to dev@slider.apache.org by "Sherry Guo (JIRA)" <ji...@apache.org> on 2015/10/08 23:18:26 UTC

[jira] [Commented] (SLIDER-930) Incorporate Yarn feature of resetting AM failure count into Slider AM

    [ https://issues.apache.org/jira/browse/SLIDER-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949411#comment-14949411 ] 

Sherry Guo commented on SLIDER-930:
-----------------------------------

Hi,
I'm new to Slider and have worked on some code for this. I'm attaching a patch in case it is useful. We have an internal requirement for this feature and would like to get it into our next release if possible.
Thanks,

> Incorporate Yarn feature of resetting AM failure count into Slider AM
> ---------------------------------------------------------------------
>
>                 Key: SLIDER-930
>                 URL: https://issues.apache.org/jira/browse/SLIDER-930
>             Project: Slider
>          Issue Type: Bug
>          Components: appmaster
>    Affects Versions: Slider 0.80
>            Reporter: Gour Saha
>            Assignee: thomas liu
>             Fix For: Slider 0.90
>
>
> YARN-611 provides this feature. Currently Slider apps are limited by the value of yarn.resourcemanager.am.max-attempts (yarn.resourcemanager.am.max-retries in older Hadoop releases) set in the cluster. By default this value is 2, which is very low for long-running services.
> The Slider AM should use the feature provided in YARN-611 and set an interval after which the failure count is reset to 0.
> I believe the API to call on ApplicationSubmissionContext is setAttemptFailuresValidityInterval(long), which takes a value in milliseconds. To start with, Slider can set it to 5 minutes (300000 ms), which should be a reasonable default.
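To illustrate what the validity interval buys a long-running service, here is a minimal, self-contained sketch that models the YARN-611 semantics as a sliding window: only AM failures within the last validityIntervalMs milliseconds count toward the max-attempts limit, and a value of 0 or less disables the window so every failure counts. The class AttemptFailureWindow and its methods are hypothetical names for illustration only. They are not part of YARN or Slider, and this is a simplified model, not the ResourceManager's implementation. In a real Slider client the interval would simply be passed to ApplicationSubmissionContext#setAttemptFailuresValidityInterval(long) before submitting the application.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical, simplified model of the YARN-611 attempt-failure validity
 * interval. Only failures newer than validityIntervalMs count toward
 * maxAttempts; older ones are forgotten. An interval <= 0 disables the
 * window, so every failure counts (the pre-YARN-611 behaviour).
 */
final class AttemptFailureWindow {
    private final int maxAttempts;
    private final long validityIntervalMs;
    // Failure timestamps in non-decreasing order, oldest first.
    private final Deque<Long> failureTimes = new ArrayDeque<>();

    AttemptFailureWindow(int maxAttempts, long validityIntervalMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
        this.validityIntervalMs = validityIntervalMs;
    }

    /** Drops failures that fall outside the window ending at nowMs. */
    private void expire(long nowMs) {
        if (validityIntervalMs <= 0) {
            return; // window disabled: never forget a failure
        }
        long cutoff = nowMs - validityIntervalMs;
        while (!failureTimes.isEmpty() && failureTimes.peekFirst() <= cutoff) {
            failureTimes.pollFirst();
        }
    }

    /**
     * Records an AM failure at nowMs (timestamps must be non-decreasing).
     * Returns true if the application should now be failed, i.e. the number
     * of failures still inside the window has reached maxAttempts.
     */
    boolean recordFailure(long nowMs) {
        expire(nowMs);
        failureTimes.addLast(nowMs);
        return failureTimes.size() >= maxAttempts;
    }

    /** Number of failures that still count at time nowMs. */
    int failuresInWindow(long nowMs) {
        expire(nowMs);
        return failureTimes.size();
    }

    public static void main(String[] args) {
        long fiveMinutesMs = 5 * 60 * 1000L; // 300000 ms, the default proposed above
        // Two allowed attempts, matching the cluster default mentioned above.
        AttemptFailureWindow window = new AttemptFailureWindow(2, fiveMinutesMs);
        System.out.println(window.recordFailure(0L));              // false: one failure in window
        System.out.println(window.recordFailure(10 * 60 * 1000L)); // false: first failure expired
        System.out.println(window.recordFailure(11 * 60 * 1000L)); // true: two failures within 5 min
    }
}
```

With the window in place, two AM crashes ten minutes apart no longer exhaust the default limit of 2; only two crashes within the same five-minute span do.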



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)