Posted to dev@slider.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2015/10/13 20:01:05 UTC

[jira] [Commented] (SLIDER-870) use timeline server as a historical source of failure information

    [ https://issues.apache.org/jira/browse/SLIDER-870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955365#comment-14955365 ] 

Steve Loughran commented on SLIDER-870:
---------------------------------------

Without a stable client library for ATS, I don't want to go near this.

> use timeline server as a historical source of failure information
> -----------------------------------------------------------------
>
>                 Key: SLIDER-870
>                 URL: https://issues.apache.org/jira/browse/SLIDER-870
>             Project: Slider
>          Issue Type: Sub-task
>          Components: appmaster, client
>    Affects Versions: Slider 0.80
>            Reporter: Steve Loughran
>             Fix For: Slider 1.0.0
>
>
> We lose failure history when an AM dies; this hurts reporting and prevents the collection of long-term statistics.
> We can use the timeline server for this information: save events on failure, then query it on AM restart to rebuild that history and re-use it in decision making.
> The events can also be presented to the user (a) in the web UI and (b) from the command line, even while a cluster is not running.
> Finally, stats on node failures could be aggregated across applications, possibly even across users. This would identify hotspots for node unreliability.


