Posted to dev@slider.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2014/12/05 12:23:12 UTC

[jira] [Commented] (SLIDER-701) Support alerts for Slider Apps

    [ https://issues.apache.org/jira/browse/SLIDER-701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14235387#comment-14235387 ] 

Steve Loughran commented on SLIDER-701:
---------------------------------------

The SLIDER-319 Codahale metrics work (in 0.70) lets us publish things in the AM to monitoring tools; that gives us metrics & health, but not direct alerts. If enabled, Slider can already forward events to Ganglia, from where Nagios can monitor and alert. There are no tests for this; it is something we could cover in an integration test. The new Web UI tests already grab the JSON-published metrics, which we could then parse and compare with the perceived state.
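
As a rough illustration of the "parse and compare" idea, here is a minimal Python sketch. It assumes the published JSON follows the usual Codahale Metrics servlet layout (top-level "gauges" and "counters" maps, with readings under "value" and "count"); whether Slider's output matches that layout exactly is an assumption, and the function name and the metric names in the usage note are hypothetical.

```python
import json


def find_metric_mismatches(metrics_json, expected):
    """Compare published metrics against the state a test believes is true.

    metrics_json: JSON text, assumed to use the Codahale servlet layout.
    expected: dict mapping metric name -> expected value.
    Returns {name: (expected, actual)} for every metric that differs;
    a metric missing from the document shows up with actual=None.
    """
    doc = json.loads(metrics_json)
    actual = {}
    # Gauges carry their reading under "value", counters under "count".
    for name, body in doc.get("gauges", {}).items():
        actual[name] = body.get("value")
    for name, body in doc.get("counters", {}).items():
        actual[name] = body.get("count")
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }
```

An integration test could fetch the JSON from the AM, call this with something like {"containers.live": 3} (a made-up metric name), and fail if the returned dict is non-empty.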

There's not much monitored yet, though; we need to identify what to monitor and instrument the code.

*Liveness* is something we could consider checking in Slider (e.g. hit the root HDFS web UI regularly). If tools exist in the cluster already, then publishing the binding info could be enough; most of that is in the registry. What we would have to do is let apps map from endpoints to instances, and tell Slider to kill nodes that are considered failing.
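
To make the endpoint-to-instance mapping concrete, here is a minimal Python sketch of such a liveness check. Everything in it is hypothetical: the class and function names, the consecutive-failure threshold, and the idea that an HTTP 2xx/3xx response counts as "live". It only reports which instances look dead; how Slider would actually be told to kill them is left out, since no such API is described here.

```python
import http.client
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the URL answers with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, malformed reply.
        return False


class LivenessTracker:
    """Track consecutive probe failures per component instance."""

    def __init__(self, endpoints, probe=http_probe, max_failures=3):
        self.endpoints = dict(endpoints)  # instance id -> endpoint URL
        self.probe = probe
        self.max_failures = max_failures
        self.failures = {instance: 0 for instance in self.endpoints}

    def check_once(self):
        """Probe every endpoint once; return instances considered failing."""
        failing = []
        for instance, url in self.endpoints.items():
            if self.probe(url):
                self.failures[instance] = 0  # any success resets the count
            else:
                self.failures[instance] += 1
            if self.failures[instance] >= self.max_failures:
                failing.append(instance)
        return failing
```

A caller would build the endpoints map from the registry's binding info, run check_once() on a timer, and pass the returned instance ids to whatever mechanism ends up killing failing nodes. Requiring several consecutive failures avoids acting on a single slow response.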



> Support alerts for Slider Apps
> ------------------------------
>
>                 Key: SLIDER-701
>                 URL: https://issues.apache.org/jira/browse/SLIDER-701
>             Project: Slider
>          Issue Type: Task
>          Components: agent, app-package, appmaster
>    Affects Versions: Slider 0.70
>            Reporter: Sumit Mohanty
>            Assignee: Sumit Mohanty
>             Fix For: Slider 0.70
>
>
> Traditional deployment of apps typically includes alerts configured for alerting systems such as Nagios. This includes configuring the alerting system to check various data points such as live port, JMX data, etc. For a Slider app, similar configurations may be defined, while being aware of the fact that the application components may move during the lifetime of the application. Additionally, YARN/Slider provides several pieces of status information (e.g. live component instance count) that can be used for alerts.
> This task covers investigating various alerting infrastructures and providing a recommendation or solution for specific alerting infrastructures for Slider apps.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)