Posted to issues@ambari.apache.org by "Hari Sekhon (JIRA)" <ji...@apache.org> on 2018/07/30 12:19:00 UTC

[jira] [Created] (AMBARI-24381) Ambari Extensible Monitoring - use Nagios Plugins format and make extensible for users to extend checking

Hari Sekhon created AMBARI-24381:
------------------------------------

             Summary: Ambari Extensible Monitoring - use Nagios Plugins format and make extensible for users to extend checking
                 Key: AMBARI-24381
                 URL: https://issues.apache.org/jira/browse/AMBARI-24381
             Project: Ambari
          Issue Type: Improvement
          Components: ambari-server
    Affects Versions: 2.6.0
            Reporter: Hari Sekhon


Improve Ambari Monitoring to be extensible by accepting checks in the standard Nagios Plugins format (the industry-standard format for extensible checks, supported by a large number of monitoring systems), and allow users to extend Ambari's checks and contribute them back into the core to improve monitoring.

I know Ambari used to use Nagios Core and replaced it with custom monitoring management. I'm not suggesting using Nagios Core itself, only the Nagios Plugins format, for community re-use and extensibility.
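For readers unfamiliar with the format: a Nagios-format plugin is an ordinary executable that prints one status line, optionally followed by performance data after a "|", and reports its result through its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A minimal sketch in Python, assuming a hypothetical check on the number of dead RegionServers that takes the count as a command-line argument (a real plugin would query HBase itself). The service name, label and thresholds are illustrative only, not part of Ambari.

```python
import sys

# Exit codes defined by the Nagios plugin guidelines.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(service, label, value, warn, crit, uom=""):
    """Map a measured value to a Nagios status, where higher values are worse.

    Returns (exit_code, output_line); output_line looks like
    'SERVICE STATUS: message | label=value[UOM];warn;crit'.
    """
    if value is None:
        return UNKNOWN, f"{service} UNKNOWN: no value for {label}"
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    perfdata = f"{label}={value}{uom};{warn};{crit}"
    return code, f"{service} {STATUS_NAMES[code]}: {label} = {value}{uom} | {perfdata}"


def main(argv):
    # Hypothetical usage: check_dead_regionservers.py <dead_count>
    try:
        value = int(argv[1])
    except (IndexError, ValueError):
        value = None  # missing or malformed input is reported as UNKNOWN
    code, line = evaluate("HBASE", "dead_regionservers", value, warn=1, crit=2)
    print(line)
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Because the whole contract is "one line of text plus an exit code", the same script could be run unchanged by Ambari, Nagios, Icinga or any other system that understands the format.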

Tie this into Rolling Restarts so that users can add extra monitoring checks at any layer.

See AMBARI-24380, where Ambari didn't check that RegionServers had restarted successfully before taking more of them down. That would be quicker and easier to fix if the framework were engineered to be more extensible. Using checks in a standard format for re-use and extensibility is key: users could quickly and easily add checks to general health monitoring or to rolling restarts, stopping Ambari from taking down successive nodes without first checking the health of the prior nodes.
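To illustrate how plugin exit codes could gate a rolling restart, here is a rough sketch in Python. It is not Ambari code: restart(host) and checks_for(host) are hypothetical hooks standing in for whatever performs the restart and supplies the plugin command lines for a host. After each restart, every check is re-run until all return OK or the retries run out, and the restart halts before touching the next node.

```python
import subprocess
import time


def run_check(cmd, timeout=30):
    """Run one Nagios-format plugin; return (exit_code, first output line)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return 3, f"UNKNOWN: could not run check ({exc})"
    lines = proc.stdout.splitlines()
    first = lines[0] if lines else ""
    # This sketch treats any exit code outside 0-3 as UNKNOWN.
    code = proc.returncode if proc.returncode in (0, 1, 2, 3) else 3
    return code, first


def wait_healthy(checks, retries=10, delay=5.0):
    """Re-run every check until all return OK (0) or retries are exhausted."""
    results = []
    for attempt in range(retries):
        results = [run_check(cmd) for cmd in checks]
        if all(code == 0 for code, _ in results):
            return True, results
        if attempt < retries - 1:
            time.sleep(delay)
    return False, results


def rolling_restart(hosts, restart, checks_for, retries=10, delay=5.0):
    """Restart hosts one at a time, halting if a restarted host stays unhealthy.

    restart(host) and checks_for(host) are hypothetical hooks; checks_for
    returns a list of plugin command lines (argv lists) for that host.
    """
    done = []
    for host in hosts:
        restart(host)
        healthy, results = wait_healthy(checks_for(host), retries, delay)
        if not healthy:
            raise RuntimeError(f"halting rolling restart: {host} unhealthy: {results}")
        done.append(host)
    return done
```

This sketch is deliberately strict: a WARNING also blocks the next restart. A real implementation might make that threshold configurable per check.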

There are also many 3rd-party plugins that vendors or users could use to extend Ambari health checks, such as:

[https://github.com/harisekhon/nagios-plugins]




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)