Posted to dev@ambari.apache.org by "Yurii Shylov (JIRA)" <ji...@apache.org> on 2015/02/03 18:54:36 UTC

[jira] [Created] (AMBARI-9458) HDFS, YARN, and HBase Slave Health Alert Definitions

Yurii Shylov created AMBARI-9458:
------------------------------------

             Summary: HDFS, YARN, and HBase Slave Health Alert Definitions
                 Key: AMBARI-9458
                 URL: https://issues.apache.org/jira/browse/AMBARI-9458
             Project: Ambari
          Issue Type: Task
          Components: ambari-server
    Affects Versions: 2.0.0
            Reporter: Yurii Shylov
            Assignee: Yurii Shylov
             Fix For: 2.0.0


When a slave component, such as a DataNode, encounters a catastrophic problem like a heap allocation error and can no longer perform its work, the NameNode marks this DataNode as unhealthy.

The current alert definitions only check that the DataNode process is alive, which it technically still is. We need to add new alert definitions for:

- HDFS/DataNode (runs on NameNode, queries NameNode JMX)
- YARN/NodeManager (runs on ResourceManager, queries ResourceManager JMX)
- HBase/RegionServer (runs on HBase Master, queries HBase Master JMX)

These definitions will check for slaves that are in a bad state. Depending on the JMX structures that need to be queried, they can be either METRIC or SCRIPT style alert definitions.
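
As a rough illustration of the SCRIPT-style option for the HDFS/DataNode case, here is a minimal sketch in Python. It assumes the NameNode JMX response is a JSON document with a "beans" list, that the FSNamesystemState bean exposes a NumDeadDataNodes attribute, and that the dead-node count is an acceptable proxy for "unhealthy". The function name, state strings, and thresholds are hypothetical and are not taken from this issue or from the Ambari codebase.

```python
import json

# Bean and attribute names as commonly exposed by the NameNode /jmx endpoint.
# Treat both as assumptions and verify them against your Hadoop version.
BEAN_NAME = "Hadoop:service=NameNode,name=FSNamesystemState"
DEAD_ATTR = "NumDeadDataNodes"


def check_datanode_health(jmx_json, warn_at=1, crit_at=2):
    """Return (state, message) derived from a NameNode JMX JSON payload.

    warn_at and crit_at are hypothetical default thresholds.
    """
    try:
        payload = json.loads(jmx_json)
    except (ValueError, TypeError):
        return ("UNKNOWN", "Could not parse JMX response")

    beans = payload.get("beans", []) if isinstance(payload, dict) else []
    for bean in beans:
        if bean.get("name") == BEAN_NAME and DEAD_ATTR in bean:
            dead = int(bean[DEAD_ATTR])
            if dead >= crit_at:
                state = "CRITICAL"
            elif dead >= warn_at:
                state = "WARNING"
            else:
                state = "OK"
            return (state, "%d dead DataNode(s) reported by NameNode" % dead)

    return ("UNKNOWN", "Bean %s not found in JMX response" % BEAN_NAME)


# Example with a hand-written payload instead of a live NameNode query.
sample = json.dumps(
    {"beans": [{"name": BEAN_NAME, "NumLiveDataNodes": 3, DEAD_ATTR: 1}]}
)
print(check_datanode_health(sample))
```

The sketch takes the JMX payload as a string so the threshold logic can be exercised without a running cluster; a real definition would fetch the payload from the NameNode and would need equivalent bean and attribute choices for the ResourceManager and HBase Master.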



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)