You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ambari.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2018/05/17 09:12:00 UTC
[jira] [Commented] (AMBARI-23872) New Alert JSON Is Invalid When Sent To Agents

    [ https://issues.apache.org/jira/browse/AMBARI-23872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478784#comment-16478784 ] 

Hudson commented on AMBARI-23872:
---------------------------------

SUCCESS: Integrated in Jenkins build Ambari-trunk-Commit #9273 (See [https://builds.apache.org/job/Ambari-trunk-Commit/9273/])
AMBARI-23872. New Alert JSON Is Invalid When Sent To Agents (aonishuk) (aonishuk: [https://gitbox.apache.org/repos/asf?p=ambari.git&a=commit&h=d8e400c9250b081f19b4f694ab33b61d17a368f7])
* (edit) ambari-server/src/main/java/org/apache/ambari/server/state/alert/MetricSource.java


> New Alert JSON Is Invalid When Sent To Agents
> ---------------------------------------------
>
>                 Key: AMBARI-23872
>                 URL: https://issues.apache.org/jira/browse/AMBARI-23872
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Andrew Onischuk
>            Assignee: Andrew Onischuk
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.7.0
>
>         Attachments: AMBARI-23872.patch, AMBARI-23872.patch, AMBARI-23872.patch
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> STR:
>   * Set a simple cluster with HDFS
>   * Attempt to create a new Alert:
>     
>     
>     
>     POST http://{{ambari-server}}:8080/api/v1/clusters/c1/alert_definitions
>     
>     {
>       "AlertDefinition": {
>         "component_name": "NAMENODE",
>         "description": "This service-level alert is triggered if the total number of volume failures across the cluster is greater than the configured critical threshold.",
>         "enabled": true,
>         "help_url": null,
>         "ignore_host": false,
>         "interval": 2,
>         "label": "NameNode Volume Failures",
>         "name": "namenode_volume_failures",
>         "scope": "ANY",
>         "service_name": "HDFS",
>         "source": {
>           "jmx": {
>             "property_list": [
>               "Hadoop:service=NameNode,name=FSNamesystemState/VolumeFailuresTotal"
>             ],
>             "value": "{0}"
>           },
>           "reporting": {
>             "ok": {
>               "text": "There are {0} volume failures"
>             },
>             "warning": {
>               "text": "There are {0} volume failures",
>               "value": 1
>             },
>             "critical": {
>               "text": "There are {0} volume failures",
>               "value": 1
>             },
>             "units": "Volume(s)"
>           },
>           "type": "METRIC",
>           "uri": {
>             "http": "{{hdfs-site/dfs.namenode.http-address}}",
>             "https": "{{hdfs-site/dfs.namenode.https-address}}",
>             "https_property": "{{hdfs-site/dfs.http.policy}}",
>             "https_property_value": "HTTPS_ONLY",
>             "kerberos_keytab": "{{hdfs-site/dfs.web.authentication.kerberos.keytab}}",
>             "kerberos_principal": "{{hdfs-site/dfs.web.authentication.kerberos.principal}}",
>             "default_port": 0,
>             "connection_timeout": 5,
>             "high_availability": {
>               "nameservice": "{{hdfs-site/dfs.internal.nameservices}}",
>               "alias_key": "{{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}",
>               "http_pattern": "{{hdfs-site/dfs.namenode.http-address.{{ha-nameservice}}.{{alias}}}}",
>               "https_pattern": "{{hdfs-site/dfs.namenode.https-address.{{ha-nameservice}}.{{alias}}}}"
>             }
>           }
>         }
>       }
>     }
>     
> This alert will not be scheduled on the agent correctly:
>     
>     
>     
>     ERROR 2018-05-16 20:11:55,186 AlertSchedulerHandler.py:307 - [AlertScheduler] Unable to load an invalid alert definition. It will be skipped.
>     Traceback (most recent call last):
>       File "/usr/lib/ambari-agent/lib/ambari_agent/AlertSchedulerHandler.py", line 287, in __json_to_callable
>         alert = MetricAlert(json_definition, source, self.config)
>       File "/usr/lib/ambari-agent/lib/ambari_agent/alerts/metric_alert.py", line 52, in __init__
>         self.metric_info = JmxMetric(alert_source_meta['jmx'])
>       File "/usr/lib/ambari-agent/lib/ambari_agent/alerts/metric_alert.py", line 288, in __init__
>         self.property_list = jmx_info['property_list']
>     KeyError: 'property_list'
>     
> Looking at `/var/lib/ambari-agent/cache/cluster_cache/alerts.json`, we can see
> that `property_list` was changed into `propertyList`.
>     
>     
>     
>             "name": "namenode_volume_failures",
>             "componentName": "NAMENODE",
>             "description": "This service-level alert is triggered if the total number of volume failures across the cluster is greater than the configured critical threshold.",
>             "interval": 2,
>             "clusterId": 2,
>             "label": "NameNode Volume Failures",
>             "ignore_host": false,
>             "source": {
>               "jmx": {
>                 "urlSuffix": "/jmx",
>                 "propertyList": [
>                   "Hadoop:service=NameNode,name=FSNamesystemState/VolumeFailuresTotal"
>                 ],
>     



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)