Posted to issues@ambari.apache.org by "Aravindan Vijayan (JIRA)" <ji...@apache.org> on 2018/06/26 16:59:00 UTC

[jira] [Resolved] (AMBARI-24179) Ambari Metrics Service check fails after deleting a host

     [ https://issues.apache.org/jira/browse/AMBARI-24179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aravindan Vijayan resolved AMBARI-24179.
----------------------------------------
    Resolution: Duplicate

> Ambari Metrics Service check fails after deleting a host
> --------------------------------------------------------
>
>                 Key: AMBARI-24179
>                 URL: https://issues.apache.org/jira/browse/AMBARI-24179
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-metrics
>    Affects Versions: 2.7.0
>            Reporter: Srikanth Janardhan
>            Assignee: Aravindan Vijayan
>            Priority: Critical
>             Fix For: 2.7.0
>
>
> The Ambari Metrics service check failed immediately after deleting a host:
> {code:java}
> stderr: 
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/AMBARI_METRICS/package/scripts/service_check.py", line 304, in <module>
>     AMSServiceCheck().execute()
>   File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 353, in execute
>     method(env)
>   File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
>     return fn(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/AMBARI_METRICS/package/scripts/service_check.py", line 184, in service_check
>     raise Fail("All metrics collectors are unavailable.")
> resource_management.core.exceptions.Fail: All metrics collectors are unavailable.
>  stdout:
> 2018-06-25 04:42:25,088 - Using hadoop conf dir: /usr/hdp/3.0.0.0-1541/hadoop/conf
> 2018-06-25 04:42:25,095 - checked_call['hostid'] {}
> 2018-06-25 04:42:25,100 - checked_call returned (0, '1bac1213')
> 2018-06-25 04:42:25,102 - Ambari Metrics service check was started.
> 2018-06-25 04:42:25,121 - Generated metrics for host ctr-e138-1518143905142-378410-01-000009.hwx.site :
> {
>   "metrics": [
>     {
>       "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
>       "appid": "amssmoketestfake",
>       "hostname": "ctr-e138-1518143905142-378410-01-000012.hwx.site",
>       "starttime": 1529901745000,
>       "metrics": {
>         "1529901745000": 0.602995821312,
>         "1529901746000": 1529901745000
>       }
>     }
>   ]
> }
> 2018-06-25 04:42:25,122 - Connecting (POST) to ctr-e138-1518143905142-378410-01-000009.hwx.site:6188/ws/v1/timeline/metrics/
> 2018-06-25 04:42:25,132 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site: 200 OK
> 2018-06-25 04:42:25,133 - Http data: 
> 2018-06-25 04:42:25,133 - Metrics were saved.
> 2018-06-25 04:42:25,133 - Connecting (GET) to ctr-e138-1518143905142-378410-01-000009.hwx.site:6188/ws/v1/timeline/metrics?metricNames=AMBARI_METRICS.SmokeTest.FakeMetric&hostname=ctr-e138-1518143905142-378410-01-000012.hwx.site&precision=seconds&grouped=false&startTime=1529901685000&appId=amssmoketestfake&endTime=1529901806000
> 2018-06-25 04:42:25,138 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
> 2018-06-25 04:42:25,138 - Http data: {"metrics":[]}
> 2018-06-25 04:42:25,138 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
> 2018-06-25 04:42:25,139 - Values weren't stored yet. Retrying in 10 seconds.
> 2018-06-25 04:42:35,154 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
> 2018-06-25 04:42:35,154 - Http data: {"metrics":[]}
> 2018-06-25 04:42:35,154 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
> 2018-06-25 04:42:35,155 - Values weren't stored yet. Retrying in 10 seconds.
> 2018-06-25 04:42:45,170 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
> 2018-06-25 04:42:45,170 - Http data: {"metrics":[]}
> 2018-06-25 04:42:45,171 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
> 2018-06-25 04:42:45,171 - Values weren't stored yet. Retrying in 10 seconds.
> 2018-06-25 04:42:55,186 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
> 2018-06-25 04:42:55,186 - Http data: {"metrics":[]}
> 2018-06-25 04:42:55,187 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
> 2018-06-25 04:42:55,187 - Values weren't stored yet. Retrying in 10 seconds.
> 2018-06-25 04:43:05,204 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
> 2018-06-25 04:43:05,204 - Http data: {"metrics":[]}
> 2018-06-25 04:43:05,205 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
> 2018-06-25 04:43:05,205 - Ambari Metrics service check failed on collector host ctr-e138-1518143905142-378410-01-000009.hwx.site. Reason : Values 0.602995821312 and 1529901745000 were not found in the response.
> 2018-06-25 04:43:05,207 - Exception while running function '>' for 'ctr-e138-1518143905142-378410-01-000009.hwx.site'. Reason : Ambari Metrics service check failed on collector host ctr-e138-1518143905142-378410-01-000009.hwx.site. Reason : Values 0.602995821312 and 1529901745000 were not found in the response.
> Command failed after 1 tries
> {code}
> *A subsequent service check passed, though.*
> The issue does not appear to be related to the host deletion; it seems that the data posted during the service check is sometimes not saved by the collector.
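
For readers following the log, the post-then-poll pattern it shows (one POST, then up to five GETs spaced 10 seconds apart, each checked for the posted values) can be sketched roughly as below. This is only an illustration, not the code in service_check.py: the names values_present, poll_for_values and fetch are hypothetical, and the shape of a non-empty response is assumed to mirror the posted payload, since the log only ever shows {"metrics":[]}.

```python
import json
import time


def values_present(response_body, expected_values):
    """Return True if every expected value appears among the returned datapoints."""
    payload = json.loads(response_body)
    found = set()
    for metric in payload.get("metrics", []):
        # Assumption: each entry carries a {"timestamp": value} dict under
        # "metrics", like the payload posted in the log.
        found.update(metric.get("metrics", {}).values())
    return all(value in found for value in expected_values)


def poll_for_values(fetch, expected_values, tries=5, delay_seconds=10,
                    sleep=time.sleep):
    """Call fetch() up to `tries` times, sleeping between attempts.

    `fetch` is a hypothetical zero-argument callable returning the response
    body of the GET request as a string. Returns the attempt number on which
    the values were found, or None if they never showed up.
    """
    for attempt in range(1, tries + 1):
        if values_present(fetch(), expected_values):
            return attempt
        if attempt < tries:
            sleep(delay_seconds)
    return None
```

With a collector that keeps answering {"metrics":[]}, as in the log, poll_for_values returns None after the fifth attempt, which corresponds to the "were not found in the response" failure. Passing sleep as a parameter keeps the sketch testable without real waiting.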



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)