Posted to issues@ambari.apache.org by "Srikanth Janardhan (JIRA)" <ji...@apache.org> on 2018/06/25 15:14:00 UTC
[jira] [Created] (AMBARI-24179) Ambari Metrics Service check fails after deleting a host
Srikanth Janardhan created AMBARI-24179:
-------------------------------------------
Summary: Ambari Metrics Service check fails after deleting a host
Key: AMBARI-24179
URL: https://issues.apache.org/jira/browse/AMBARI-24179
Project: Ambari
Issue Type: Bug
Components: ambari-metrics
Affects Versions: 2.7.0
Reporter: Srikanth Janardhan
Assignee: Dmytro Sen
Fix For: 2.7.0
The Ambari Metrics service check failed immediately after deleting a host:
{code:java}
stderr:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/AMBARI_METRICS/package/scripts/service_check.py", line 304, in <module>
AMSServiceCheck().execute()
File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 353, in execute
method(env)
File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/AMBARI_METRICS/package/scripts/service_check.py", line 184, in service_check
raise Fail("All metrics collectors are unavailable.")
resource_management.core.exceptions.Fail: All metrics collectors are unavailable.
stdout:
2018-06-25 04:42:25,088 - Using hadoop conf dir: /usr/hdp/3.0.0.0-1541/hadoop/conf
2018-06-25 04:42:25,095 - checked_call['hostid'] {}
2018-06-25 04:42:25,100 - checked_call returned (0, '1bac1213')
2018-06-25 04:42:25,102 - Ambari Metrics service check was started.
2018-06-25 04:42:25,121 - Generated metrics for host ctr-e138-1518143905142-378410-01-000009.hwx.site :
{
"metrics": [
{
"metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
"appid": "amssmoketestfake",
"hostname": "ctr-e138-1518143905142-378410-01-000012.hwx.site",
"starttime": 1529901745000,
"metrics": {
"1529901745000": 0.602995821312,
"1529901746000": 1529901745000
}
}
]
}
2018-06-25 04:42:25,122 - Connecting (POST) to ctr-e138-1518143905142-378410-01-000009.hwx.site:6188/ws/v1/timeline/metrics/
2018-06-25 04:42:25,132 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site: 200 OK
2018-06-25 04:42:25,133 - Http data:
2018-06-25 04:42:25,133 - Metrics were saved.
2018-06-25 04:42:25,133 - Connecting (GET) to ctr-e138-1518143905142-378410-01-000009.hwx.site:6188/ws/v1/timeline/metrics?metricNames=AMBARI_METRICS.SmokeTest.FakeMetric&hostname=ctr-e138-1518143905142-378410-01-000012.hwx.site&precision=seconds&grouped=false&startTime=1529901685000&appId=amssmoketestfake&endTime=1529901806000
2018-06-25 04:42:25,138 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
2018-06-25 04:42:25,138 - Http data: {"metrics":[]}
2018-06-25 04:42:25,138 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
2018-06-25 04:42:25,139 - Values weren't stored yet. Retrying in 10 seconds.
2018-06-25 04:42:35,154 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
2018-06-25 04:42:35,154 - Http data: {"metrics":[]}
2018-06-25 04:42:35,154 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
2018-06-25 04:42:35,155 - Values weren't stored yet. Retrying in 10 seconds.
2018-06-25 04:42:45,170 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
2018-06-25 04:42:45,170 - Http data: {"metrics":[]}
2018-06-25 04:42:45,171 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
2018-06-25 04:42:45,171 - Values weren't stored yet. Retrying in 10 seconds.
2018-06-25 04:42:55,186 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
2018-06-25 04:42:55,186 - Http data: {"metrics":[]}
2018-06-25 04:42:55,187 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
2018-06-25 04:42:55,187 - Values weren't stored yet. Retrying in 10 seconds.
2018-06-25 04:43:05,204 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
2018-06-25 04:43:05,204 - Http data: {"metrics":[]}
2018-06-25 04:43:05,205 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
2018-06-25 04:43:05,205 - Ambari Metrics service check failed on collector host ctr-e138-1518143905142-378410-01-000009.hwx.site. Reason : Values 0.602995821312 and 1529901745000 were not found in the response.
2018-06-25 04:43:05,207 - Exception while running function '>' for 'ctr-e138-1518143905142-378410-01-000009.hwx.site'. Reason : Ambari Metrics service check failed on collector host ctr-e138-1518143905142-378410-01-000009.hwx.site. Reason : Values 0.602995821312 and 1529901745000 were not found in the response.
Command failed after 1 tries
{code}
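The GET request in the log queries a time window around the posted data points. {{startTime}} is 60 s before the first timestamp (1529901745000 - 60000 = 1529901685000), and {{endTime}} is 60 s after the last one (1529901746000 + 60000 = 1529901806000). The minimal Python sketch below rebuilds that query string from the logged values. The helper name {{build_query_url}} and the 60 s padding are assumptions inferred from the log, not taken from service_check.py.

{code:python}
from urllib.parse import urlencode


# Hypothetical helper: rebuilds the GET query seen in the log.
# The +/- 60 s padding (pad_ms) is inferred from the logged timestamps,
# not copied from service_check.py.
def build_query_url(collector_host, port, metric_name, hostname, app_id,
                    timestamps_ms, pad_ms=60000):
    params = [
        ("metricNames", metric_name),
        ("hostname", hostname),
        ("precision", "seconds"),
        ("grouped", "false"),
        ("startTime", min(timestamps_ms) - pad_ms),  # first point minus padding
        ("appId", app_id),
        ("endTime", max(timestamps_ms) + pad_ms),    # last point plus padding
    ]
    # A list of tuples keeps the parameter order used in the log.
    return "%s:%d/ws/v1/timeline/metrics?%s" % (collector_host, port, urlencode(params))


url = build_query_url(
    "ctr-e138-1518143905142-378410-01-000009.hwx.site", 6188,
    "AMBARI_METRICS.SmokeTest.FakeMetric",
    "ctr-e138-1518143905142-378410-01-000012.hwx.site",
    "amssmoketestfake",
    [1529901745000, 1529901746000],
)
print(url)
{code}

With the values from the log, this reproduces the URL logged at 04:42:25,133. The query window therefore covered both posted points, which suggests the empty {{{"metrics":[]}}} responses were not caused by a window that missed the data.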
*Subsequent service check passed, though.*
The issue does not appear to be related to the host deletion. It seems that the data posted during the service check is sometimes not saved by the collector.
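The retry behaviour visible in the log can be sketched as below: five GET attempts spaced ten seconds apart, then a failure if the posted values never appear. This is a hypothetical reconstruction inferred from the log timestamps, not the actual code of service_check.py. {{wait_for_metric}} and its parameters are illustrative names, and the fetch and sleep functions are injected so the loop can run without a live collector.

{code:python}
import time
from typing import Callable, Dict, Iterable


# Hypothetical poll loop; attempts=5 and interval_s=10 are read off the log
# timestamps (04:42:25 ... 04:43:05), not taken from service_check.py.
def wait_for_metric(fetch: Callable[[], Dict],
                    expected_values: Iterable[float],
                    attempts: int = 5,
                    interval_s: float = 10.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    expected = set(expected_values)
    for attempt in range(attempts):
        response = fetch()  # parsed JSON, e.g. {"metrics": []}
        found = set()
        for metric in response.get("metrics", []):
            # Each metric carries a {timestamp: value} map under "metrics".
            found.update(metric.get("metrics", {}).values())
        if expected <= found:
            return True  # every posted value was read back
        if attempt < attempts - 1:
            sleep(interval_s)  # "Values weren't stored yet. Retrying in 10 seconds."
    return False  # corresponds to "Values ... were not found in the response."


sleeps = []
ok = wait_for_metric(lambda: {"metrics": []},
                     [0.602995821312, 1529901745000],
                     sleep=sleeps.append)
print(ok, len(sleeps))
{code}

Because the loop only compares values read back from the collector, a collector that answers the POST with 200 OK but never persists the points exhausts all five attempts and returns False. That matches the failure above, and a later run passing is consistent with the data loss being intermittent.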