Posted to issues@ambari.apache.org by "Aravindan Vijayan (JIRA)" <ji...@apache.org> on 2018/06/26 16:59:00 UTC
[jira] [Assigned] (AMBARI-24179) Ambari Metrics Service check fails after deleting a host
[ https://issues.apache.org/jira/browse/AMBARI-24179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aravindan Vijayan reassigned AMBARI-24179:
------------------------------------------
Assignee: Aravindan Vijayan (was: Dmytro Sen)
> Ambari Metrics Service check fails after deleting a host
> --------------------------------------------------------
>
> Key: AMBARI-24179
> URL: https://issues.apache.org/jira/browse/AMBARI-24179
> Project: Ambari
> Issue Type: Bug
> Components: ambari-metrics
> Affects Versions: 2.7.0
> Reporter: Srikanth Janardhan
> Assignee: Aravindan Vijayan
> Priority: Critical
> Fix For: 2.7.0
>
>
> The Ambari Metrics service check failed immediately after deleting a host:
> {code:java}
> stderr:
> Traceback (most recent call last):
> File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/AMBARI_METRICS/package/scripts/service_check.py", line 304, in <module>
> AMSServiceCheck().execute()
> File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 353, in execute
> method(env)
> File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
> return fn(*args, **kwargs)
> File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/AMBARI_METRICS/package/scripts/service_check.py", line 184, in service_check
> raise Fail("All metrics collectors are unavailable.")
> resource_management.core.exceptions.Fail: All metrics collectors are unavailable.
> stdout:
> 2018-06-25 04:42:25,088 - Using hadoop conf dir: /usr/hdp/3.0.0.0-1541/hadoop/conf
> 2018-06-25 04:42:25,095 - checked_call['hostid'] {}
> 2018-06-25 04:42:25,100 - checked_call returned (0, '1bac1213')
> 2018-06-25 04:42:25,102 - Ambari Metrics service check was started.
> 2018-06-25 04:42:25,121 - Generated metrics for host ctr-e138-1518143905142-378410-01-000009.hwx.site :
> {
> "metrics": [
> {
> "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
> "appid": "amssmoketestfake",
> "hostname": "ctr-e138-1518143905142-378410-01-000012.hwx.site",
> "starttime": 1529901745000,
> "metrics": {
> "1529901745000": 0.602995821312,
> "1529901746000": 1529901745000
> }
> }
> ]
> }
> 2018-06-25 04:42:25,122 - Connecting (POST) to ctr-e138-1518143905142-378410-01-000009.hwx.site:6188/ws/v1/timeline/metrics/
> 2018-06-25 04:42:25,132 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site: 200 OK
> 2018-06-25 04:42:25,133 - Http data:
> 2018-06-25 04:42:25,133 - Metrics were saved.
> 2018-06-25 04:42:25,133 - Connecting (GET) to ctr-e138-1518143905142-378410-01-000009.hwx.site:6188/ws/v1/timeline/metrics?metricNames=AMBARI_METRICS.SmokeTest.FakeMetric&hostname=ctr-e138-1518143905142-378410-01-000012.hwx.site&precision=seconds&grouped=false&startTime=1529901685000&appId=amssmoketestfake&endTime=1529901806000
> 2018-06-25 04:42:25,138 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
> 2018-06-25 04:42:25,138 - Http data: {"metrics":[]}
> 2018-06-25 04:42:25,138 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
> 2018-06-25 04:42:25,139 - Values weren't stored yet. Retrying in 10 seconds.
> 2018-06-25 04:42:35,154 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
> 2018-06-25 04:42:35,154 - Http data: {"metrics":[]}
> 2018-06-25 04:42:35,154 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
> 2018-06-25 04:42:35,155 - Values weren't stored yet. Retrying in 10 seconds.
> 2018-06-25 04:42:45,170 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
> 2018-06-25 04:42:45,170 - Http data: {"metrics":[]}
> 2018-06-25 04:42:45,171 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
> 2018-06-25 04:42:45,171 - Values weren't stored yet. Retrying in 10 seconds.
> 2018-06-25 04:42:55,186 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
> 2018-06-25 04:42:55,186 - Http data: {"metrics":[]}
> 2018-06-25 04:42:55,187 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
> 2018-06-25 04:42:55,187 - Values weren't stored yet. Retrying in 10 seconds.
> 2018-06-25 04:43:05,204 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
> 2018-06-25 04:43:05,204 - Http data: {"metrics":[]}
> 2018-06-25 04:43:05,205 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
> 2018-06-25 04:43:05,205 - Ambari Metrics service check failed on collector host ctr-e138-1518143905142-378410-01-000009.hwx.site. Reason : Values 0.602995821312 and 1529901745000 were not found in the response.
> 2018-06-25 04:43:05,207 - Exception while running function '>' for 'ctr-e138-1518143905142-378410-01-000009.hwx.site'. Reason : Ambari Metrics service check failed on collector host ctr-e138-1518143905142-378410-01-000009.hwx.site. Reason : Values 0.602995821312 and 1529901745000 were not found in the response.
> Command failed after 1 tries
> {code}
> *A subsequent service check passed, though.*
> The issue does not appear to be related to the host deletion; it seems that the data posted during the service check is sometimes not saved by the collector.
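For readers following the log above, here is a minimal sketch of the post-then-poll pattern it suggests: the check POSTs a fake metric, then repeatedly GETs it and fails if the posted values never appear. This is not the code from service_check.py. The function names, the injected fetch callable, and the exact-equality comparison are assumptions made for illustration; the five attempts and the 10-second delay are only read off the timestamps in the log.

```python
import time


def values_present(response, expected_values):
    """Return True if every expected value appears in a returned metric series."""
    found = set()
    for series in response.get("metrics", []):
        found.update(series.get("metrics", {}).values())
    return all(value in found for value in expected_values)


def poll_until_stored(fetch, expected_values, attempts=5, delay_seconds=10,
                      sleep=time.sleep):
    """Call fetch() until the expected values show up or attempts run out.

    fetch is a hypothetical callable returning the parsed JSON body of the
    GET request. Returns the attempt number on which the values were found.
    """
    for attempt in range(1, attempts + 1):
        response = fetch()
        if values_present(response, expected_values):
            return attempt
        if attempt < attempts:
            # Mirrors "Values weren't stored yet. Retrying in 10 seconds."
            sleep(delay_seconds)
    raise RuntimeError(
        "Values %s were not found in the response." % (list(expected_values),)
    )
```

With a sketch like this, a collector that accepts the POST with 200 OK but has not yet persisted (or never persists) the datapoints keeps returning {"metrics": []}, and the check fails once the attempts are exhausted, which matches the behaviour reported here.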
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)