Posted to issues@ambari.apache.org by "Srikanth Janardhan (JIRA)" <ji...@apache.org> on 2018/06/25 15:14:00 UTC

[jira] [Created] (AMBARI-24179) Ambari Metrics Service check fails after deleting a host

Srikanth Janardhan created AMBARI-24179:
-------------------------------------------

             Summary: Ambari Metrics Service check fails after deleting a host
                 Key: AMBARI-24179
                 URL: https://issues.apache.org/jira/browse/AMBARI-24179
             Project: Ambari
          Issue Type: Bug
          Components: ambari-metrics
    Affects Versions: 2.7.0
            Reporter: Srikanth Janardhan
            Assignee: Dmytro Sen
             Fix For: 2.7.0


The Ambari Metrics service check failed immediately after deleting a host:
{code:java}
stderr: 
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/AMBARI_METRICS/package/scripts/service_check.py", line 304, in <module>
    AMSServiceCheck().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 353, in execute
    method(env)
  File "/usr/lib/ambari-agent/lib/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/stacks/HDP/3.0/services/AMBARI_METRICS/package/scripts/service_check.py", line 184, in service_check
    raise Fail("All metrics collectors are unavailable.")
resource_management.core.exceptions.Fail: All metrics collectors are unavailable.
 stdout:
2018-06-25 04:42:25,088 - Using hadoop conf dir: /usr/hdp/3.0.0.0-1541/hadoop/conf
2018-06-25 04:42:25,095 - checked_call['hostid'] {}
2018-06-25 04:42:25,100 - checked_call returned (0, '1bac1213')
2018-06-25 04:42:25,102 - Ambari Metrics service check was started.
2018-06-25 04:42:25,121 - Generated metrics for host ctr-e138-1518143905142-378410-01-000009.hwx.site :

{
  "metrics": [
    {
      "metricname": "AMBARI_METRICS.SmokeTest.FakeMetric",
      "appid": "amssmoketestfake",
      "hostname": "ctr-e138-1518143905142-378410-01-000012.hwx.site",
      "starttime": 1529901745000,
      "metrics": {
        "1529901745000": 0.602995821312,
        "1529901746000": 1529901745000
      }
    }
  ]
}
2018-06-25 04:42:25,122 - Connecting (POST) to ctr-e138-1518143905142-378410-01-000009.hwx.site:6188/ws/v1/timeline/metrics/
2018-06-25 04:42:25,132 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site: 200 OK
2018-06-25 04:42:25,133 - Http data: 
2018-06-25 04:42:25,133 - Metrics were saved.
2018-06-25 04:42:25,133 - Connecting (GET) to ctr-e138-1518143905142-378410-01-000009.hwx.site:6188/ws/v1/timeline/metrics?metricNames=AMBARI_METRICS.SmokeTest.FakeMetric&hostname=ctr-e138-1518143905142-378410-01-000012.hwx.site&precision=seconds&grouped=false&startTime=1529901685000&appId=amssmoketestfake&endTime=1529901806000
2018-06-25 04:42:25,138 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
2018-06-25 04:42:25,138 - Http data: {"metrics":[]}
2018-06-25 04:42:25,138 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
2018-06-25 04:42:25,139 - Values weren't stored yet. Retrying in 10 seconds.
2018-06-25 04:42:35,154 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
2018-06-25 04:42:35,154 - Http data: {"metrics":[]}
2018-06-25 04:42:35,154 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
2018-06-25 04:42:35,155 - Values weren't stored yet. Retrying in 10 seconds.
2018-06-25 04:42:45,170 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
2018-06-25 04:42:45,170 - Http data: {"metrics":[]}
2018-06-25 04:42:45,171 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
2018-06-25 04:42:45,171 - Values weren't stored yet. Retrying in 10 seconds.
2018-06-25 04:42:55,186 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
2018-06-25 04:42:55,186 - Http data: {"metrics":[]}
2018-06-25 04:42:55,187 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
2018-06-25 04:42:55,187 - Values weren't stored yet. Retrying in 10 seconds.
2018-06-25 04:43:05,204 - Http response for host ctr-e138-1518143905142-378410-01-000009.hwx.site : 200 OK
2018-06-25 04:43:05,204 - Http data: {"metrics":[]}
2018-06-25 04:43:05,205 - Metrics were retrieved from host ctr-e138-1518143905142-378410-01-000009.hwx.site
2018-06-25 04:43:05,205 - Ambari Metrics service check failed on collector host ctr-e138-1518143905142-378410-01-000009.hwx.site. Reason : Values 0.602995821312 and 1529901745000 were not found in the response.
2018-06-25 04:43:05,207 - Exception while running function '>' for 'ctr-e138-1518143905142-378410-01-000009.hwx.site'. Reason : Ambari Metrics service check failed on collector host ctr-e138-1518143905142-378410-01-000009.hwx.site. Reason : Values 0.602995821312 and 1529901745000 were not found in the response.

Command failed after 1 tries
{code}
*Subsequent service check passed, though.*

The issue does not appear to be related to the host deletion. It seems that the data posted during the service check is sometimes not saved by the collector.
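
For reference, the post-then-poll behaviour visible in the log above can be sketched roughly as follows. This is only an illustration and not the actual service_check.py code: the names values_present and check_with_retries are hypothetical, the fetch callable stands in for the GET request to the collector, and the 5 attempts with a 10-second delay are inferred from the log timestamps rather than read from the script.

```python
import math
import time


def values_present(response, expected_values):
    """Return True if every expected value appears among the datapoints
    of any metric in the collector's JSON response (already parsed)."""
    datapoints = []
    for metric in response.get("metrics", []):
        datapoints.extend(metric.get("metrics", {}).values())
    return all(
        any(math.isclose(value, expected, rel_tol=1e-9) for value in datapoints)
        for expected in expected_values
    )


def check_with_retries(fetch, expected_values, attempts=5, delay_seconds=10,
                       sleep=time.sleep):
    """Poll fetch() until the expected values show up.

    fetch is a hypothetical callable returning the parsed response of the
    GET request. Returns the attempt number that succeeded and raises
    RuntimeError once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        if values_present(fetch(), expected_values):
            return attempt
        if attempt < attempts:
            # Matches "Values weren't stored yet. Retrying in 10 seconds."
            sleep(delay_seconds)
    raise RuntimeError(
        "Values %s were not found in the response."
        % " and ".join(str(v) for v in expected_values)
    )
```

With the log above, every GET returned {"metrics":[]}, so a loop like this runs through all of its attempts and fails. That would be consistent with the collector accepting the POST (200 OK) but not yet exposing the datapoints within the polling window.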



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)