Posted to dev@ambari.apache.org by "Yusaku Sako (JIRA)" <ji...@apache.org> on 2015/09/15 00:20:45 UTC

[jira] [Updated] (AMBARI-9894) Alerts: YARN YM HA Alerts Are UNKNOWN Due to HA Redirects

     [ https://issues.apache.org/jira/browse/AMBARI-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yusaku Sako updated AMBARI-9894:
--------------------------------
    Fix Version/s: 2.0.0

> Alerts: YARN YM HA Alerts Are UNKNOWN Due to HA Redirects
> ---------------------------------------------------------
>
>                 Key: AMBARI-9894
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9894
>             Project: Ambari
>          Issue Type: Bug
>            Reporter: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: AMBARI-9894.patch
>
>
> 3-node cluster
> Configured ResourceManager HA. Three alerts are now Unknown:
> - ResourceManager RPC Latency. Has two instances as expected but each is unknown "No JSON object could be decoded".
> - NodeManager Health Summary. Has two instances as expected but each is unknown "No JSON object could be decoded".
> - ResourceManager CPU Utiliz. Has two instances as expected but each is unknown "No JSON object could be decoded".
> Both RMs are running, and I can follow the quick links to the RM UI and JMX.
> This fails because YARN forwards requests sent to the standby RM to the active one. In this scenario, the alert gets back an HTTP 200 response whose body looks like:
> {noformat}
> This is standby RM. Redirecting to the current active RM: http://c6403.ambari.apache.org:8088/
> {noformat}
> Unfortunately, this is a Refresh-header redirect, which the metric alert cannot handle. The alerts only work at other times because, after the VMs restarted, the original RM became active again.
> There are a few issues here:
> - YARN doesn't do HA in the same way that other services like HDFS do. As a result, there's no config property that could let the alert know what to do or which hosts to contact.
> - YARN actually forwards to the active node after returning an HTTP 200, which doesn't jibe with how alerts work.
> This is a definite problem and requires some further investigation.
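
To illustrate the failure mode described above, here is a minimal sketch of how a JMX poller could tell a standby RM's HTTP 200 "redirect" page apart from a real JSON payload instead of failing on the parse. This is not the attached patch and not Ambari's alert code: the function names are hypothetical, and the assumed header shape (`Refresh: <seconds>; url=<target>`) is an assumption about what the standby RM sends, not something stated in this issue.

```python
import json
import re

# Assumed shape of the header value: "<seconds>; url=<target>".
_REFRESH_URL = re.compile(r"^\s*\d+\s*;\s*url\s*=\s*(\S+)\s*$", re.IGNORECASE)


def refresh_target(headers):
    """Return the URL named by a Refresh header, or None if absent or unparseable."""
    for name, value in headers.items():
        if name.lower() == "refresh":
            match = _REFRESH_URL.match(value)
            return match.group(1) if match else None
    return None


def parse_jmx_response(status, headers, body):
    """Classify a response as ("ok", data), ("redirect", url) or ("unknown", reason).

    Hypothetical helper: a caller could retry against the returned URL
    on "redirect" rather than reporting the alert as UNKNOWN.
    """
    target = refresh_target(headers)
    if status == 200 and target is not None:
        # Standby RM case: HTTP 200, plain-text body, Refresh header to the active RM.
        return ("redirect", target)
    try:
        return ("ok", json.loads(body))
    except ValueError as err:
        # This is the path that surfaces as "No JSON object could be decoded".
        return ("unknown", str(err))
```

Checking the header before parsing the body is a design choice made for this sketch: the standby response carries a 200 status, so the status code alone cannot distinguish it from a valid JMX reply.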


