
Posted to issues@ambari.apache.org by "Dmytro Grinenko (JIRA)" <ji...@apache.org> on 2018/11/30 06:57:04 UTC

[jira] [Updated] (AMBARI-24531) Persistent critical "NameNode High Availability Health" alert after installing with 3 NameNodes

     [ https://issues.apache.org/jira/browse/AMBARI-24531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmytro Grinenko updated AMBARI-24531:
-------------------------------------
    Attachment: AMBARI-24531.patch

> Persistent critical "NameNode High Availability Health" alert after installing with 3 NameNodes
> -----------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-24531
>                 URL: https://issues.apache.org/jira/browse/AMBARI-24531
>             Project: Ambari
>          Issue Type: Bug
>          Components: alerts
>    Affects Versions: 2.7.0
>         Environment: sles12sp2
>            Reporter: Zack Marsh
>            Priority: Major
>         Attachments: AMBARI-24531.patch
>
>
> After installing Hadoop with 3 NameNodes, there's a persistent alert in the Ambari UI for the HDFS service:
> {code:java}
> NameNode High Availability Health:
> Active['hdp2.labs.teradata.com:50070'], Standby['hdp1.labs.teradata.com:50070', 'hdp3.labs.teradata.com:50070'], Unknown[]
> {code}
> This appears to stem from the alert_ha_namenode_health.py script, which deems the NameNode topology unhealthy unless there is exactly 1 Active and exactly 1 Standby NameNode.
> Excerpt from the alert_ha_namenode_health.py script:
> {code:python}
>   # there's only one scenario here; there is exactly 1 active and 1 standby
>   is_topology_healthy = len(active_namenodes) == 1 and len(standby_namenodes) == 1
>   result_label = 'Active{0}, Standby{1}, Unknown{2}'.format(str(active_namenodes),
>     str(standby_namenodes), str(unknown_namenodes))
>   if is_topology_healthy:
>     # if there is exactly 1 active and 1 standby NN
>     return (RESULT_STATE_OK, [result_label])
>   else:
>     # other scenario
>     return (RESULT_STATE_CRITICAL, [result_label])
> {code}
>
> Currently using the following workaround:
>
> 1. Replace the following line in {{alert_ha_namenode_health.py}}:
> {code:python}
> is_topology_healthy = len(active_namenodes) == 1 and len(standby_namenodes) == 1
> {code}
> With:
> {code:python}
> is_topology_healthy = len(active_namenodes) == 1 and len(standby_namenodes) == len(nn_unique_ids) - 1
> {code}
> 2. Restart Ambari Server.
>
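The difference between the shipped topology check and the workaround can be illustrated with a small standalone Python sketch. This is not the actual alert script: the function names `original_is_healthy`, `workaround_is_healthy` and `evaluate`, the NameNode IDs `nn1`..`nn3`, and the string values of the result constants are hypothetical. It assumes, as the workaround implies, that `nn_unique_ids` lists every NameNode configured for the nameservice (three in this report).

```python
# Hypothetical, standalone re-creation of the topology check discussed in
# this issue; NOT the actual alert_ha_namenode_health.py code.

# Placeholder result states (assumed values for illustration).
RESULT_STATE_OK = 'OK'
RESULT_STATE_CRITICAL = 'CRITICAL'


def original_is_healthy(active, standby, nn_unique_ids):
  # Shipped logic: healthy only with exactly 1 active and exactly 1 standby.
  return len(active) == 1 and len(standby) == 1


def workaround_is_healthy(active, standby, nn_unique_ids):
  # Workaround: exactly 1 active, and every other configured NameNode is standby.
  return len(active) == 1 and len(standby) == len(nn_unique_ids) - 1


def evaluate(check, active, standby, unknown, nn_unique_ids):
  # Builds the same (state, [label]) tuple shape as the quoted excerpt.
  result_label = 'Active{0}, Standby{1}, Unknown{2}'.format(
      str(active), str(standby), str(unknown))
  if check(active, standby, nn_unique_ids):
    return (RESULT_STATE_OK, [result_label])
  return (RESULT_STATE_CRITICAL, [result_label])


# Topology from the reported alert; the NameNode IDs are hypothetical.
nn_unique_ids = ['nn1', 'nn2', 'nn3']
active = ['hdp2.labs.teradata.com:50070']
standby = ['hdp1.labs.teradata.com:50070', 'hdp3.labs.teradata.com:50070']
unknown = []

print(evaluate(original_is_healthy, active, standby, unknown, nn_unique_ids)[0])
print(evaluate(workaround_is_healthy, active, standby, unknown, nn_unique_ids)[0])
```

With the reported topology the first check yields CRITICAL and the second OK. Note that the workaround is strict: if one of the three NameNodes becomes unreachable (1 active, 1 standby, 1 unknown), it reports CRITICAL, whereas the original condition reports OK for that same state.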



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)