Posted to issues@ambari.apache.org by "Dmytro Grinenko (JIRA)" <ji...@apache.org> on 2018/11/30 06:57:04 UTC
[jira] [Updated] (AMBARI-24531) Persistent critical "NameNode High
Availability Health" alert after installing with 3 NameNodes
[ https://issues.apache.org/jira/browse/AMBARI-24531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Dmytro Grinenko updated AMBARI-24531:
-------------------------------------
Attachment: AMBARI-24531.patch
> Persistent critical "NameNode High Availability Health" alert after installing with 3 NameNodes
> -----------------------------------------------------------------------------------------------
>
> Key: AMBARI-24531
> URL: https://issues.apache.org/jira/browse/AMBARI-24531
> Project: Ambari
> Issue Type: Bug
> Components: alerts
> Affects Versions: 2.7.0
> Environment: sles12sp2
> Reporter: Zack Marsh
> Priority: Major
> Attachments: AMBARI-24531.patch
>
>
> After installing Hadoop with 3 NameNodes, there's a persistent alert in the Ambari UI for the HDFS service:
> {code:java}
> NameNode High Availability Health:
> Active['hdp2.labs.teradata.com:50070'], Standby['hdp1.labs.teradata.com:50070', 'hdp3.labs.teradata.com:50070'], Unknown[]
> {code}
> This appears to stem from the alert_ha_namenode_health.py script, which deems the NameNode topology unhealthy unless there is exactly 1 Active and exactly 1 Standby NameNode. With 3 NameNodes there are 2 Standbys, so the check always fails.
> Excerpt from the alert_ha_namenode_health.py script:
> {code:java}
> # there's only one scenario here; there is exactly 1 active and 1 standby
> is_topology_healthy = len(active_namenodes) == 1 and len(standby_namenodes) == 1
> result_label = 'Active{0}, Standby{1}, Unknown{2}'.format(str(active_namenodes),
>   str(standby_namenodes), str(unknown_namenodes))
> if is_topology_healthy:
>   # if there is exactly 1 active and 1 standby NN
>   return (RESULT_STATE_OK, [result_label])
> else:
>   # other scenario
>   return (RESULT_STATE_CRITICAL, [result_label]){code}
>
> I am currently using the following workaround:
>
> 1. Replace the following line in {{alert_ha_namenode_health.py}}:
> {code:java}
> is_topology_healthy = len(active_namenodes) == 1 and len(standby_namenodes) == 1{code}
> With:
> {code:java}
> is_topology_healthy = len(active_namenodes) == 1 and len(standby_namenodes) == len(nn_unique_ids)-1{code}
> 2. Restart Ambari Server
>
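The difference between the original check and the workaround can be sketched in isolation. This is a minimal illustration, not the contents of the attached patch: the two function names are hypothetical, and nn_unique_ids is assumed to be a list of the configured NameNode IDs, as the workaround line implies. The IDs 'nn1', 'nn2', 'nn3' are placeholders; the host names come from the alert text in the report.

```python
def is_healthy_original(active_namenodes, standby_namenodes):
    # Check quoted in the report: exactly 1 active and exactly 1 standby.
    return len(active_namenodes) == 1 and len(standby_namenodes) == 1


def is_healthy_workaround(active_namenodes, standby_namenodes, nn_unique_ids):
    # Workaround from the report: exactly 1 active, every other
    # configured NameNode must be standby.
    return (len(active_namenodes) == 1
            and len(standby_namenodes) == len(nn_unique_ids) - 1)


# Topology from the alert text: 1 active, 2 standby, 0 unknown.
active = ['hdp2.labs.teradata.com:50070']
standby = ['hdp1.labs.teradata.com:50070', 'hdp3.labs.teradata.com:50070']
nn_ids = ['nn1', 'nn2', 'nn3']  # placeholder IDs (assumption)

print(is_healthy_original(active, standby))            # False
print(is_healthy_workaround(active, standby, nn_ids))  # True
```

For a two-NameNode cluster both checks agree, since len(nn_unique_ids) - 1 is then 1. A NameNode in the Unknown list still fails the workaround check, because it reduces the standby count below the expected value.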
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)