Posted to dev@ambari.apache.org by Alejandro Fernandez <af...@hortonworks.com> on 2015/09/23 23:34:29 UTC
Re: Review Request 38651: AMBARI-13194. Alert definition when DataNode data dirs become unmounted
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/38651/
-----------------------------------------------------------
(Updated Sept. 23, 2015, 9:34 p.m.)
Review request for Ambari, Andrew Onischuk, Dmitro Lisnichenko, Jonathan Hurley, Nate Cole, Sumit Mohanty, Srimanth Gunturi, and Sid Wagle.
Changes
-------
Added more logic to compare the actual mount points against the expected in the history file.
Summary (updated)
-----------------
AMBARI-13194. Alert definition when DataNode data dirs become unmounted
Bugs: AMBARI-13194
https://issues.apache.org/jira/browse/AMBARI-13194
Repository: ambari
Description
-------
Ambari uses the dfs.datanode.data.dir.mount.file property in HDFS, whose value is typically /etc/hadoop/conf/dfs_data_dir_mount.hist, to track the mount point of each data dir.
E.g.,
{code}
/hadoop01/data,/device1
/hadoop02/data,/device2
/hadoop03/data,/ # this one is on root, the others are all on mount points.
{code}
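For illustration, the history file format above could be read into a mapping of data dir to mount point roughly as follows. This is only a sketch, not the code under review: the function name parse_mount_history is hypothetical, and treating "#" as the start of a comment is an assumption based on the example above.

```python
def parse_mount_history(text):
    """Return {data_dir: mount_point} parsed from history-file text.

    Hypothetical helper for illustration; not part of Ambari.
    """
    mounts = {}
    for raw_line in text.splitlines():
        # Assumption: anything after '#' is a comment, as in the example.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        data_dir, sep, mount_point = line.partition(",")
        if not sep:
            continue  # skip lines that lack the "dir,mount" comma
        mounts[data_dir.strip()] = mount_point.strip()
    return mounts


sample = """\
/hadoop01/data,/device1
/hadoop02/data,/device2
/hadoop03/data,/ # this one is on root
"""
history = parse_mount_history(sample)
```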
Whenever a drive becomes unmounted, Ambari detects that the data dir was previously on a mount and will not create that data dir; HDFS can still tolerate the failure if dfs.datanode.failed.volumes.tolerated is greater than 0.
However, if the /etc/hadoop/conf/dfs_data_dir_mount.hist file is deleted, Ambari no longer has this knowledge and will create the data dir, even if that places it on the root partition.
To improve tracking, create an alert definition that reports the following:
* WARNING status if the /etc/hadoop/conf/dfs_data_dir_mount.hist file has been deleted
* CRITICAL status if at least one of the data dirs is on the root partition while at least one other data dir is on a mount
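The two conditions above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not the alert script in this review: evaluate_data_dirs and find_mount_point are hypothetical names, the mount lookup and file check are injectable so the logic can be exercised without real disks, and the comparison of actual mount points against the history file contents is left out.

```python
import os

OK, WARNING, CRITICAL = "OK", "WARNING", "CRITICAL"


def find_mount_point(path):
    """Walk up from path until a mount point is reached (hypothetical helper)."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        path = os.path.dirname(path)
    return path


def evaluate_data_dirs(data_dirs, history_file,
                       mount_of=find_mount_point, exists=os.path.isfile):
    """Return (status, message) for the two conditions described above."""
    # Condition 1: the history file is gone, so mount knowledge is lost.
    if not exists(history_file):
        return WARNING, "History file %s is missing" % history_file

    mounts = {d: mount_of(d) for d in data_dirs}
    on_root = sorted(d for d, m in mounts.items() if m == "/")
    on_mount = sorted(d for d, m in mounts.items() if m != "/")

    # Condition 2: a mix of root and non-root data dirs suggests that
    # a drive has become unmounted.
    if on_root and on_mount:
        return CRITICAL, "Data dirs on root partition: %s" % ", ".join(on_root)
    return OK, "All data dirs are consistently mounted"
```

Passing a fake mount_of (for example, dict.get on a mapping of data dir to mount point) makes the three outcomes easy to check in a unit test.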
Diffs (updated)
-----
ambari-agent/src/test/python/resource_management/TestDatanodeHelper.py e348cc4
ambari-common/src/main/python/resource_management/core/providers/system.py 213adc5
ambari-common/src/main/python/resource_management/libraries/functions/dfs_datanode_helper.py a05e162
ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/alerts.json 2fcacc8
ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/alerts/alert_datanode_unmounted_data_dir.py PRE-CREATION
ambari-server/src/test/python/stacks/2.0.6/HDFS/test_alert_datanode_unmounted_data_dir.py PRE-CREATION
Diff: https://reviews.apache.org/r/38651/diff/
Testing
-------
* Python unit tests passed
* Verified that the alert worked on several hosts for all 3 types of statuses (WARNING, CRITICAL, OK)
* Also checked that it did not run on a host without DataNode, and it did run once I added DataNode to that host
Thanks,
Alejandro Fernandez