Posted to hdfs-dev@hadoop.apache.org by "JiangHua Zhu (Jira)" <ji...@apache.org> on 2022/06/02 05:19:00 UTC

[jira] [Created] (HDFS-16614) Improve balancer operation strategy and performance

JiangHua Zhu created HDFS-16614:
-----------------------------------

Summary: Improve balancer operation strategy and performance
Key: HDFS-16614
URL: https://issues.apache.org/jira/browse/HDFS-16614
Project: Hadoop HDFS
Issue Type: Improvement
Components: balancer & mover, namenode
Affects Versions: 3.3.0
Reporter: JiangHua Zhu
Attachments: image-2022-06-02-13-18-33-213.png

When the Balancer runs, it performs the following steps in order:
1. Obtain information about the available DataNodes from the NameNode.
2. Group the storage by StorageType and calculate the average utilization for each type. Combined with the configured threshold, this yields four sets: overUtilized, aboveAvgUtilized, belowAvgUtilized, and underUtilized.
3. From these sets, select the sources and targets for data transfer. A source is a node that sends data, and a target is a node that receives it.
4. Start the data transfers in parallel.
These steps are repeated iteratively. Throughout the process, a single threshold is applied to all StorageTypes. This seems rather coarse: now that heterogeneous storage is supported, the threshold cannot be tuned for an individual StorageType.
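
To make step 2 concrete, here is a minimal sketch of how a node's utilization could be sorted into the four sets, given the average utilization and the threshold. The class, enum, and method names are hypothetical and chosen for illustration; this is not the Balancer source code, and the sample values (average 78%, threshold 10%) are assumptions.

```java
public class BalancerClassifySketch {
    // Hypothetical labels mirroring the four sets named in the issue.
    enum Group { OVER_UTILIZED, ABOVE_AVG_UTILIZED, BELOW_AVG_UTILIZED, UNDER_UTILIZED }

    // All arguments are percentages (0 to 100). A node within `threshold`
    // of the average lands in one of the two "avg" groups; a node further
    // away lands in overUtilized or underUtilized.
    static Group classify(double utilization, double average, double threshold) {
        double diff = utilization - average;
        boolean withinThreshold = Math.abs(diff) <= threshold;
        if (diff > 0) {
            return withinThreshold ? Group.ABOVE_AVG_UTILIZED : Group.OVER_UTILIZED;
        }
        return withinThreshold ? Group.BELOW_AVG_UTILIZED : Group.UNDER_UTILIZED;
    }

    public static void main(String[] args) {
        double average = 78.0;   // assumed cluster average, as in the report
        double threshold = 10.0; // assumed threshold value
        double[] samples = {90.0, 85.0, 70.0, 60.0};
        for (double u : samples) {
            System.out.println(u + "% -> " + classify(u, average, threshold));
        }
    }
}
```

With these assumed values, a node at 85% is only 7 points above the average and therefore falls into aboveAvgUtilized rather than overUtilized.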

There is an online cluster with more than 2000 nodes in which node storage is unbalanced. For example:
!image-2022-06-02-13-18-33-213.png!

Here, the average utilization of the cluster is 78%, but the utilization of most nodes is between 85% and 90%. When the balancer is turned on, we find that 85% of the nodes are working as sources. We think this is unreasonable because it consumes more of the cluster's network resources; applying some effective restrictions would benefit the normal operation of the cluster.
We therefore propose the following changes:
1. While the balancer is running, it should report the threshold in effect for each StorageType, for example [[DISK, 10%], [SSD, 8%]...].
2. Support setting a separate threshold for each StorageType, and have the balancer apply it.
3. Add an option that prevents nodes below the threshold from joining the Source set. This lets nodes with high utilization transfer data as soon as possible, which helps the cluster reach balance.
4. Add support for leaving a large group of DataNodes with similar utilization unchanged. For example, if 40% of the nodes in the cluster have a utilization of 75% to 80%, these nodes should not join the Source set. This behavior would have to be enabled by the user at runtime.
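
The proposed changes could look roughly like the sketch below. It is only an illustration under stated assumptions: the StorageType enum is a local stand-in rather than Hadoop's class, and thresholdFor, mayBeSource, inProtectedBand, and the default of 10% are hypothetical names and values that do not exist in the Balancer today.

```java
import java.util.EnumMap;
import java.util.Map;

public class PerTypeThresholdSketch {
    // Local stand-in for illustration; not org.apache.hadoop.fs.StorageType.
    enum StorageType { DISK, SSD, ARCHIVE }

    // Assumed fallback used when no per-type threshold is configured.
    static final double DEFAULT_THRESHOLD = 10.0;

    // Changes 1 and 2: look up the threshold for one StorageType.
    static double thresholdFor(Map<StorageType, Double> perType, StorageType type) {
        return perType.getOrDefault(type, DEFAULT_THRESHOLD);
    }

    // Change 3: only nodes more than `threshold` above the average may be sources.
    static boolean mayBeSource(double utilization, double average, double threshold) {
        return utilization - average > threshold;
    }

    // Change 4: nodes inside a user-specified utilization band are left unchanged.
    static boolean inProtectedBand(double utilization, double low, double high) {
        return utilization >= low && utilization <= high;
    }

    public static void main(String[] args) {
        Map<StorageType, Double> perType = new EnumMap<>(StorageType.class);
        perType.put(StorageType.DISK, 10.0); // values taken from the example in the issue
        perType.put(StorageType.SSD, 8.0);

        double average = 78.0; // assumed average utilization
        double diskNode = 85.0;
        double ssdNode = 87.0;

        System.out.println("Thresholds in effect: " + perType);
        System.out.println("DISK node at 85% may be source: "
            + mayBeSource(diskNode, average, thresholdFor(perType, StorageType.DISK)));
        System.out.println("SSD node at 87% may be source: "
            + mayBeSource(ssdNode, average, thresholdFor(perType, StorageType.SSD)));
        System.out.println("Node at 77% is in protected band 75-80: "
            + inProtectedBand(77.0, 75.0, 80.0));
    }
}
```

A map keyed by StorageType keeps the lookup simple and lets unconfigured types fall back to one default, so existing single-threshold behavior would be preserved when the user sets nothing.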

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
