Posted to hdfs-dev@hadoop.apache.org by "Yuanbo Liu (Jira)" <ji...@apache.org> on 2022/07/13 02:35:00 UTC
[jira] [Created] (HDFS-16657) Changing pool-level lock to volume-level lock for invalidation of blocks
Yuanbo Liu created HDFS-16657:
---------------------------------
Summary: Changing pool-level lock to volume-level lock for invalidation of blocks
Key: HDFS-16657
URL: https://issues.apache.org/jira/browse/HDFS-16657
Project: Hadoop HDFS
Issue Type: Sub-task
Reporter: Yuanbo Liu
Attachments: image-2022-07-13-10-25-37-383.png, image-2022-07-13-10-27-01-386.png, image-2022-07-13-10-27-44-258.png
Recently we have seen DataNode heartbeats become slow in a very busy cluster. Here is the chart:
!image-2022-07-13-10-25-37-383.png!
After taking a jstack of the DataNode, we found that the heartbeat was stuck in block invalidation:
!image-2022-07-13-10-27-01-386.png!
!image-2022-07-13-10-27-44-258.png!
The key code is:
{code:java}
// Runs for each block being invalidated, while the pool-level lock is held.
try {
  // Resolving the block file and its parent directory may touch the disk.
  File blockFile = new File(info.getBlockURI());
  if (blockFile != null && blockFile.getParentFile() == null) {
    errors.add("Failed to delete replica " + invalidBlks[i]
        + ". Parent not found for block file: " + blockFile);
    continue;
  }
} catch(IllegalArgumentException e) {
  // new File(URI) rejects URIs that do not refer to a local file.
  LOG.warn("Parent directory check failed; replica " + info
      + " is not backed by a local file");
}
{code}
Here the DataNode resolves the parent path of each block file, so disk I/O happens while the pool-level lock is held. When a disk is very busy with high I/O wait, all pending threads are blocked on the pool-level lock, and heartbeat latency rises. We propose changing the pool-level lock to a volume-level lock for block invalidation, so that a slow disk only stalls operations on its own volume.
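A minimal, hypothetical Java sketch of why the lock granularity matters, assuming the block pool is guarded by a ReentrantReadWriteLock and each volume by its own ReentrantLock. The names LockGranularityDemo, invalidatePoolLevel, invalidateVolumeLevel, tryHeartbeat and the volume id "vol-1" are illustrative only and are not Hadoop's DataNode API. The heartbeat is modeled as any work that needs the pool lock in shared mode, and the slow parent-directory check is simulated with a latch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not Hadoop code: pool-level vs volume-level locking.
class LockGranularityDemo {
    // One lock for the whole block pool (assumed to be a read/write lock).
    private final ReentrantReadWriteLock poolLock = new ReentrantReadWriteLock();
    // One lock per volume, created lazily (assumed volume ids like "vol-1").
    private final Map<String, Lock> volumeLocks = new ConcurrentHashMap<>();

    private Lock volumeLock(String volume) {
        return volumeLocks.computeIfAbsent(volume, v -> new ReentrantLock());
    }

    // Assumed current behavior: the slow disk check runs under the exclusive pool lock.
    void invalidatePoolLevel(Runnable slowDiskCheck) {
        poolLock.writeLock().lock();
        try {
            slowDiskCheck.run();
        } finally {
            poolLock.writeLock().unlock();
        }
    }

    // Proposed behavior: shared pool lock plus an exclusive lock on one volume only.
    void invalidateVolumeLevel(String volume, Runnable slowDiskCheck) {
        Lock vLock = volumeLock(volume);
        poolLock.readLock().lock();
        vLock.lock();
        try {
            slowDiskCheck.run();
        } finally {
            vLock.unlock();
            poolLock.readLock().unlock();
        }
    }

    // Stand-in for the heartbeat: any work that needs the pool lock in shared mode.
    boolean tryHeartbeat(long timeoutMs) throws InterruptedException {
        Lock shared = poolLock.readLock();
        if (!shared.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // blocked behind the slow invalidation
        }
        shared.unlock();
        return true;
    }

    // True if the heartbeat gets the pool lock while an invalidation is stuck on disk.
    static boolean heartbeatProceeds(boolean volumeLevel) {
        LockGranularityDemo dataset = new LockGranularityDemo();
        CountDownLatch inDiskCheck = new CountDownLatch(1);
        CountDownLatch releaseDisk = new CountDownLatch(1);
        Runnable slowDiskCheck = () -> {
            inDiskCheck.countDown();
            try {
                releaseDisk.await(); // simulates a disk with high I/O wait
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread invalidator = new Thread(() -> {
            if (volumeLevel) {
                dataset.invalidateVolumeLevel("vol-1", slowDiskCheck);
            } else {
                dataset.invalidatePoolLevel(slowDiskCheck);
            }
        });
        invalidator.start();
        boolean proceeded;
        try {
            inDiskCheck.await(); // the invalidation now holds its locks
            proceeded = dataset.tryHeartbeat(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        } finally {
            releaseDisk.countDown(); // let the simulated disk check finish
        }
        try {
            invalidator.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return proceeded;
    }

    public static void main(String[] args) {
        System.out.println("pool-level:   heartbeat proceeds = " + heartbeatProceeds(false));
        System.out.println("volume-level: heartbeat proceeds = " + heartbeatProceeds(true));
    }
}
```

Running main prints false for the pool-level case and true for the volume-level case. While the simulated disk check holds the exclusive pool lock, the shared acquire times out after 200 ms. When invalidation holds only the shared pool lock plus its own volume's lock, the heartbeat gets in immediately. In this sketch, invalidations on the same volume still serialize on that volume's lock, which keeps per-volume changes exclusive without stalling the whole pool.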
cc: [~hexiaoqiao] [~Aiphag0]