Posted to hdfs-dev@hadoop.apache.org by "zhengchenyu (Jira)" <ji...@apache.org> on 2021/06/15 03:39:00 UTC

[jira] [Created] (HDFS-16070) DataTransfer block storm when datanode's io is busy.

zhengchenyu created HDFS-16070:
----------------------------------

             Summary: DataTransfer block storm when datanode's io is busy.
                 Key: HDFS-16070
                 URL: https://issues.apache.org/jira/browse/HDFS-16070
             Project: Hadoop HDFS
          Issue Type: Improvement
    Affects Versions: 3.2.1, 3.3.0
            Reporter: zhengchenyu


When I sped up decommissioning, I found that I/O on some datanodes was busy. The load on those hosts was very high, and tens of thousands of data transfer threads were running.
I then found logs like the ones below.
{code}
# Logs for starting the transfer threads
2021-06-08 13:42:37,620 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.52:9866
2021-06-08 13:52:36,345 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.7.31:9866
2021-06-08 14:02:37,197 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.201.4.49:9866, datanodeUuid=6c55b7cb-f8ef-445b-9cca-d82b5b077ed1, infoPort=9864, infoSecurePort=0, ipcPort=9867, storageInfo=lv=-57;cid=CID-37e80bd5-733a-4d7b-ba3d-b46269573c72;nsid=215490653;c=1584525570797) Starting thread to transfer BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 to 10.201.16.50:9866
# Markers that a transfer completed
2021-06-08 13:54:08,134 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.7.52:9866
2021-06-08 14:10:47,170 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DataTransfer, at bd-tz1-hadoop-004049.zeus.lianjia.com:9866: Transmitted BP-852924019-10.201.1.32-1584525570797:blk_-9223372036449848858_30963611 (numBytes=7457424) to /10.201.16.50:9866
{code}
You can see that the previous DataTransfer thread finished at 13:54:08, but the next DataTransfer for the same block had already started at 13:52:36.
If a DataTransfer does not finish within 10 minutes (pending timeout + check interval), another DataTransfer for the same block is started while the earlier one is still running. The duplicate transfers then put heavy load on the disk and network.
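
To make the overlap concrete, here is a minimal Python sketch that uses only the timestamps from the log excerpt above. The helper names (`transfer_windows`, `overlapping_pairs`, `start_gaps_seconds`) are hypothetical and are not part of Hadoop. It also assumes that a transfer with no "Transmitted" line in the excerpt (the one to 10.201.7.31) was still running, which the excerpt alone does not prove.

```python
from datetime import datetime

# Timestamp format used in the DataNode log lines above.
FMT = "%Y-%m-%d %H:%M:%S,%f"

# "Starting thread to transfer" timestamps, keyed by target datanode.
STARTS = {
    "10.201.7.52": "2021-06-08 13:42:37,620",
    "10.201.7.31": "2021-06-08 13:52:36,345",
    "10.201.16.50": "2021-06-08 14:02:37,197",
}

# "Transmitted" timestamps, keyed by target datanode.
# The excerpt has no completion line for 10.201.7.31.
ENDS = {
    "10.201.7.52": "2021-06-08 13:54:08,134",
    "10.201.16.50": "2021-06-08 14:10:47,170",
}


def parse(ts):
    return datetime.strptime(ts, FMT)


def transfer_windows(starts, ends):
    """Return (target, start, end) tuples sorted by start time.

    end is None when no completion line was seen (assumed still running).
    """
    windows = []
    for target, start in starts.items():
        end = ends.get(target)
        windows.append((target, parse(start), parse(end) if end else None))
    return sorted(windows, key=lambda w: w[1])


def overlapping_pairs(windows):
    """Return (earlier_target, later_target) pairs whose transfers overlap."""
    pairs = []
    for i, (t1, _, e1) in enumerate(windows):
        for t2, s2, _ in windows[i + 1:]:
            if e1 is None or s2 < e1:
                pairs.append((t1, t2))
    return pairs


def start_gaps_seconds(windows):
    """Return the seconds between consecutive transfer starts."""
    return [
        (later[1] - earlier[1]).total_seconds()
        for earlier, later in zip(windows, windows[1:])
    ]


if __name__ == "__main__":
    windows = transfer_windows(STARTS, ENDS)
    print("overlapping transfers:", overlapping_pairs(windows))
    print("gaps between starts (s):", start_gaps_seconds(windows))
```

Under these assumptions, the gaps between consecutive starts come out close to 600 seconds, which matches the 10-minute rescheduling period described above, and the first transfer is still in progress when the second one begins.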





--
This message was sent by Atlassian Jira
(v8.3.4#803005)
