Posted to hdfs-dev@hadoop.apache.org by "liuyanyu (Jira)" <ji...@apache.org> on 2020/06/11 09:19:00 UTC

[jira] [Created] (HDFS-15407) Hedged read will not work if a datanode slow for a long time

liuyanyu created HDFS-15407:
-------------------------------

             Summary: Hedged read will not work if a datanode slow for a long time
                 Key: HDFS-15407
                 URL: https://issues.apache.org/jira/browse/HDFS-15407
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: 3.1.1, datanode
    Affects Versions: 3.1.1
            Reporter: liuyanyu
            Assignee: liuyanyu


I used cgroups to limit the datanode's IO to 1024 bytes/s, then read a file with hedged read enabled (dfs.client.hedged.read.threadpool.size set to 5, dfs.client.hedged.read.threshold.millis set to 500). The first 5 buffer reads timed out, switched to other datanodes, and succeeded. After that the read was stuck for a long time because of a SocketTimeoutException. The log is as follows:

2020-06-11 16:40:07,832 | INFO  | main | Waited 500ms to read from DatanodeInfoWithStorage[xx.xx.xx.28:25009,DS-9c843ac6-4ea1-4791-a1af-54c1ae3d5daf,DISK]; spawning hedged read | DFSInputStream.java:1188
2020-06-11 16:40:08,562 | INFO  | main | Waited 500ms to read from DatanodeInfoWithStorage[xx.xx.xx.28:25009,DS-9c843ac6-4ea1-4791-a1af-54c1ae3d5daf,DISK]; spawning hedged read | DFSInputStream.java:1188
2020-06-11 16:40:09,102 | INFO  | main | Waited 500ms to read from DatanodeInfoWithStorage[xx.xx.xx.28:25009,DS-9c843ac6-4ea1-4791-a1af-54c1ae3d5daf,DISK]; spawning hedged read | DFSInputStream.java:1188
2020-06-11 16:40:09,642 | INFO  | main | Waited 500ms to read from DatanodeInfoWithStorage[xx.xx.xx.28:25009,DS-9c843ac6-4ea1-4791-a1af-54c1ae3d5daf,DISK]; spawning hedged read | DFSInputStream.java:1188
2020-06-11 16:40:10,182 | INFO  | main | Waited 500ms to read from DatanodeInfoWithStorage[xx.xx.xx.28:25009,DS-9c843ac6-4ea1-4791-a1af-54c1ae3d5daf,DISK]; spawning hedged read | DFSInputStream.java:1188
2020-06-11 16:40:10,182 | INFO  | main | Execution rejected, Executing in current thread | DFSClient.java:3049
2020-06-11 16:40:10,219 | INFO  | main | Execution rejected, Executing in current thread | DFSClient.java:3049
2020-06-11 16:50:07,638 | WARN  | hedgedRead-0 | I/O error constructing remote block reader. | BlockReaderFactory.java:764
java.net.SocketTimeoutException: 600000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xx.xx.xx.113:62750 remote=/xx.xx.xx.28:25009]
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
	at java.io.FilterInputStream.read(FilterInputStream.java:83)
	at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:551)
	at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:418)
	at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:853)
	at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:749)
	at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:379)
	at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:661)
	at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1063)
	at org.apache.hadoop.hdfs.DFSInputStream$2.call(DFSInputStream.java:1035)
	at org.apache.hadoop.hdfs.DFSInputStream$2.call(DFSInputStream.java:1031)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2020-06-11 16:50:07,638 | WARN  | hedgedRead-0 | Connection failure: Failed to connect to /xx.xx.xx.28:25009 for file /testhdfs/test2.jar for block BP-1820384660-xx.xx.xx.74-1585533043013:blk_1082582662_8861386:java.net.SocketTimeoutException: 600000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/xx.xx.xx.113:62750 remote=/xx.xx.xx.28:25009] | DFSInputStream.java:1118
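
The "Execution rejected, Executing in current thread" lines suggest that once all 5 pool threads are occupied by reads against the slow datanode, the next hedged read runs in the calling thread and blocks it. Below is a minimal sketch of that effect using only java.util.concurrent. It assumes the client pool behaves like a ThreadPoolExecutor with a SynchronousQueue and a caller-runs rejection policy; that is an inference from the log, not a copy of the DFSClient code. The class HedgedPoolSketch and the method submitSlowReads are hypothetical names, and Thread.sleep stands in for a read from the throttled datanode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not part of Hadoop.
final class HedgedPoolSketch {

    /**
     * Submits poolSize + 1 slow tasks to a pool of poolSize threads and
     * returns, for each task, whether it ran on the submitting thread.
     */
    public static List<Boolean> submitSlowReads(int poolSize, long slowMillis)
            throws Exception {
        // Assumption: no queueing (SynchronousQueue) and caller-runs on rejection.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 60L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>(),
                new ThreadPoolExecutor.CallerRunsPolicy());
        final Thread caller = Thread.currentThread();
        List<Future<Boolean>> futures = new ArrayList<>();
        try {
            for (int i = 0; i <= poolSize; i++) {
                // Each task simulates a read from a slow datanode.
                futures.add(pool.submit(() -> {
                    Thread.sleep(slowMillis);
                    return Thread.currentThread() == caller;
                }));
            }
            List<Boolean> ranInCaller = new ArrayList<>();
            for (Future<Boolean> f : futures) {
                ranInCaller.add(f.get());
            }
            return ranInCaller;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // With a pool of 5, the sixth task is the one expected to be
        // rejected and executed by the caller.
        System.out.println(submitSlowReads(5, 200));
    }
}
```

In this sketch the first poolSize tasks each occupy a worker thread, so the final submission is rejected and executed by the submitting thread, which then waits for the full slowMillis. If the real client follows the same pattern, a read executed in the main thread against the throttled datanode would not return until the 600000 ms socket timeout seen in the log.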
 


