Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2008/09/11 00:24:44 UTC

[jira] Updated: (HADOOP-3831) slow-reading dfs clients do not recover from datanode-write-timeouts

     [ https://issues.apache.org/jira/browse/HADOOP-3831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated HADOOP-3831:
---------------------------------

    Attachment: HADOOP-3831.patch

Hairong, the updated patch adds this comment. Does this help?


{code}[...]
     /* We retry the current node only once, so this is set to true just
      * here. The intention is to handle one common case of an error that
      * is not a failure on the datanode or the client: the DataNode
      * closing the connection because the client is idle. If there are
      * other such "non-error" cases, a datanode might be retried by
      * setting this to true again.
      */
{code}
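
For readers following the patch discussion, here is a minimal, hypothetical sketch of the retry-once pattern the comment describes. The class and method names below are illustrative only, not the actual names in DFSClient; see the attached HADOOP-3831.patch for the real change.

{code}
import java.io.IOException;

// Hypothetical sketch: retry the current datanode exactly once on an
// IOException, on the assumption that the first failure may be a
// "non-error" (e.g. the DataNode closed an idle connection).
class RetryOnceSketch {
  private boolean retryCurrentNode = true;

  int readBuffer(byte[] buf, int off, int len) throws IOException {
    while (true) {
      try {
        int n = readFromCurrentNode(buf, off, len);
        retryCurrentNode = true;  // a successful read re-arms the retry
        return n;
      } catch (IOException e) {
        if (retryCurrentNode) {
          // First failure on this node: assume an idle disconnect and
          // reconnect to the same datanode once.
          retryCurrentNode = false;
          reconnectToCurrentNode();
        } else {
          // Second failure: treat the node as bad and pick another one.
          switchToAnotherNode();
          retryCurrentNode = true;
        }
      }
    }
  }

  // Stubs standing in for the real DFSClient plumbing.
  int readFromCurrentNode(byte[] b, int o, int l) throws IOException { return -1; }
  void reconnectToCurrentNode() throws IOException {}
  void switchToAnotherNode() throws IOException {}
}
{code}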

> slow-reading dfs clients do not recover from datanode-write-timeouts
> --------------------------------------------------------------------
>
>                 Key: HADOOP-3831
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3831
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.1
>            Reporter: Christian Kunz
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-3831.patch, HADOOP-3831.patch, HADOOP-3831.patch, HADOOP-3831.patch
>
>
> Some of our applications read certain files from DFS (using libhdfs) much more slowly than others, slowly enough to trigger the datanode write timeout introduced in 0.17.x. Eventually they fail.
> DFS clients should be able to recover from such a situation.
> In the meantime, would setting
> dfs.datanode.socket.write.timeout=0
> in hadoop-site.xml help? (A sketch of this setting follows the quoted logs below.)
> Here are the exceptions I see:
> DataNode:
> 2008-07-24 00:12:40,167 WARN org.apache.hadoop.dfs.DataNode: xxx:50010:Got exception while serving blk_3304550638094049753 to /yyy:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/xxx:50010 remote=/yyy:42542]
>         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:170)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:144)
>         at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:105)
>         at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105) 
>         at java.io.DataOutputStream.write(DataOutputStream.java:90)
>         at org.apache.hadoop.dfs.DataNode$BlockSender.sendChunks(DataNode.java:1774)
>         at org.apache.hadoop.dfs.DataNode$BlockSender.sendBlock(DataNode.java:1813)
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.readBlock(DataNode.java:1039) 
>         at org.apache.hadoop.dfs.DataNode$DataXceiver.run(DataNode.java:968)
>         at java.lang.Thread.run(Thread.java:619)
> DFS Client:
> 08/07/24 00:13:28 WARN dfs.DFSClient: Exception while reading from blk_3304550638094049753 of zzz from xxx:50010: java.io.IOException: Premeture EOF from inputStream
>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:100)
>     at org.apache.hadoop.dfs.DFSClient$BlockReader.readChunk(DFSClient.java:967)
>     at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:236)
>     at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:191)
>     at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:159)
>     at org.apache.hadoop.dfs.DFSClient$BlockReader.read(DFSClient.java:829)
>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1352)
>     at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1388)
>     at java.io.DataInputStream.read(DataInputStream.java:83)
> 08/07/24 00:13:28 INFO dfs.DFSClient: Could not obtain block blk_3304550638094049753 from any node:  java.io.IOException: No live nodes contain current block
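
On the workaround asked about in the description: the 480000 millis in the DataNode log above is the write timeout firing, and setting the property to 0 disables that timeout. Assuming the standard hadoop-site.xml property format, the override would look like the sketch below; the trade-off is that a stalled client can then occupy a DataNode transfer thread indefinitely.

{code}
<!-- hadoop-site.xml: disable the datanode write timeout entirely.
     0 means "no timeout"; the 480000 ms (8 minutes) seen in the
     DataNode log above is the timeout being disabled. -->
<property>
  <name>dfs.datanode.socket.write.timeout</name>
  <value>0</value>
</property>
{code}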

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.