Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2008/03/28 22:50:24 UTC
[jira] Commented: (HADOOP-3124) DFS data node should not use hard
coded 10 minutes as write timeout.
[ https://issues.apache.org/jira/browse/HADOOP-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12583206#action_12583206 ]
Raghu Angadi commented on HADOOP-3124:
--------------------------------------
> 3. Different datanodes in a pipeline should use different timeout values for writing to the downstream.
This is already the case: the timeout is 10 min + 5 sec * (number of datanodes - position in the pipeline). The client's position is 0, the first datanode's position is 1, etc.
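To make the arithmetic concrete, here is a minimal sketch of that formula. The class, method, and constant names are hypothetical and not taken from the Hadoop source; only the 10 min base, the 5 sec step, and the position numbering come from the description above.

```java
import java.util.concurrent.TimeUnit;

/** Illustrative only: names are assumptions, not Hadoop identifiers. */
final class WriteTimeoutSketch {
    // Base timeout of 10 minutes, as described in the comment.
    static final long BASE_MILLIS = TimeUnit.MINUTES.toMillis(10);
    // Extra 5 seconds for each node remaining downstream.
    static final long EXTENSION_MILLIS = TimeUnit.SECONDS.toMillis(5);

    /**
     * @param numDatanodes number of datanodes in the pipeline
     * @param position     0 for the client, 1 for the first datanode, and so on
     */
    static long writeTimeoutMillis(int numDatanodes, int position) {
        if (numDatanodes < 1 || position < 0 || position > numDatanodes) {
            throw new IllegalArgumentException("invalid pipeline position");
        }
        return BASE_MILLIS + EXTENSION_MILLIS * (numDatanodes - position);
    }

    public static void main(String[] args) {
        int numDatanodes = 3;
        for (int position = 0; position <= numDatanodes; position++) {
            System.out.println("position " + position + ": "
                    + writeTimeoutMillis(numDatanodes, position) + " ms");
        }
    }
}
```

With three datanodes this gives 615 sec for the client, then 610, 605, and 600 sec for the datanodes, so the values already differ along the pipeline, but only by 5 sec per hop.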
+1 for making this a config.
Note that there was no timeout for this before 0.17; the client would get stuck forever. 10 min was added as a very conservative value. What should be the default?
Though not relevant here, probably we need different write timeouts while receiving a block and while sending a block.
I am curious to know if you have any info on why one of the datanodes was not able to read for 10 minutes.
> DFS data node should not use hard coded 10 minutes as write timeout.
> --------------------------------------------------------------------
>
> Key: HADOOP-3124
> URL: https://issues.apache.org/jira/browse/HADOOP-3124
> Project: Hadoop Core
> Issue Type: Bug
> Reporter: Runping Qi
>
> This problem happens in 0.17 trunk.
> I saw reducers wait 10 minutes to write data to DFS and then time out.
> The client retried and timed out after another 19 minutes.
> After looking into the code, it seems that the DFS data node uses 10 minutes as the timeout for writing data into the data node pipeline.
> I think we have three issues:
> 1. The 10 minutes timeout value is too big for writing a chunk of data (64K) through the data node pipeline.
> 2. The timeout value should not be hard coded.
> 3. Different datanodes in a pipeline should use different timeout values for writing to the downstream.
> A reasonable one may be (20 secs * numOfDataNodesInTheDownStreamPipe).
> For example, if the replication factor is 3, the client uses 60 secs, the first data node uses 40 secs, and the second data node uses 20 secs.
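The scheme proposed in the issue description can be sketched the same way. This is a rough illustration under stated assumptions: the class and method names are hypothetical, and the position numbering (client is 0, first datanode is 1) is borrowed from the comment above rather than from the issue text. Only the 20 sec step and the 60/40/20 example come from the description.

```java
/** Illustrative only: names are assumptions, not Hadoop identifiers. */
final class ProposedTimeoutSketch {
    // Proposed allowance per downstream datanode, in seconds.
    static final int SECONDS_PER_DOWNSTREAM_NODE = 20;

    /**
     * @param replication number of datanodes in the pipeline
     * @param position    0 for the client, 1 for the first datanode, and so on
     */
    static int proposedTimeoutSeconds(int replication, int position) {
        if (replication < 1 || position < 0 || position > replication) {
            throw new IllegalArgumentException("invalid pipeline position");
        }
        // Datanodes still downstream of this writer.
        int downstream = replication - position;
        return SECONDS_PER_DOWNSTREAM_NODE * downstream;
    }

    public static void main(String[] args) {
        int replication = 3;
        // The last datanode has nothing downstream, so stop before it.
        for (int position = 0; position < replication; position++) {
            System.out.println("position " + position + ": "
                    + proposedTimeoutSeconds(replication, position) + " secs");
        }
    }
}
```

For a replication factor of 3 this reproduces the 60, 40, and 20 sec values from the example. The last datanode has no downstream write, so the formula yields 0 for it.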