Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2008/04/10 00:10:06 UTC

[jira] Issue Comment Edited: (HADOOP-3124) DFS data node should not use hard coded 10 minutes as write timeout.

    [ https://issues.apache.org/jira/browse/HADOOP-3124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587399#action_12587399 ] 

rangadi edited comment on HADOOP-3124 at 4/9/08 3:08 PM:
--------------------------------------------------------------

2 minutes is fine for writes, though it does not really improve things much. Would it matter in the absence of HADOOP-3132?

I am more concerned about clients reading from DFS, since this timeout currently applies to those connections as well. DFSClient treats these connection failures as real errors and will try a different datanode. I think we need to fix DFSClient before being more aggressive about this timeout.

0.17 would be the first release that has such a timeout. I am not sure if we should have an aggressive value in the first release.

That said, I am not strongly opposed to reducing it. 
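
For illustration only, here is a minimal, self-contained Java sketch of the read-side behaviour described above; the class, constant, and method names are invented for this example and are not the actual DFSClient code:

    import java.io.InputStream;
    import java.net.InetSocketAddress;
    import java.net.Socket;
    import java.net.SocketTimeoutException;

    public class ReadTimeoutSketch {
        // Stand-in for the socket read timeout under discussion.
        static final int READ_TIMEOUT_MS = 2 * 60 * 1000;

        // Read one byte from a datanode-like endpoint. A slow or stuck
        // connection surfaces as SocketTimeoutException; a client that
        // treats this as a real error marks the node as failed and moves
        // on to another datanode, which is the behaviour questioned above.
        static int readOneByte(InetSocketAddress addr) throws Exception {
            try (Socket s = new Socket()) {
                s.connect(addr, READ_TIMEOUT_MS);
                s.setSoTimeout(READ_TIMEOUT_MS);   // read timeout on the data socket
                InputStream in = s.getInputStream();
                return in.read();                  // blocks until data or timeout
            } catch (SocketTimeoutException ste) {
                return -1;                         // caller treats this as a node failure
            }
        }
    }

The point is that once such a timeout exists, every timeout on a read socket looks like a datanode failure to the client unless the client is taught to tell the two cases apart.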

> DFS data node should not use hard coded 10 minutes as write timeout.
> --------------------------------------------------------------------
>
>                 Key: HADOOP-3124
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3124
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.0
>            Reporter: Runping Qi
>            Assignee: Raghu Angadi
>         Attachments: HADOOP-3124.patch
>
>
> This problem happens in 0.17 trunk.
> I saw reducers wait 10 minutes while writing data to DFS and then time out.
> The client retried and timed out again after another 19 minutes.
> After looking into the code, it seems that the DFS data node uses 10 minutes as the timeout for writing data into the data node pipeline.
> I think we have three issues:
> 1. The 10 minute timeout value is too big for writing a chunk of data (64K) through the data node pipeline.
> 2. The timeout value should not be hard coded.
> 3. Different datanodes in a pipeline should use different timeout values for writing downstream.
> A reasonable value may be (20 secs * numOfDataNodesInTheDownStreamPipe), as sketched below.
> For example, if the replication factor is 3, the client uses 60 secs, the first data node uses 40 secs, and the second data node uses 20 secs.
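> A minimal Java sketch of that scaling rule (the class, constant, and method names below are illustrative only, not actual Hadoop identifiers):
>
>     public class PipelineWriteTimeout {
>         // 20 secs per datanode still downstream of the writer.
>         static final int STEP_MS = 20 * 1000;
>
>         static int timeoutFor(int nodesDownstream) {
>             return STEP_MS * nodesDownstream;
>         }
>
>         public static void main(String[] args) {
>             // Replication factor 3: client -> DN1 -> DN2 -> DN3.
>             System.out.println("client: " + timeoutFor(3) + " ms");  // 60000
>             System.out.println("DN1:    " + timeoutFor(2) + " ms");  // 40000
>             System.out.println("DN2:    " + timeoutFor(1) + " ms");  // 20000
>         }
>     }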

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.