Posted to hdfs-dev@hadoop.apache.org by "Todd Lipcon (Created) (JIRA)" <ji...@apache.org> on 2011/09/28 06:15:45 UTC
[jira] [Created] (HDFS-2378) recoverBlock timeout in DFSClient should be longer
recoverBlock timeout in DFSClient should be longer
--------------------------------------------------
Key: HDFS-2378
URL: https://issues.apache.org/jira/browse/HDFS-2378
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Affects Versions: 0.20.206.0, 0.23.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical
Fix For: 0.20.206.0, 0.23.0
In a failure scenario where one of the datanodes in a pipeline has "frozen" (e.g., hard swapping or disk controller issues), we sometimes see timeouts in the call to recoverBlock(). This is because recoverBlock's implementation sends several RPCs internally (to the NN and to other nodes in the pipeline), each with the same timeout as the client's own call. Since the timeouts are equal, the "outer" call times out first. The retry then fails since recovery is already in progress or has already finished.
The best fix would be to make recoverBlock idempotent so the retry doesn't fail, but in the absence of that, we can likely fix this issue by increasing the client-side timeout to equal the sum of the timeouts of the underlying recovery calls.
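To illustrate the arithmetic behind the proposed workaround, here is a minimal sketch. It is not code from DFSClient: the class name, method names, and the 60-second value are hypothetical, and it assumes the nested recovery RPCs run one after another, so that their worst-case durations add up.

```java
import java.util.List;

/** Hypothetical sketch; none of these names come from DFSClient. */
class RecoveryTimeoutSketch {

    /** Outer timeout computed as the sum of the timeouts of the nested calls. */
    static long outerTimeoutMs(List<Long> innerTimeoutsMs) {
        long total = 0;
        for (long t : innerTimeoutsMs) {
            total = Math.addExact(total, t); // throws on overflow instead of wrapping
        }
        return total;
    }

    /**
     * True if the caller's timeout is shorter than the worst-case time the
     * nested calls may take (assuming they run sequentially).
     */
    static boolean outerMayExpireFirst(long outerMs, List<Long> innerTimeoutsMs) {
        return outerMs < outerTimeoutMs(innerTimeoutsMs);
    }

    public static void main(String[] args) {
        long rpcTimeoutMs = 60_000L; // assumed value, for illustration only
        // Assumption: one nested call to the NN and one to another datanode.
        List<Long> inner = List.of(rpcTimeoutMs, rpcTimeoutMs);

        // Equal timeouts: the outer call can give up while recovery is still running.
        System.out.println(outerMayExpireFirst(rpcTimeoutMs, inner)); // prints "true"

        // Proposed workaround: wait as long as the nested calls could take in total.
        System.out.println(outerTimeoutMs(inner)); // prints "120000"
    }
}
```

With the outer timeout raised to the sum, the client should no longer abandon a recovery that is still making progress. This does not make the retry safe by itself, which is why idempotent recovery remains the better fix.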
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-2378) recoverBlock timeout in DFSClient should be longer
Posted by "Uma Maheswara Rao G (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HDFS-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Uma Maheswara Rao G resolved HDFS-2378.
---------------------------------------
Resolution: Duplicate
The patch in HDFS-2637 has already received a +1.
After discussing with Todd, this issue can be closed as a duplicate. Marking it as a duplicate of HDFS-2637.
> recoverBlock timeout in DFSClient should be longer
> --------------------------------------------------
>
> Key: HDFS-2378
> URL: https://issues.apache.org/jira/browse/HDFS-2378
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs client
> Affects Versions: 0.23.0, 1.1.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Priority: Critical
> Fix For: 0.24.0