Posted to hdfs-dev@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2014/07/30 22:24:40 UTC
[jira] [Resolved] (HDFS-1225) Block lost when primary crashes in recoverBlock
[ https://issues.apache.org/jira/browse/HDFS-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer resolved HDFS-1225.
------------------------------------
Resolution: Incomplete
append got overhauled in 2.x. closing.
> Block lost when primary crashes in recoverBlock
> -----------------------------------------------
>
> Key: HDFS-1225
> URL: https://issues.apache.org/jira/browse/HDFS-1225
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 0.20-append
> Reporter: Thanh Do
>
> - Summary: Block is lost if the primary datanode crashes in the middle of tryUpdateBlock.
>
> - Setup:
> # available datanode = 2
> # replica = 2
> # disks / datanode = 1
> # failures = 1
> # failure type = crash
> # When/where failure happens = (see below)
>
> - Details:
> Suppose we have 2 datanodes, dn1 and dn2, and dn1 is the primary.
> A client appends to blk_X_1001, and a crash happens during dn1.recoverBlock,
> at the point after blk_X_1001.meta is renamed to blk_X_1001.meta_tmp1002.
> **Interesting**: in this case, block X is eventually lost. Why?
> After dn1.recoverBlock crashes at the rename, what is left in dn1's current directory is:
> 1) blk_X
> 2) blk_X_1001.meta_tmp1002
> ==> this is an invalid block, because it has no meta file associated with it.
> dn2 (after dn1's crash) now contains:
> 1) blk_X
> 2) blk_X_1002.meta
> (note that the rename at dn2 is completed, because dn1 called dn2.updateBlock() before
> calling its own updateBlock())
> But namenode.commitBlockSynchronization is never called on the namenode,
> because dn1 has crashed. Therefore, from the namenode's point of view, block X still has GS 1001.
> Hence, the block is lost.
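The sequence described above can be replayed with a small toy model. This is only a sketch, not Hadoop code: the class `RecoverBlockCrashSketch`, its helper methods, and the in-memory sets standing in for each datanode's current directory are all hypothetical. It also assumes, as a simplification of the report, that a replica is usable only when the block file and a meta file carrying the namenode's generation stamp are both present.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the HDFS-1225 scenario. NOT Hadoop code: every name here is a
// hypothetical stand-in, and directories are modelled as in-memory sets.
public class RecoverBlockCrashSketch {
    static final long OLD_GS = 1001;
    static final long NEW_GS = 1002;

    /** Files held for block X by a healthy replica at generation stamp gs. */
    static Set<String> healthyReplica(long gs) {
        return new TreeSet<>(Arrays.asList("blk_X", "blk_X_" + gs + ".meta"));
    }

    /** First rename: old meta file to the temporary name. */
    static void renameMetaToTmp(Set<String> dir) {
        dir.remove("blk_X_" + OLD_GS + ".meta");
        dir.add("blk_X_" + OLD_GS + ".meta_tmp" + NEW_GS);
    }

    /** Second rename: temporary file to the meta file with the new stamp. */
    static void renameTmpToNewMeta(Set<String> dir) {
        dir.remove("blk_X_" + OLD_GS + ".meta_tmp" + NEW_GS);
        dir.add("blk_X_" + NEW_GS + ".meta");
    }

    /** Assumed validity rule: block file plus a meta file with the expected stamp. */
    static boolean isValidReplica(Set<String> dir, long expectedGs) {
        return dir.contains("blk_X") && dir.contains("blk_X_" + expectedGs + ".meta");
    }

    /** Replays the failure and counts replicas that match the namenode's view. */
    static int replicasMatchingNamenode() {
        Set<String> dn1 = healthyReplica(OLD_GS);
        Set<String> dn2 = healthyReplica(OLD_GS);
        long namenodeGs = OLD_GS;

        // The primary (dn1) updates dn2 first; both renames complete on dn2.
        renameMetaToTmp(dn2);
        renameTmpToNewMeta(dn2);

        // dn1 then starts its own update and crashes right after the first rename.
        renameMetaToTmp(dn1);
        // Crash: commitBlockSynchronization is never sent, so namenodeGs is unchanged.

        int valid = 0;
        if (isValidReplica(dn1, namenodeGs)) valid++;
        if (isValidReplica(dn2, namenodeGs)) valid++;
        System.out.println("dn1 = " + dn1 + ", dn2 = " + dn2
                + ", namenode GS = " + namenodeGs);
        return valid;
    }

    public static void main(String[] args) {
        System.out.println("replicas matching namenode: " + replicasMatchingNamenode());
    }
}
```

Under these assumptions `replicasMatchingNamenode()` returns 0: dn1 has no meta file at all, and dn2's meta file carries GS 1002 while the namenode still expects 1001. The model does not say how the real datanode or namenode would treat either replica beyond what the report states.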
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (thanhdo@cs.wisc.edu) and
> Haryadi Gunawi (haryadi@eecs.berkeley.edu)
--
This message was sent by Atlassian JIRA
(v6.2#6252)