Posted to hdfs-dev@hadoop.apache.org by Varun Sharma <va...@pinterest.com> on 2013/04/19 23:28:18 UTC

Meaning of UNDER_RECOVERY blocks

Hi,

I had an instance where a datanode died while writing a block. I am using
Hadoop 2.0 patched with HDFS-3703 for stale node detection every 20 seconds.

According to the namenode logs, the block being written went into the
UNDER_RECOVERY state, and there were several internalRecoverLease() calls
because there were readers on that block. I have a couple of questions about
the code:

1) I see that when a block is UNDER_RECOVERY, it is added to recoverBlocks
for each dataNodeDescriptor that holds the block. Then a recoverBlock call
is issued to each primary data node. What does the recoverBlock call do on
a datanode? Does it sync the block on that node to the other 2 data nodes?
In my case one of the data nodes is unreachable; what is the behaviour in
such a case?

2) When a client wants to read a block which is "UNDER_RECOVERY", do we
continue to suggest all 3 data nodes as replicas for reads, or do we pick
the one which is marked as primary for the block recovery?
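
The flow in question 1 can be sketched as a toy model in Java. This is only
one possible reading, not Hadoop code: the class names (RecoverySketch,
DataNode, Block), the reachable flag, and the rule that the first reachable
replica holder becomes the primary are all assumptions made for
illustration. What recoverBlock actually does on the datanode is left out,
since that is the open question.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Toy model only: every class here is hypothetical and is NOT Hadoop's.
public class RecoverySketch {
    enum BlockState { UNDER_CONSTRUCTION, UNDER_RECOVERY }

    static class DataNode {
        final String name;
        final boolean reachable; // assumption: namenode's view of liveness
        // Stand-in for the per-datanode recoverBlocks queue from question 1.
        final Queue<String> recoverBlocks = new ArrayDeque<>();

        DataNode(String name, boolean reachable) {
            this.name = name;
            this.reachable = reachable;
        }
    }

    static class Block {
        final String id;
        final List<DataNode> replicas;
        BlockState state = BlockState.UNDER_CONSTRUCTION;

        Block(String id, List<DataNode> replicas) {
            this.id = id;
            this.replicas = replicas;
        }
    }

    // Marks the block UNDER_RECOVERY and queues it on the first reachable
    // replica holder, which this sketch treats as the primary.
    // Returns the chosen primary, or null if no replica holder is reachable.
    static DataNode startRecovery(Block block) {
        block.state = BlockState.UNDER_RECOVERY;
        for (DataNode dn : block.replicas) {
            if (dn.reachable) {
                dn.recoverBlocks.add(block.id);
                return dn;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        DataNode dn1 = new DataNode("dn1", false); // the dead datanode
        DataNode dn2 = new DataNode("dn2", true);
        DataNode dn3 = new DataNode("dn3", true);
        Block block = new Block("blk_1", Arrays.asList(dn1, dn2, dn3));

        DataNode primary = startRecovery(block);
        System.out.println("state = " + block.state);
        System.out.println("primary = "
                + (primary == null ? "none" : primary.name));
    }
}
```

With dn1 unreachable, this model queues the recovery on dn2 and leaves dn1
untouched. Whether the real namenode skips dead nodes this way is exactly
what question 1 asks.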

Thanks

Re: Meaning of UNDER_RECOVERY blocks

Posted by Varun Sharma <va...@pinterest.com>.
It would be nice if someone could help out with this. It looks like a
trivial question, but some blocks seem to be getting lost for us when
datanodes fail...

Varun

