Posted to hdfs-user@hadoop.apache.org by Wei Wu <wo...@gmail.com> on 2011/07/07 15:14:52 UTC

NameNode stuck in safemode with only a few missing blocks

Hi,

We ran into a strange situation when restarting the NameNode: it cannot
leave safe mode automatically. The message is "The ratio of reported blocks
0.9986 has not reached the threshold 0.999". Our cluster has 83,276,820
blocks in total, so if the counter is right, we are missing about 116,587
blocks. But fsck reported that 83,276,779 blocks were healthy and 37 blocks
belonged to open files. Only 4 blocks were marked as corrupt, because their
length is shorter than that of the existing ones. If the fsck result can be
trusted, the ratio would be higher than 0.999999 and the threshold would
have been reached.
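
For anyone who wants to check the numbers, here is a minimal sketch of the
arithmetic in Python. All figures are copied from the messages quoted above.
The variable names are made up for illustration, and the fsck-based ratio
assumes that only the 4 corrupt blocks count as unsafe, which may not match
how the NameNode actually counts.

```python
# Figures taken from the NameNode message and the fsck report.
total_blocks = 83_276_820
reported_ratio = 0.9986      # ratio printed by the NameNode
threshold = 0.999            # safe mode threshold printed by the NameNode

healthy_blocks = 83_276_779  # fsck: healthy
open_file_blocks = 37        # fsck: blocks in open files
corrupt_blocks = 4           # fsck: marked corrupt

# Number of blocks the NameNode counter implies are missing.
missing_by_counter = total_blocks * (1 - reported_ratio)

# The three fsck categories should account for every block.
fsck_total = healthy_blocks + open_file_blocks + corrupt_blocks

# Assumption: only the corrupt blocks are treated as unsafe.
fsck_ratio = (total_blocks - corrupt_blocks) / total_blocks

print("missing according to counter:", int(missing_by_counter))
print("fsck categories sum to total:", fsck_total == total_blocks)
print("ratio according to fsck:", fsck_ratio)
print("fsck ratio above threshold:", fsck_ratio > threshold)
```

The three fsck categories add up to exactly the total block count, so the
gap of roughly 116,587 blocks exists only in the safe mode counter.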

I suspect the blockSafe counter is not being maintained accurately. Is that
possible? Our case looks similar to the one described in this JIRA issue:
https://issues.apache.org/jira/browse/HADOOP-2159 (our Hadoop release
already includes that patch).

Any suggestions?

Wei