Posted to common-user@hadoop.apache.org by Pratyush Banerjee <pr...@aol.com> on 2008/11/14 05:39:20 UTC

NameNode does not come out of Safemode automatically in Hadoop-0.17.2

Hi All,

We have been using hadoop-0.17.2 for some time now, and we just had a 
namenode crash caused by the disk filling up.
In order to bring the namenode up again with minimal data loss, we had 
to manually edit the edits file in a hex editor and restart the namenode.

However, after restarting, the namenode went into safemode (as 
expected), and hours later it still has not come out.
We can obviously force it out, but should it not leave safemode 
automatically?
Even after 12 hours in safemode, the ratio of reported blocks is still 
stuck at 0.9768.

Running fsck on / in HDFS does report some corrupt files.
 
What is blocking the namenode from coming out of safemode? If we have 
to take it out manually (hadoop dfsadmin -safemode leave), what 
procedure should we follow to ensure data safety?
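
For reference, this is roughly the sequence we are considering (a 
sketch assuming the standard 0.17-era hadoop CLI, not a verified 
recovery procedure):

  # Check whether the namenode is still in safemode
  hadoop dfsadmin -safemode get

  # Force it out, only once the missing blocks are accounted for
  hadoop dfsadmin -safemode leave

  # Then audit the filesystem for damage
  hadoop fsck / -files -blocks -locations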

thanks and regards,

Pratyush

Re: NameNode does not come out of Safemode automatically in Hadoop-0.17.2

Posted by Pratyush Banerjee <pr...@aol.com>.
Thanks Lohit,

We were able to clean things up with minimal data loss.

Pratyush


Re: NameNode does not come out of Safemode automatically in Hadoop-0.17.2

Posted by lohit <lo...@yahoo.com>.
Namenode does not come out of safemode until it gets confirmation from the datanodes about the blocks it has. The namenode has a view of the filesystem and its blocks, and it expects those blocks to be reported by the datanodes; until they are, it considers the filesystem not yet ready for use. You can take the namenode out of safemode and run 'hadoop fsck / -files -blocks -locations', which will tell you the missing blocks and the locations where they are expected. Check whether those nodes are up and running and whether they have those blocks.
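
Concretely, the steps would look something like this. This is a sketch 
assuming the stock 0.17 defaults: the safemode threshold is 
dfs.safemode.threshold.pct (0.999 by default, if I remember right), 
which is why a reported-block ratio stuck at 0.9768 never clears on 
its own.

  # Take the namenode out of safemode manually
  hadoop dfsadmin -safemode leave

  # List files with missing blocks and the datanodes expected to hold them
  hadoop fsck / -files -blocks -locations

  # Once the damage is assessed, either salvage or drop the corrupt files
  hadoop fsck / -move      # moves corrupt files to /lost+found
  hadoop fsck / -delete    # deletes corrupt files outright

Run the plain fsck first and account for every missing block before 
using -move or -delete, since those change the namespace.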
Thanks,
Lohit


