Posted to hdfs-user@hadoop.apache.org by Sakthivel Murugasamy <sa...@gmail.com> on 2011/07/15 11:09:30 UTC

Re: Namenode not get started. Reason: FSNamesystem initialization failed.

Dear Team,

I loaded 3.6GB of compressed (bz2) data directly into Hive; after that I ran
a simple "select" query and the namenode crashed.
Since then I have not been able to start the namenode.

Environment:

   - CentOS release 5.5 (Final), Hadoop version: 0.20.2
   - Cluster size: 18 nodes
   - NameNode & SecondaryNamenode on the same machine

It seems the editlogs/fsimage got corrupted, and I haven't taken any
separate backup. Below is the exception:

2011-07-14 23:37:43,378 ERROR
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem
initialization failed.
java.io.FileNotFoundException: File does not exist:
/opt/data/tmp/mapred/system/job_201107041958_0120/j^@^@^@^@^@^@

Please find the detailed exception in the attached namenode log file.

Earlier I also posted this to JIRA,
https://issues.apache.org/jira/browse/HADOOP-7458 , and Jakob Homan directed
me to post to the hdfs-user list.

Will there be any backup in the SecondaryNamenode? Could you please assist
me in recovering the namenode from this issue?


Thanks,
Sakthivel

Re: Namenode not get started. Reason: FSNamesystem initialization failed.

Posted by Sakthivel Murugasamy <sa...@gmail.com>.
Hi All,

Thank you so much for your valuable solutions!

The problem got resolved, but with significant loss of time and data (since
we were running on an experimental basis, we only had to reload a few GB of
data). I used the -importCheckpoint option.
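For reference, the recovery sequence I used looked roughly like this (a
sketch; the paths are from my configuration, and it assumes fs.checkpoint.dir
still holds a valid checkpoint from the SecondaryNamenode):

    # stop the crashed namenode if it is still half-running
    bin/hadoop-daemon.sh stop namenode

    # move the corrupted metadata aside; dfs.name.dir must exist but
    # contain no metadata for -importCheckpoint to proceed
    mv /opt/data/name /opt/data/name.corrupt
    mkdir /opt/data/name

    # start the namenode, importing the latest secondary checkpoint
    bin/hadoop namenode -importCheckpoint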

I would just like to share the likely scenario in which the editlog
corruption happened (correct me if I am wrong).

Below were the relevant configuration properties:

   - hadoop.tmp.dir: /opt/data/tmp
   - dfs.name.dir: /opt/data/name
   - dfs.data.dir: /opt/data/data
   - mapred.local.dir: ${hadoop.tmp.dir}/mapred/local

/opt/data is a mounted volume, 50GB in size. The NameNode,
SecondaryNamenode (${hadoop.tmp.dir}/dfs/namesecondary) & DataNode
directories were all configured within /opt/data itself.

Once I loaded the 3.6GB compressed (bz2) file, I guess disk usage of
/opt/data reached 100% (I checked with df -h after the incident). Then I ran
the simple Hive "select" query; its job.jar files also needed to be created
within the same directory, which had no space left. This is how the editlog
corruption most likely occurred.
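In hindsight, two quick checks would have caught this earlier (a sketch; the
path is from my setup):

    # local disk usage of the volume shared by the name, data and tmp dirs
    df -h /opt/data

    # HDFS-level capacity and per-datanode usage
    bin/hadoop dfsadmin -report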

This was really a good lesson for me! I have now changed those
configurations.
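Roughly, the new layout is the following (a sketch; /disk1 and /disk2 are
placeholders for volumes separate from the DataNode storage):

    <!-- hdfs-site.xml: keep namenode metadata off the shared data volume.
         dfs.name.dir takes a comma-separated list and writes the fsimage
         and editlog redundantly to each directory. -->
    <property>
      <name>dfs.name.dir</name>
      <value>/disk1/hdfs/name,/disk2/hdfs/name</value>
    </property>

    <!-- Where the SecondaryNamenode writes its checkpoints; this is also
         what -importCheckpoint reads during a recovery. -->
    <property>
      <name>fs.checkpoint.dir</name>
      <value>/disk1/hdfs/namesecondary</value>
    </property>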

Thanks again,
Sakthivel



RE: Namenode not get started. Reason: FSNamesystem initialization failed.

Posted by Brahma Reddy <br...@huawei.com>.
Hi,

1) This can be achieved either by copying the relevant storage directory to
a new namenode,

2) or, if the secondary is taking over as the new namenode, by using the
-importCheckpoint option when starting the namenode daemon. The
-importCheckpoint option will load the namenode metadata from the latest
checkpoint in the directory defined by the fs.checkpoint.dir property, but
only if there is no metadata in dfs.name.dir, so there is no risk of
overwriting precious data.
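For option 1, a minimal sketch (assuming the secondary's checkpoint directory
is the default under ${hadoop.tmp.dir}, the new namenode's dfs.name.dir is
empty, and "secondarynn" is a placeholder hostname):

    # copy the latest checkpoint from the secondary onto the new
    # namenode's dfs.name.dir, then start the namenode normally
    scp -r secondarynn:/opt/data/tmp/dfs/namesecondary/* /opt/data/name/
    bin/hadoop-daemon.sh start namenode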

 

Regards

Brahma Reddy

 
