Posted to common-user@hadoop.apache.org by Jason Venner <ja...@attributor.com> on 2008/03/13 21:36:33 UTC

Question about recovering from a corrupted namenode 0.16.0

The namenode ran out of disk space and on restart was throwing the error 
at the end of this message.

We copied edit.tmp from the secondary over edit, copied srcimage over 
fsimage, and removed edit.new; our file system started up and /appears/ 
to be intact.

What is the proper procedure? We didn't find any details on the wiki.

Namenode error:
2008-03-13 13:19:32,493 ERROR org.apache.hadoop.dfs.NameNode: 
java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
    at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)
    at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:507)
    at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:744)
    at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:624)
    at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:222)
    at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:79)
    at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:254)
    at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:235)
    at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:130)
    at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:175)
    at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:161)
    at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843)
    at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852)



-- 
Jason Venner
Attributor - Publish with Confidence <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers, contact if interested

RE: Question about recovering from a corrupted namenode 0.16.0

Posted by dhruba Borthakur <dh...@yahoo-inc.com>.
Your procedure is right:

1. Copy edit.tmp from the secondary to edit on the primary
2. Copy srcimage from the secondary to fsimage on the primary
3. Remove edits.new on the primary
4. Restart the cluster, put it in safe mode, and run fsck / (sketched below)
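
A rough sketch of steps 1-3 as shell commands; the directory names below 
are placeholders (your dfs.name.dir and the secondary's checkpoint 
directory will differ), and the namenode should be stopped before copying:

    # Placeholder paths -- substitute your own dfs.name.dir and the
    # secondary namenode's checkpoint directory.
    NAME_DIR=/path/to/dfs/name/current
    CKPT_DIR=/path/to/secondary/checkpoint

    # 1. Bring over the secondary's copy of the edit log
    scp secondary-host:$CKPT_DIR/edit.tmp $NAME_DIR/edit

    # 2. Bring over the secondary's copy of the image
    scp secondary-host:$CKPT_DIR/srcimage $NAME_DIR/fsimage

    # 3. Drop the partially written edits.new on the primary
    rm $NAME_DIR/edits.new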

However, the above steps are not foolproof, because the transactions that
occurred between the time the secondary took its last checkpoint and the
time the disk became full are lost. This can also cause some blocks to go
missing, because the last checkpoint may refer to blocks that no longer
exist. If fsck does not report any missing blocks, you are good to go.
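
For step 4, something along these lines with the standard HDFS tools (run 
from the Hadoop install directory) can confirm whether any blocks went 
missing:

    # Keep the namespace read-only while checking
    bin/hadoop dfsadmin -safemode enter

    # Walk the namespace and report missing or corrupt blocks
    bin/hadoop fsck / -files -blocks

    # If fsck reports the filesystem as HEALTHY, resume normal operation
    bin/hadoop dfsadmin -safemode leave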

Thanks,
dhruba
