Posted to user@hadoop.apache.org by Robert Dyer <ps...@gmail.com> on 2013/06/04 08:12:34 UTC

HDFS edit log NPE

I recently upgraded from 1.0.4 to 1.1.2, and now my HDFS won't start
up.  There appears to be something wrong in the edits file.

I can roll back to a previous checkpoint, but it appears that
checkpointing has been failing for some time, and my last checkpoint is
over a month old.

Is there a way to manually edit/inspect the edits file in 1.1.2 so I can
fix this?  What is causing this bug?

-------------------------------------------

2013-06-04 01:07:15,952 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 1111
2013-06-04 01:07:16,071 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 7
2013-06-04 01:07:16,073 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 270269 loaded in 0 seconds.
2013-06-04 01:07:16,075 ERROR org.apache.hadoop.hdfs.server.common.Storage: Error replaying edit log at offset 132
Recent opcode offsets: 5 14
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1124)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1136)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1021)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedMkdir(FSDirectory.java:1008)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:756)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1025)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:841)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:377)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:411)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:379)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:284)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:536)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1410)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1419)
2013-06-04 01:07:16,077 ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem initialization failed.
java.io.IOException: Error replaying edit log at offset 132
Recent opcode offsets: 5 14
        at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:84)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:929)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1025)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:841)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:377)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:411)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:379)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:284)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:536)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1410)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1419)
2013-06-04 01:07:16,078 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.io.IOException: Error replaying edit log at offset 132
Recent opcode offsets: 5 14
        at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:84)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:929)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1025)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:841)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:377)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:411)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:379)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:284)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:536)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1410)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1419)

2013-06-04 01:07:16,078 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

Re: HDFS edit log NPE

Posted by Tsz Wo Sze <sz...@yahoo.com>.
Could this be an operational error during the upgrade, given that the edit
log is non-empty?  The original image and edits should still be available.
If that is the case, I suggest starting the NN with 1.0.4 so that the edit
log becomes empty, and then trying the upgrade again.


> Recent opcode offsets: 5 14

BTW, opcode 5 is OP_DATANODE_ADD, which was deprecated a long time ago.  It
seems that v1.1.2 cannot understand the v1.0.4 edit log.  Otherwise, the
edit log is corrupted.

Hope it helps.
Tsz-Wo
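
On the question of inspecting the edits file by hand: as a minimal sketch
(this is not a Hadoop tool), the Python below hex-dumps the bytes around a
given offset, such as the 132 reported in the log.  The function names are
made up for illustration, it assumes the reported offset is a decimal byte
position, and it does not decode opcodes.  It only reads, so point it at a
copy of the edits file rather than the live one.

```python
def hexdump_window(stream, offset, before=16, after=16, width=16):
    """Return hex-dump lines for the bytes around `offset`.

    `stream` is any seekable binary stream. Positions in the output are
    decimal byte positions, so they can be compared with the offset in
    the NameNode log (assumed here to be decimal).
    """
    start = max(0, offset - before)
    stream.seek(start)
    data = stream.read(offset + after - start)

    pad = width * 3 - 1  # width of a full row of "xx " hex pairs
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is, everything else as "."
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{start + i:08d}  {hex_part:<{pad}}  {text_part}")
    return lines


def dump_file(path, offset):
    """Print the window around `offset` for the file at `path` (read-only)."""
    with open(path, "rb") as stream:
        for line in hexdump_window(stream, offset):
            print(line)
```

Calling dump_file() with the path to a copy of the edits file and 132 as
the offset would print the 16 bytes on either side of that position, which
may be enough to see whether the record there looks truncated or garbled.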




