Posted to common-dev@hadoop.apache.org by mouradk <mo...@googlemail.com> on 2012/07/30 19:30:09 UTC

Fix a corrupt edits file?

Hello all,

I have just had a problem restarting the NameNode, and someone on the mailing list kindly suggested that the edits file was corrupted. I have made a backup copy of the file and checked my /namesecondary/previous.checkpoint directory, but the edits file there is essentially empty: 4 KB that shows only ????? when viewed.

Does this mean I cannot recover from the SecondaryNameNode? How do I fix this problem?

Thanks for your help.

Original error log:
STARTUP_MSG:   build =https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2012-07-30 16:02:23,649 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=50001
2012-07-30 16:02:23,656 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost/127.0.0.1:50001
2012-07-30 16:02:23,659 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2012-07-30 16:02:23,660 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
2012-07-30 16:02:23,721 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2012-07-30 16:02:23,723 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2012-07-30 16:02:23,756 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 533
2012-07-30 16:02:23,833 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 2
2012-07-30 16:02:23,835 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 55400 loaded in 0 seconds.
2012-07-30 16:02:23,844 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string: "1343506"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Long.parseLong(Long.java:419)
    at java.lang.Long.parseLong(Long.java:468)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1273)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:775)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)

2012-07-30 16:02:23,845 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
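
A note on the trace: Long.parseLong accepts the string "1343506" on its own, so the value that FSEditLog.readLong passed to it must have contained characters the log does not display, which is consistent with a damaged entry. The Java sketch below illustrates the effect under an assumption: the stray character is taken to be a NUL byte, although the actual bytes in the corrupt file are unknown. The class name ParseLongDemo is hypothetical.

```java
// Sketch: how a numeric-looking string can still fail Long.parseLong.
// ParseLongDemo and the trailing NUL byte are illustrative assumptions only.
public class ParseLongDemo {

    // Try to parse s; on failure, return the exception message with NULs made visible.
    static String describe(String s) {
        try {
            return "parsed " + Long.parseLong(s);
        } catch (NumberFormatException e) {
            return e.getMessage().replace("\u0000", "\\0");
        }
    }

    public static void main(String[] args) {
        // Digits followed by a NUL: printed raw in a log, the NUL is usually invisible.
        System.out.println(describe("1343506\u0000"));
        // The same digits without the stray byte parse without complaint.
        System.out.println(describe("1343506"));
    }
}
```

Run as is, the first line shows the rejected input with its NUL rendered as \0, and the second shows the same digits parsing normally.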



Mouradk




Re: Fix a corrupt edits file?

Posted by Kihwal Lee <ki...@yahoo-inc.com>.
Probably the last entry is partial, or complete but not properly
terminated. You need to hex-edit the file to correct the error. You
can also pull in HDFS-1378 and use it to figure out the offset at which
to put OP_INVALID (0xff). HDFS-3055 implements an interactive recovery
mode, which makes this even easier.

Kihwal



