Posted to general@hadoop.apache.org by mouradk <mo...@googlemail.com> on 2012/07/30 18:26:12 UTC

Cannot restart Namenode after disk full

Dear all,

We are running a single-node Hadoop 0.20.2 cluster with HBase 0.20.4 and cannot restart the NameNode after the disk filled up. I have freed up space, but the NameNode still fails to start with the following error:


STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2012-07-30 16:02:23,649 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=50001
2012-07-30 16:02:23,656 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost/127.0.0.1:50001
2012-07-30 16:02:23,659 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2012-07-30 16:02:23,660 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
2012-07-30 16:02:23,721 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2012-07-30 16:02:23,723 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2012-07-30 16:02:23,756 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 533
2012-07-30 16:02:23,833 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 2
2012-07-30 16:02:23,835 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 55400 loaded in 0 seconds.
2012-07-30 16:02:23,844 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string: "1343506"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Long.parseLong(Long.java:419)
    at java.lang.Long.parseLong(Long.java:468)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1273)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:775)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)

2012-07-30 16:02:23,845 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:


Your help is much appreciated!!


Mouradk
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


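[Editor's note on the NumberFormatException above: the quoted string "1343506" is by itself a valid long, so the parse failure usually means the string carries invisible bytes from the truncated edits file that the exception message does not render. The sketch below fabricates such a string (the trailing NUL is an assumption for illustration) and uses od to make the hidden byte visible:]

```shell
# Demonstration only: a digit string with a hidden trailing NUL byte,
# the kind of residue a half-written edits record can leave behind.
printf '1343506\0' | od -c
# od renders the invisible byte as \0; Java's Long.parseLong rejects the
# whole string even though the visible digits alone would parse fine.
```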


Re: Fixing a corrupt edits file?

Posted by Terrence Martin <tm...@physics.ucsd.edu>.
You do not fix the edits file. :) When this exact issue has occurred 
here, I have had to revert to my SNN copy of the hadoop database.

For us it is not too bad: at most we lose around 30 minutes of edits, 
because we run the merges from the SNN pretty frequently.

Terrence
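[Editor's note: Terrence's fallback, restarting the NameNode from the SecondaryNameNode's checkpoint, can be sketched roughly as below for 0.20.x. The paths are placeholders; `-importCheckpoint` reads the directory named by `fs.checkpoint.dir`, and any edits made after the last checkpoint are lost. Treat this as a sketch under those assumptions, not a recipe, and back everything up first.]

```shell
# Sketch, not a recipe: NAME_DIR below is a placeholder for dfs.name.dir.
NAME_DIR=${NAME_DIR:-/var/hadoop/dfs/name}

# 1. Preserve the damaged name directory before touching anything.
[ -d "$NAME_DIR" ] && cp -a "$NAME_DIR" "$NAME_DIR.corrupt.$(date +%s)"

# 2. Start the NameNode from the SNN checkpoint. fs.checkpoint.dir must
#    point at the namesecondary directory (copy it from the SNN host
#    first if it lives elsewhere). Edits written after the last
#    checkpoint are lost, as Terrence notes.
if command -v hadoop >/dev/null 2>&1; then
  hadoop namenode -importCheckpoint
else
  echo "hadoop not on PATH; run this on the NameNode host"
fi
```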


On 7/30/2012 10:35 AM, mouradk wrote:
> Hi Terrence,
>
> Thanks for your reply. How do I go about fixing the edits file on the NameNode? Your help is much appreciated!!
>
> Thanks
>
> Mourad
>
> Mouradk
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>
>
> On Monday, 30 July 2012 at 18:33, Terrence Martin wrote:
>
>> The purpose of the secondary name node is to assist in merging the
>> edits file (and an edits.new, if it exists) into the main fsimage
>> file. The edits file is 0 on the SNN because that is the proper state
>> after it has been merged with the main image file.
>>
>> In other words, an empty edits file on the SNN is what you want.
>>
>> Terrence
>>
>>
>> On 7/30/2012 10:29 AM, mouradk wrote:
>>> Hello all,
>>>
>>> I have just had a problem with a NameNode restart and someone on the mailing list kindly suggested that the edits file was corrupted. I have made a backup copy of the file and checked my /namesecondary/previous.checkpoint, but the edits file there is an empty 4 kB file containing only ????? characters.
>>>
>>> This suggests to me that I cannot recover from the SecondaryNameNode. How do you fix this problem?
>>>
>>> Thanks for your help.
>>>
>>> Original error log:
>>> [error log snipped; it is quoted in full in the first message above]
>>>
>>>
>>>
>>> Mouradk
>


Re: Fixing a corrupt edits file?

Posted by mouradk <mo...@googlemail.com>.
Hi Terrence,

Thanks for your reply. How do I go about fixing the edits file on the NameNode? Your help is much appreciated!!

Thanks

Mourad 

Mouradk
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Monday, 30 July 2012 at 18:33, Terrence Martin wrote:

> The purpose of the secondary name node is to assist in merging the 
> edits file (and an edits.new, if it exists) into the main fsimage 
> file. The edits file is 0 on the SNN because that is the proper state 
> after it has been merged with the main image file.
> 
> In other words, an empty edits file on the SNN is what you want.
> 
> Terrence
> 
> 
> On 7/30/2012 10:29 AM, mouradk wrote:
> > Hello all,
> > 
> > I have just had a problem with a NameNode restart and someone on the mailing list kindly suggested that the edits file was corrupted. I have made a backup copy of the file and checked my /namesecondary/previous.checkpoint, but the edits file there is an empty 4 kB file containing only ????? characters.
> > 
> > This suggests to me that I cannot recover from the SecondaryNameNode. How do you fix this problem?
> > 
> > Thanks for your help.
> > 
> > Original error log:
> > [error log snipped; it is quoted in full in the first message above]
> > 
> > 
> > 
> > Mouradk 


Re: Fixing a corrupt edits file?

Posted by Terrence Martin <tm...@physics.ucsd.edu>.
The purpose of the secondary name node is to assist in merging the 
edits file (and an edits.new, if it exists) into the main fsimage 
file. The edits file is 0 on the SNN because that is the proper state 
after it has been merged with the main image file.

In other words, an empty edits file on the SNN is what you want.

Terrence


On 7/30/2012 10:29 AM, mouradk wrote:
> Hello all,
>
> I have just had a problem with a NameNode restart and someone on the mailing list kindly suggested that the edits file was corrupted. I have made a backup copy of the file and checked my /namesecondary/previous.checkpoint, but the edits file there is an empty 4 kB file containing only ????? characters.
>
> This suggests to me that I cannot recover from the SecondaryNameNode. How do you fix this problem?
>
> Thanks for your help.
>
> Original error log:
> [error log snipped; it is quoted in full in the first message above]
>
>
>
> Mouradk
>
>


Re: Fixing a corrupt edits file?

Posted by Arun C Murthy <ac...@hortonworks.com>.
Can you please use hdfs-dev@ for these discussions?

general@ is only used for project announcements etc.

thanks,
Arun

On Jul 30, 2012, at 10:29 AM, mouradk wrote:

> Hello all,
> 
> I have just had a problem with a NameNode restart and someone on the mailing list kindly suggested that the edits file was corrupted. I have made a backup copy of the file and checked my /namesecondary/previous.checkpoint, but the edits file there is an empty 4 kB file containing only ????? characters.
> 
> This suggests to me that I cannot recover from the SecondaryNameNode. How do you fix this problem?
> 
> Thanks for your help.
> 
> Original error log:
> [error log snipped; it is quoted in full in the first message above]
> 
> 
> 
> Mouradk
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Fixing a corrupt edits file?

Posted by mouradk <mo...@googlemail.com>.
Hello all,

I have just had a problem with a NameNode restart and someone on the mailing list kindly suggested that the edits file was corrupted. I have made a backup copy of the file and checked my /namesecondary/previous.checkpoint, but the edits file there is an empty 4 kB file containing only ????? characters.

This suggests to me that I cannot recover from the SecondaryNameNode. How do you fix this problem?

Thanks for your help.

Original error log:
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
2012-07-30 16:02:23,649 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=50001
2012-07-30 16:02:23,656 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost/127.0.0.1:50001
2012-07-30 16:02:23,659 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2012-07-30 16:02:23,660 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
2012-07-30 16:02:23,721 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2012-07-30 16:02:23,723 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2012-07-30 16:02:23,756 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 533
2012-07-30 16:02:23,833 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 2
2012-07-30 16:02:23,835 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 55400 loaded in 0 seconds.
2012-07-30 16:02:23,844 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string: "1343506"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Long.parseLong(Long.java:419)
    at java.lang.Long.parseLong(Long.java:468)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1273)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:775)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)

2012-07-30 16:02:23,845 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:



Mouradk
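[Editor's note for readers hitting the same symptom: before deciding the edits file is unrecoverable, it can help to look at its tail. The block below fabricates a tiny sample to show what disk-full truncation tends to look like; on a real cluster you would point hexdump at the edits file under your dfs.name.dir (the path, record text, and filler bytes here are assumptions for illustration).]

```shell
# Fabricated sample of a truncated edits log; the real file would be
# ${dfs.name.dir}/current/edits (path is an assumption).
sample=$(mktemp)
printf 'OP_ADD /user/hadoop/somefile 1343506' > "$sample"  # record cut off mid-write
printf '\377\377\377\377' >> "$sample"                     # filler bytes, never overwritten
hexdump -C "$sample" | tail -n 5
rm -f "$sample"
```

A long run of identical bytes right after the last readable record is the usual signature of a write that died when the disk filled.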


RE: Cannot restart Namenode after disk full

Posted by Uma Maheswara Rao G <ma...@huawei.com>.
Hi Mourad,

I think you are hitting HDFS-1594. In our experience, when the disk
fills up there is a chance of corruption like this. The fix moves the
NameNode into safemode automatically when the disk fills; it was
committed only to hadoop-2/trunk, not to the hadoop-1 releases.

Regards,
Uma
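[Editor's note: the check Uma describes can be approximated in shell. In hadoop-2 the NameNode compares free space in each storage directory against dfs.namenode.resource.du.reserved (100 MB by default) and enters safemode when the threshold is crossed; the property name and default are from hadoop-2 and do not exist in 0.20. A rough external equivalent:]

```shell
# Rough external equivalent of the hadoop-2 NameNode resource check.
# /tmp stands in for dfs.name.dir; 100 MB mirrors the hadoop-2 default
# of dfs.namenode.resource.du.reserved.
RESERVED_KB=$((100 * 1024))
NAME_DIR=${NAME_DIR:-/tmp}
avail_kb=$(df -Pk "$NAME_DIR" | awk 'NR==2 {print $4}')
if [ "$avail_kb" -lt "$RESERVED_KB" ]; then
  echo "low space on $NAME_DIR (${avail_kb} kB free); NameNode would enter safemode"
else
  echo "ok: ${avail_kb} kB free on $NAME_DIR"
fi
```

Running something like this from cron on an 0.20 cluster gives early warning before the edits log is corrupted in the first place.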

________________________________________
From: mouradk [mouradk78@googlemail.com]
Sent: Tuesday, July 31, 2012 4:23 PM
To: general@hadoop.apache.org
Subject: Re: Cannot restart Namenode after disk full

Thanks for the advice, Ryan; I will certainly roll out the newer releases in the near future. This is my first post on the channel. Thanks, all, for your support.

Mouradk
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Tuesday, 31 July 2012 at 03:13, Ryan Rawson wrote:

> And for gods sake please please please please PLEASE upgrade to a
> newer version of hadoop and hbase!
>
> That variant of Hadoop is broken for HBase -- you lost data for
> certain -- and that HBase version is excessively old. The HBase team
> is on the 3rd major release since the variant you are running. 0.90,
> 0.92 and now 0.94. All have significant performance, stability and
> other improvements over 0.20.
>
> No one should ever run HBase on 0.20.x. You require at least
> 0.20-branch-append.
>
> On Mon, Jul 30, 2012 at 11:22 AM, Adam Brown <adam@hortonworks.com (mailto:adam@hortonworks.com)> wrote:
> > Can you send me your edit log file?
> >
> > adam@hortonworks.com (mailto:adam@hortonworks.com)
> >
> >
> >
> > On Mon, Jul 30, 2012 at 9:36 AM, mouradk <mouradk78@googlemail.com (mailto:mouradk78@googlemail.com)> wrote:
> > > Hi Adam,
> > >
> > > Thanks for your prompt reply. I am not sure how to attempt a restore from the SecondaryNameNode. When I restart Hadoop, the NameNode shuts down as per the log, but the SecondaryNameNode is launched.
> > >
> > > $jps
> > > 23675 RunJar
> > > 23225 TaskTracker
> > > 23023 SecondaryNameNode
> > > 22886 DataNode
> > > 4985 GossipRouter
> > > 30870 WOBootstrap
> > > 24684 Jps
> > > 5887 WOBootstrap
> > > 23100 JobTracker
> > > 24460 WOBootstrap
> > > 5838 WOBootstrap
> > > 26648 WOBootstrap
> > >
> > >
> > > I have read on a few threads about repairing the edits file but I am afraid I am not too sure how to attempt it.
> > >
> > > Many thanks,
> > >
> > > Mouradk
> > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> > >
> > >
> > > On Monday, 30 July 2012 at 17:27, Adam Brown wrote:
> > >
> > > > HI Mouradk,
> > > >
> > > > looks like your edit log is corrupt
> > > >
> > > > can you recover from a secondary namenode?
> > > >
> > > > -Adam
> > > >
> > > > On Mon, Jul 30, 2012 at 9:26 AM, mouradk <mouradk78@googlemail.com (mailto:mouradk78@googlemail.com)> wrote:
> > > > > Dear all,
> > > > >
> > > > > We are running a single-node Hadoop 0.20.2 cluster with HBase 0.20.4 and cannot restart the NameNode after the disk filled up. I have freed up space, but the NameNode still fails to start with the following error:
> > > > >
> > > > >
> > > > > [error log snipped; it is quoted in full in the first message above]
> > > > >
> > > > >
> > > > > Your help is much appreciated!!
> > > > >
> > > > >
> > > > > Mouradk
> > > > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> > > > >
> > > > >
> > > > > Mouradk
> > > > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> > > > >
> > > > >
> > > > > Mouradk
> > > > > Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
> > > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Adam Brown
> > > > Enablement Engineer
> > > > Hortonworks
> > > >
> > >
> > >
> >
> >
> >
> >
> > --
> > Adam Brown
> > Enablement Engineer
> > Hortonworks
> >
>
>
>

Re: Cannot restart Namenode after disk full

Posted by mouradk <mo...@googlemail.com>.
Thanks for the advice, Ryan; we will certainly roll out the new releases in the near future. This is my first post on the list, so thanks to all for your support.

Mouradk
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)





Re: Cannot restart Namenode after disk full

Posted by Ryan Rawson <ry...@gmail.com>.
And for God's sake, please please please please PLEASE upgrade to a
newer version of Hadoop and HBase!

That variant of Hadoop is broken for HBase -- you lost data for
certain -- and that HBase version is excessively old.  The HBase team
is on the 3rd major release since the variant you are running.  0.90,
0.92 and now 0.94.  All have significant performance, stability and
other improvements over 0.20.

No one should ever run HBase on stock Hadoop 0.20.x.  You need at least
0.20-branch-append.


Re: Cannot restart Namenode after disk full

Posted by Adam Brown <ad...@hortonworks.com>.
Can you send me your edit log file?

adam@hortonworks.com






-- 
Adam Brown
Enablement Engineer
Hortonworks

Re: Cannot restart Namenode after disk full

Posted by mouradk <mo...@googlemail.com>.
Hi Adam,

Thanks for your prompt reply. I am not sure how to attempt a restore from the SecondaryNameNode. When I restart Hadoop, the NameNode shuts down, as per the log, but the SecondaryNameNode is launched.

$jps
23675 RunJar
23225 TaskTracker
23023 SecondaryNameNode
22886 DataNode
4985 GossipRouter
30870 WOBootstrap
24684 Jps
5887 WOBootstrap
23100 JobTracker
24460 WOBootstrap
5838 WOBootstrap
26648 WOBootstrap


I have read a few threads about repairing the edits file, but I am afraid I am not too sure how to attempt it.

Many thanks,

Mouradk
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)




Re: Cannot restart Namenode after disk full

Posted by Adam Brown <ad...@hortonworks.com>.
Hi Mouradk,

It looks like your edit log is corrupt.

Can you recover from the SecondaryNameNode?

-Adam

On Mon, Jul 30, 2012 at 9:26 AM, mouradk <mo...@googlemail.com> wrote:
> Dear all,
>
> We are running a Hadoop 0.20.2 single node with HBase 0.20.4 and cannot restart the NameNode after the disk got full. I have freed more space but cannot restart the NameNode and get the following error:
>
>
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
> ************************************************************/
> 2012-07-30 16:02:23,649 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=NameNode, port=50001
> 2012-07-30 16:02:23,656 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: localhost/127.0.0.1:50001
> 2012-07-30 16:02:23,659 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2012-07-30 16:02:23,660 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
> 2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hadoop,hadoop
> 2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
> 2012-07-30 16:02:23,714 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
> 2012-07-30 16:02:23,721 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
> 2012-07-30 16:02:23,723 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
> 2012-07-30 16:02:23,756 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 533
> 2012-07-30 16:02:23,833 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 2
> 2012-07-30 16:02:23,835 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 55400 loaded in 0 seconds.
> 2012-07-30 16:02:23,844 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NumberFormatException: For input string: "1343506"
>     at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
>     at java.lang.Long.parseLong(Long.java:419)
>     at java.lang.Long.parseLong(Long.java:468)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.readLong(FSEditLog.java:1273)
>     at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:775)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:992)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:812)
>     at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:364)
>     at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
>
> 2012-07-30 16:02:23,845 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
>
>
> Your help is much appreciated!!
>
>
> Mouradk
> Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
>



-- 
Adam Brown
Enablement Engineer
Hortonworks
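For anyone hitting this later: the "recover from a secondary namenode" suggestion above can, on Hadoop 0.20, be attempted with the NameNode's `-importCheckpoint` startup option, which loads the image from `fs.checkpoint.dir` instead of the corrupt one. A rough sketch only, assuming a single-node layout; `/data/dfs/name` is a placeholder for your actual `dfs.name.dir`, and any edits written after the secondary's last checkpoint are lost:

```shell
# Sketch only -- adapt paths to your dfs.name.dir / fs.checkpoint.dir
# settings, and expect to lose edits made after the last checkpoint.
bin/stop-all.sh                               # stop all Hadoop daemons first

cp -r /data/dfs/name /data/dfs/name.broken    # preserve the damaged metadata

rm -rf /data/dfs/name/*                       # -importCheckpoint wants no existing image
bin/hadoop namenode -importCheckpoint         # loads the image from fs.checkpoint.dir
```

If `-importCheckpoint` refuses to run, check that `dfs.name.dir` really is empty and that `fs.checkpoint.dir` contains a recent checkpoint from the SecondaryNameNode.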