Posted to common-dev@hadoop.apache.org by ANKUR GOEL <an...@corp.aol.com> on 2008/11/13 13:34:15 UTC

Namenode Failure

Hi Folks,
             We have been running the hadoop-0.17.2 release on a 50-machine 
cluster, and we recently experienced a namenode failure because the disk 
became full. The namenode is now unable to start up and throws the 
following exception:

2008-11-13 06:41:18,618 INFO org.apache.hadoop.ipc.Server: Stopping 
server on 9000
2008-11-13 06:41:18,619 ERROR org.apache.hadoop.dfs.NameNode: 
java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
        at 
org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)
        at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:599)
        at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:766)
        at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:640)
        at org.apache.hadoop.dfs.FSImage.doUpgrade(FSImage.java:250)
        at 
org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:217)
        at 
org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
        at 
org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:274)
        at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:255)
        at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:133)
 
What is the best way to recover from this failure with minimal data loss?
I could not find instructions on the wiki, or anywhere else, on how to 
recover release 0.17.2 using the files from the secondary namenode.

Any help is greatly appreciated.

Thanks
-Ankur

Re: Namenode Failure

Posted by lohit <lo...@yahoo.com>.
Hi Ankur,

We have had this kind of failure reported by others earlier on this list.
This might help you:

http://markmail.org/message/u6l6lwus33oeivcd
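
In case that link goes stale, the short version as I understand it: a full 
disk truncates the edits log mid-record, so loadFSEdits hits EOF while 
replaying it, and the usual fix is to fall back to the secondary namenode's 
last checkpoint, accepting the loss of any edits made after it. A rough 
sketch, assuming dfs.name.dir is /data/dfs/name and the secondary's 
fs.checkpoint.dir is /data/dfs/namesecondary (hypothetical paths, substitute 
your own, and double-check the exact directory layout on your release):

        # Make sure the namenode is down, then back up the damaged
        # metadata before touching anything
        cp -rp /data/dfs/name /data/dfs/name.corrupt

        # Option A (manual): replace the image and edits with the
        # secondary's last checkpoint, then restart the namenode
        rm -rf /data/dfs/name/current/*
        cp -p /data/dfs/namesecondary/current/* /data/dfs/name/current/
        bin/hadoop-daemon.sh start namenode

        # Option B: start the namenode with -importCheckpoint so it loads
        # the checkpoint from fs.checkpoint.dir itself; it refuses to run
        # if dfs.name.dir still holds a legal image, so empty it first
        # (verify this option exists in your 0.17.2 build)
        rm -rf /data/dfs/name/current/*
        bin/hadoop namenode -importCheckpoint

Either way, run "bin/hadoop fsck /" afterwards and expect files created or 
deleted since the last checkpoint to be missing or resurrected.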

Thanks,
Lohit



----- Original Message ----
From: ANKUR GOEL <an...@corp.aol.com>
To: core-dev@hadoop.apache.org; core-user@hadoop.apache.org
Sent: Thursday, November 13, 2008 4:34:15 AM
Subject: Namenode Failure

Hi Folks,
            We have been running the hadoop-0.17.2 release on a 50-machine cluster, and we recently experienced a namenode failure because the disk became full. The namenode is now unable to start up and throws the following exception:

2008-11-13 06:41:18,618 INFO org.apache.hadoop.ipc.Server: Stopping server on 9000
2008-11-13 06:41:18,619 ERROR org.apache.hadoop.dfs.NameNode: java.io.EOFException
       at java.io.DataInputStream.readFully(DataInputStream.java:180)
       at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
       at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)
       at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:599)
       at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:766)
       at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:640)
       at org.apache.hadoop.dfs.FSImage.doUpgrade(FSImage.java:250)
       at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:217)
       at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)
       at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:274)
       at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:255)
       at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:133)

What is the best way to recover from this failure with minimal data loss?
I could not find instructions on the wiki, or anywhere else, on how to recover release 0.17.2 using the files from the secondary namenode.

Any help is greatly appreciated.

Thanks
-Ankur

