Posted to mapreduce-user@hadoop.apache.org by Sajid Syed <sa...@gmail.com> on 2014/12/18 05:11:53 UTC

Name Node HA ERROR

Hi All,

I have configured CDH4 with HA. It was working fine for some time, but now
I have started seeing this error, and the NameNode has failed over to the
secondary server.


2014-12-17 08:44:31,847 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 0.  Expected transaction ID was 1
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:146)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:92)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:744)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:660)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:274)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:741)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:531)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:403)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:445)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:621)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:606)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1177)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1241)
Caused by: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: got premature end-of-file at txid 0; expected file to go up to 9
        at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:195)
        at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:75)
        at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.skipUntil(EditLogInputStream.java:132)
        at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:179)
        at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:75)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:133)
        ... 12 more
2014-12-17 08:44:31,849 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2014-12-17 08:44:31,852 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:


Thanks
Sajid

Re: Name Node HA ERROR

Posted by Andras POTOCZKY <an...@bsdaemon.hu>.

Hi

It seems that either both NameNodes were active for a period, or the
standby node process was stopped for a long time.
Tip: on the standby node, try backing up the fsimage and then
bootstrapping that node again. Be careful: if you format the NameNode
again, you will lose your data on HDFS.

"If you have already formatted the NameNode, or are converting a
non-HA-enabled cluster to be HA-enabled, you should now copy over the
contents of your NameNode metadata directories to the other, unformatted
NameNode by running the command "hdfs namenode -bootstrapStandby" on
the unformatted NameNode. Running this command will also ensure that the
JournalNodes (as configured by dfs.namenode.shared.edits.dir) contain
sufficient edits transactions to be able to start both NameNodes."
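
As a rough illustration of the "back up first, then bootstrap" order, here
is a minimal Python sketch. It only copies the metadata directory and holds
the command quoted above as a list; it does not run anything against HDFS.
The function name backup_name_dir and both paths are hypothetical, and the
sketch assumes the directory passed in is the one configured as
dfs.namenode.name.dir on the standby (the one containing "current").

```python
import shutil
import time
from pathlib import Path


def backup_name_dir(name_dir, backup_root):
    """Copy a NameNode metadata directory into a timestamped backup folder.

    name_dir is assumed to be the local directory holding the 'current'
    subdirectory with the fsimage and edits files.
    """
    src = Path(name_dir)
    if not (src / "current").is_dir():
        raise FileNotFoundError(f"no 'current' directory under {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"
    # copytree refuses to write into an existing destination,
    # so an earlier backup is never overwritten.
    shutil.copytree(src, dest)
    return dest


# The command from the quoted documentation. Run it by hand on the standby
# only after the backup has succeeded; it is deliberately not executed here.
BOOTSTRAP_CMD = ["hdfs", "namenode", "-bootstrapStandby"]
```

For example, backup_name_dir("/data/dfs/nn", "/backup") would return the
path of the new copy (both paths are placeholders for your own layout).
Taking the copy while the standby NameNode process is stopped is the safer
choice, since the files are then not changing underneath it.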

Anyway, here is a link describing other NameNode recovery options:
http://blog.cloudera.com/blog/2012/05/namenode-recovery-tools-for-the-hadoop-distributed-file-system/

Andras

On 2014.12.18. 5:11, Sajid Syed wrote:
> Hi All,
>
> I have configured CDH4 with HA. It was working fine for some time and 
> now I started seeing this error and namenode had failed over to 
> secondary server.
>
>
> 2014-12-17 08:44:31,847 FATAL 
> org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode 
> join
> org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error 
> replaying edit log at offset 0.  Expected transaction ID was 1
