Posted to mapreduce-user@hadoop.apache.org by Sajid Syed <sa...@gmail.com> on 2014/12/18 05:11:53 UTC
Name Node HA ERROR
Hi All,
I have configured CDH4 with HA. It was working fine for some time, but now I
have started seeing this error, and the NameNode has failed over to the
secondary server.
2014-12-17 08:44:31,847 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode join
org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error replaying edit log at offset 0. Expected transaction ID was 1
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:146)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:92)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:744)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:660)
    at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:274)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:741)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:531)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:403)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:445)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:621)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:606)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1177)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1241)
Caused by: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: got premature end-of-file at txid 0; expected file to go up to 9
    at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:195)
    at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:75)
    at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.skipUntil(EditLogInputStream.java:132)
    at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:179)
    at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:75)
Caused by: org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream$PrematureEOFException: got premature end-of-file at txid 0; expected file to go up to 9
    at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:195)
    at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:75)
    at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.skipUntil(EditLogInputStream.java:132)
    at org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream.nextOp(RedundantEditLogInputStream.java:179)
    at org.apache.hadoop.hdfs.server.namenode.EditLogInputStream.readOp(EditLogInputStream.java:75)
    at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:133)
    ... 12 more
2014-12-17 08:44:31,849 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2014-12-17 08:44:31,852 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
Thanks
Sajid
Re: Name Node HA ERROR
Posted by Andras POTOCZKY <an...@bsdaemon.hu>.
Hi
It seems that either both namenodes were active for a period, or the
standby node process was stopped for a long time.
Tip: on the standby node, try to back up the fsimage and bootstrap that
node again. Be careful: if you format the namenode again, you will lose
your data on HDFS.
"If you have already formatted the NameNode, or are converting a
non-HA-enabled cluster to be HA-enabled, you should now copy over the
contents of your NameNode metadata directories to the other, unformatted
NameNode by running the command "hdfs namenode -bootstrapStandby" on
the unformatted NameNode. Running this command will also ensure that the
JournalNodes (as configured by dfs.namenode.shared.edits.dir) contain
sufficient edits transactions to be able to start both NameNodes."
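The backup-then-bootstrap tip above could be sketched roughly as follows. This is
only an illustration, not a tested recovery procedure: the helper name
backup_name_dir and the example paths are hypothetical, and the real metadata
directory is whatever dfs.namenode.name.dir points to on your standby node.

```shell
#!/bin/sh
# Sketch only: archive a NameNode metadata directory before re-bootstrapping
# the standby. backup_name_dir is a hypothetical helper, not a Hadoop tool.

backup_name_dir() {
    name_dir=$1      # metadata directory (the value of dfs.namenode.name.dir)
    backup_dir=$2    # where the archive should be written
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$backup_dir/namenode-meta-$stamp.tar.gz"
    mkdir -p "$backup_dir" || return 1
    # Archive the whole directory (fsimage, edits, VERSION) relative to its parent.
    tar -czf "$archive" -C "$(dirname "$name_dir")" "$(basename "$name_dir")" || return 1
    echo "$archive"
}

# Intended sequence on the standby node, with its NameNode process stopped first
# (the paths below are assumptions, adjust them to your cluster):
#   hdfs getconf -confKey dfs.namenode.name.dir        # find the real directory
#   backup_name_dir /data/dfs/nn /root/nn-backup       # keep a copy of the old state
#   hdfs namenode -bootstrapStandby                    # run as the HDFS user
```

Only the standby is bootstrapped here; no "hdfs namenode -format" is involved,
and the archive keeps the old metadata around in case it is needed later.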
Anyway here is a link about other namenode recovery possibilities:
http://blog.cloudera.com/blog/2012/05/namenode-recovery-tools-for-the-hadoop-distributed-file-system/
Andras
On 2014.12.18. 5:11, Sajid Syed wrote:
> Hi All,
>
> I have configured CDH4 with HA. It was working fine for some time and
> now I started seeing this error and namenode had failed over to
> secondary server.
>
>
> 2014-12-17 08:44:31,847 FATAL
> org.apache.hadoop.hdfs.server.namenode.NameNode: Exception in namenode
> join
> org.apache.hadoop.hdfs.server.namenode.EditLogInputException: Error
> replaying edit log at offset 0. Expected transaction ID was 1