Posted to user@hadoop.apache.org by mg <us...@gmail.com> on 2012/11/13 12:03:23 UTC

Namenode fails to start after upgrade from CDH 4.0.1 to 4.1.2

Hi,

We just upgraded a cluster from CDH 4.0.1 to 4.1.2 on a number of nodes 
running Ubuntu 12.04 (Precise).

We first upgraded Cloudera Manager (now 4.1.0), then ran apt-get 
dist-upgrade on all nodes, started CM, checked and updated the 
configuration, and attempted to start the cluster.

However, the HDFS NameNode fails to start with the exception appended below.

There is sufficient space on all partitions. We do not bind to 
wildcard addresses (at least not yet).

Any ideas? The stack trace follows.

Cheers,
Martin

FATAL org.apache.hadoop.hdfs.server.namenode.NameNode  Exception in namenode join
java.lang.NumberFormatException: null
	at java.lang.Long.parseLong(Long.java:375)
	at java.lang.Long.valueOf(Long.java:525)
	at org.apache.hadoop.hdfs.util.PersistentLongFile.readFile(PersistentLongFile.java:93)
	at org.apache.hadoop.hdfs.server.namenode.NNStorage.readTransactionIdFile(NNStorage.java:425)
	at org.apache.hadoop.hdfs.server.namenode.FSImageTransactionalStorageInspector.inspectDirectory(FSImageTransactionalStorageInspector.java:71)
	at org.apache.hadoop.hdfs.server.namenode.NNStorage.inspectStorageDirs(NNStorage.java:1039)
	at org.apache.hadoop.hdfs.server.namenode.NNStorage.readAndInspectDirs(NNStorage.java:1093)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:598)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:267)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:534)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:424)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:386)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:398)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:432)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:608)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:589)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1140)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1204)
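
For what it's worth, "NumberFormatException: null" is the message 
Long.parseLong produces when it is handed a null string, which would 
happen here if the first line read from one of the persisted files is 
missing, i.e. the file is empty. A minimal sketch of that failure mode 
(our reconstruction from the trace, not the actual Hadoop 
PersistentLongFile source):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

// Reconstruction of the failure mode suggested by the trace above;
// illustrative only, not the Hadoop PersistentLongFile source.
public class SeenTxidRepro {
    static long readPersistedLong(File f) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(f))) {
            String line = br.readLine();   // null if the file is empty
            // Long.valueOf(null) throws java.lang.NumberFormatException: null,
            // i.e. exactly the FATAL message logged by the NameNode.
            return Long.valueOf(line);
        }
    }

    public static void main(String[] args) throws IOException {
        File empty = File.createTempFile("seen_txid", null);
        empty.deleteOnExit();
        readPersistedLong(empty);          // throws NumberFormatException: null
    }
}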

Re: Namenode fails to start after upgrade from CDH 4.0.1 to 4.1.2

Posted by Arun C Murthy <ac...@hortonworks.com>.
Please don't cross-post; this belongs on the CDH lists only.

On Nov 13, 2012, at 5:18 AM, mg wrote:

> Meanwhile, we found that the seen_txid files are empty in 4 of 5 replicated namenode directories.
> 
> The edits_inprogress_... files are identical in all 5 dirs, and their transaction id matches the one non-empty seen_txid file.
> 
> The fsimage files are identical, too.
> 
> Otherwise, any two of the 5 dirs differ as far as the edits_ files are concerned.
> 
> Is it safe to copy the one non-empty seen_txid file over into the other 4 nn directories?
> 
> Cheers,
> Martin

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



Re: Namenode fails to start after upgrade from CDH 4.0.1 to 4.1.2

Posted by mg <us...@gmail.com>.
Meanwhile, we found that the seen_txid files are empty in 4 of 5 
replicated namenode directories.

The edits_inprogress_... files are identical in all 5 dirs, and their 
transaction id matches the one non-empty seen_txid file.

The fsimage files are identical, too.

Otherwise, any two of the 5 dirs differ as far as the edits_ files are 
concerned.

Is it safe to copy the one non-empty seen_txid file over into the other 
4 nn directories?
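
Concretely, the copy we have in mind would amount to something like the 
sketch below (the five paths are placeholders for our actual 
dfs.name.dir entries, not real locations; we would run it only with the 
NameNode stopped and the directories backed up):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

// Sketch of the proposed fix: find the one non-empty current/seen_txid
// among the replicated NN storage dirs and copy it over the empty ones.
// The paths below are placeholders; run only with the NameNode stopped
// and the directories backed up.
public class CopySeenTxid {
    public static void main(String[] args) throws IOException {
        List<Path> seenTxids = Arrays.asList(
            Paths.get("/data/1/dfs/nn/current/seen_txid"),   // placeholder dirs
            Paths.get("/data/2/dfs/nn/current/seen_txid"),
            Paths.get("/data/3/dfs/nn/current/seen_txid"),
            Paths.get("/data/4/dfs/nn/current/seen_txid"),
            Paths.get("/data/5/dfs/nn/current/seen_txid"));

        Path source = null;
        for (Path p : seenTxids) {
            if (Files.exists(p) && Files.size(p) > 0) {
                source = p;                                  // the non-empty one
                break;
            }
        }
        if (source == null) {
            throw new IllegalStateException("no non-empty seen_txid found");
        }

        for (Path target : seenTxids) {
            if (!target.equals(source)
                    && (Files.notExists(target) || Files.size(target) == 0)) {
                Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
                System.out.println("copied " + source + " -> " + target);
            }
        }
    }
}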

Cheers,
Martin

