Posted to general@hadoop.apache.org by Bjoern Schiessle <bj...@schiessle.org> on 2010/12/22 16:03:11 UTC

namenode doesn't start after reboot

Hi,

After a kernel update and a reboot, the namenode doesn't start. I run the
Cloudera CDH3 Hadoop distribution. I have already searched for a solution;
it looks like I'm not the only one with this problem, but sadly I could
only find descriptions of similar problems, no solutions...

This is the error message from the namenode log file:


2010-12-22 16:13:04,830 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: 
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = pcube/129.69.216.24
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2+737
STARTUP_MSG:   build =  -r 98c55c28258aa6f42250569bd7fa431ac657bdbd; compiled by 'root' on Mon Oct 11 17:21:30 UTC 2010
************************************************************/
2010-12-22 16:13:05,001 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-12-22 16:13:05,007 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-12-22 16:13:05,036 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hdfs
2010-12-22 16:13:05,036 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-12-22 16:13:05,036 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
2010-12-22 16:13:05,040 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
2010-12-22 16:13:05,335 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
2010-12-22 16:13:05,336 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
2010-12-22 16:13:05,361 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 72
2010-12-22 16:13:05,374 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 3
2010-12-22 16:13:05,375 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 8822 loaded in 0 seconds.
2010-12-22 16:13:05,377 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1088)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1100)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1003)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:206)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:637)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1034)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:845)
	at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:379)
	at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:99)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:343)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:317)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:214)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:394)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1148)
	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1157)

2010-12-22 16:13:05,377 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at pcube/129.69.216.24
************************************************************/

Any idea what could be wrong and how I can get my namenode up and running again?

Thanks a lot!
Björn

Re: namenode doesn't start after reboot

Posted by Bjoern Schiessle <bj...@schiessle.org>.
On Thu, 23 Dec 2010 12:02:51 -0800 Todd Lipcon wrote:
> On Thu, Dec 23, 2010 at 2:50 AM, Bjoern Schiessle
> <bj...@schiessle.org>wrote:
> 
> >
> > 1. I have set up a second dfs.name.dir which is stored at another
> > computer (mounted by sshfs)
> >
> 
> I would strongly discourage the use of sshfs for the name dir. For one,
> it's slow, and for two, I've seen it have some really weird semantics
> where it's doing write-back caching.

Thanks for these insights. I have now switched to NFS.

Thanks,
Björn

Re: namenode doesn't start after reboot

Posted by Ryan Rawson <ry...@gmail.com>.
I think the bug might be related to this:

https://issues.apache.org/jira/browse/HDFS-686

and

https://issues.apache.org/jira/browse/HDFS-1002



On Thu, Dec 23, 2010 at 12:47 PM, Jakob Homan <jg...@gmail.com> wrote:
> Please move discussions of CDH issues to Cloudera's lists.  Thanks.
>
> On Thu, Dec 23, 2010 at 12:02 PM, Todd Lipcon <to...@cloudera.com> wrote:
>> On Thu, Dec 23, 2010 at 2:50 AM, Bjoern Schiessle <bj...@schiessle.org>wrote:
>>
>>>
>>> 1. I have set up a second dfs.name.dir which is stored at another
>>> computer (mounted by sshfs)
>>>
>>
>> I would strongly discourage the use of sshfs for the name dir. For one, it's
>> slow, and for two, I've seen it have some really weird semantics where it's
>> doing write-back caching.
>>
>> Just take a look at its manpage and you should get scared about using it for
>> a critical mount point like this.
>>
>> A soft interruptible NFS mount is a much safer bet.
>>
>> -Todd
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>

Re: namenode doesn't start after reboot

Posted by Todd Lipcon <to...@cloudera.com>.
On Thu, Dec 23, 2010 at 12:47 PM, Jakob Homan <jg...@gmail.com> wrote:

> Please move discussions of CDH issues to Cloudera's lists.  Thanks.
>

Hi Jakob,

These bugs are clearly not CDH-specific. NameNode corruption bugs, and best
practices with regard to the storage of NN metadata, are clearly applicable
to any version of Hadoop that users may run, be it Apache, Yahoo, Facebook,
0.20, 0.21, or trunk. If you have reason to believe that the suggestion you
quoted below is somehow not relevant to the larger community, I would love
to hear it.

My understanding of the ASF goals is that we should encourage a cohesive
community. Asking users of CDH to move general Hadoop questions off of ASF
mailing lists just because of their choice in distros encourages a fractured
community rather than a cohesive one.

Clearly, if a user has a question specifically about Cloudera packaging, they
should be directed to the CDH lists so as not to clutter non-CDH users'
inboxes with irrelevant questions. I think if you browse the archives you'll
find that Cloudera employees have been consistent about doing this since we
started the cdh-user list several months ago. But if an issue is a bug that
is likely to occur in trunk, it makes sense to me to leave it on the list
associated with the core project.

Personally I do my best to answer questions on the ASF lists regardless of
which distro the person is using - though our distros have some divergence
in backported patch sets, it's rare that a bug found in one distro doesn't
also let us fix a bug in trunk. I can readily pull up several recent examples of
this, and I'm surprised that there isn't more concern in the general
community about bugs that may result in NN metadata corruption.

Thanks,
-Todd


>
> On Thu, Dec 23, 2010 at 12:02 PM, Todd Lipcon <to...@cloudera.com> wrote:
> > On Thu, Dec 23, 2010 at 2:50 AM, Bjoern Schiessle
> > <bjoern@schiessle.org> wrote:
> >
> >>
> >> 1. I have set up a second dfs.name.dir which is stored at another
> >> computer (mounted by sshfs)
> >>
> >
> > I would strongly discourage the use of sshfs for the name dir. For one,
> > it's slow, and for two, I've seen it have some really weird semantics
> > where it's doing write-back caching.
> >
> > Just take a look at its manpage and you should get scared about using it
> > for a critical mount point like this.
> >
> > A soft interruptible NFS mount is a much safer bet.
> >
> > -Todd
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: namenode doesn't start after reboot

Posted by Jakob Homan <jg...@gmail.com>.
Please move discussions of CDH issues to Cloudera's lists.  Thanks.

On Thu, Dec 23, 2010 at 12:02 PM, Todd Lipcon <to...@cloudera.com> wrote:
> On Thu, Dec 23, 2010 at 2:50 AM, Bjoern Schiessle <bj...@schiessle.org>wrote:
>
>>
>> 1. I have set up a second dfs.name.dir which is stored at another
>> computer (mounted by sshfs)
>>
>
> I would strongly discourage the use of sshfs for the name dir. For one, it's
> slow, and for two, I've seen it have some really weird semantics where it's
> doing write-back caching.
>
> Just take a look at its manpage and you should get scared about using it for
> a critical mount point like this.
>
> A soft interruptible NFS mount is a much safer bet.
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: namenode doesn't start after reboot

Posted by Todd Lipcon <to...@cloudera.com>.
On Thu, Dec 23, 2010 at 2:50 AM, Bjoern Schiessle <bj...@schiessle.org>wrote:

>
> 1. I have set up a second dfs.name.dir which is stored at another
> computer (mounted by sshfs)
>

I would strongly discourage the use of sshfs for the name dir. For one, it's
slow, and for two, I've seen it have some really weird semantics where it's
doing write-back caching.

Just take a look at its manpage and you should get scared about using it for
a critical mount point like this.

A soft interruptible NFS mount is a much safer bet.
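
Something along these lines is what I have in mind; the server name, export
path, mount point, and timeout values below are only placeholders:

# Example only: a soft, interruptible NFS mount for a second dfs.name.dir.
# "nnbackup:/export/hdfs-name" and "/mnt/hdfs-name" are placeholders.
mount -t nfs -o soft,intr,tcp,timeo=30,retrans=3 \
    nnbackup:/export/hdfs-name /mnt/hdfs-name

With a soft mount, an unreachable NFS server eventually returns an I/O error
to the NameNode instead of hanging it indefinitely.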

-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Re: namenode doesn't start after reboot

Posted by Bjoern Schiessle <bj...@schiessle.org>.
Hi,

On Thu, 23 Dec 2010 09:15:41 -0800 Aaron T. Myers wrote:
> All this aside, you really shouldn't have to "safely" stop all the
> Hadoop services when you reboot any of your servers. Hadoop should be
> able to survive a crash of any of the daemons. Any circumstance in
> which Hadoop currently corrupts the edits log or fsimage is a serious
> bug, and should be reported via JIRA.

This is also what I would expect. Nevertheless, the last reboot caused the
problem described at the beginning of the thread. During the tests today
I always stopped the datanode and namenode myself, which works
flawlessly. To be a little safer, I wrote my own stop script which
stops the datanode and the namenode before shutdown.
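
Roughly, the script just calls the packaged init scripts in order before the
machine goes down; the service names below are the ones on my CDH3 install
and may differ on other setups:

#!/bin/sh
# Rough sketch of my shutdown hook: stop the HDFS daemons cleanly
# before the OS halts. Service names are from my CDH3 install and
# may differ on other installations.
/etc/init.d/hadoop-0.20-datanode stop
/etc/init.d/hadoop-0.20-namenode stop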

best wishes,
Björn

Re: namenode doesn't start after reboot

Posted by "Aaron T. Myers" <at...@cloudera.com>.
All this aside, you really shouldn't have to "safely" stop all the Hadoop
services when you reboot any of your servers. Hadoop should be able to
survive a crash of any of the daemons. Any circumstance in which Hadoop
currently corrupts the edits log or fsimage is a serious bug, and should be
reported via JIRA.

--
Aaron T. Myers
Software Engineer, Cloudera



On Thu, Dec 23, 2010 at 7:29 AM, rahul patodi <pa...@gmail.com> wrote:

> Hi,
> If you want to reboot the server:
> 1. stop mapred
> 2. stop dfs
> 3. then reboot
> When you want to restart Hadoop again, first start dfs, then mapred.
>
> --
> *Regards*,
> Rahul Patodi
> Software Engineer,
> Impetus Infotech (India) Pvt Ltd,
> www.impetus.com
> Mob:09907074413
>
>
> On Thu, Dec 23, 2010 at 6:15 PM, li ping <li...@gmail.com> wrote:
>
> > As far as I know, setup a backup namenode dir is enough.
> >
> > I haven't use the hadoop in a production environment. So, I can't tell
> you
> > what would be right way to reboot the server.
> >
> > On Thu, Dec 23, 2010 at 6:50 PM, Bjoern Schiessle <bjoern@schiessle.org
> > >wrote:
> >
> > > Hi,
> > >
> > > On Thu, 23 Dec 2010 09:30:17 +0800 li ping wrote:
> > > > It seems the exception occurs during NameNode loads the editlog.
> > > > make sure the editlog file exists. or you can debug the application
> to
> > > > see what's wrong.
> > >
> > > last night I tried to fix the problem and did a big mistake. Instead of
> > > copying /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/edits and
> > > edits.new to a backup I moved them and later delete the only version
> > > hence I thought I have a copy.
> > >
> > > The good thing: The namenode starts again.
> > > The bad thing: My file system is now in an inconsistent state.
> > >
> > > Probably the only solution is to reformat the hdfs and start from
> > > scratch. Thankfully there wasn't that much data stored at the hdfs
> until
> > > now but I definitely have to make sure that this doesn't happens again:
> > >
> > > 1. I have set up a second dfs.name.dir which is stored at another
> > > computer (mounted by sshfs)
> > > 2. I will install a backup script similar to:
> > > http://blog.milford.io/2010/10/simple-hadoop-namenode-backup-script
> > >
> > > Do you think this should be enough to overcome such situations in the
> > > future? Any additional ideas how to make it more safe?
> > >
> > > I'm still a little bit afraid if I think about the next time I will
> have
> > > to reboot the server. Shouldn't a reboot safely stop and restart all
> > > Hadoop services? Is there any thing I can do to make sure that the next
> > > reboot will not cause the same problems?
> > >
> > > Thanks a lot!
> > > Björn
> > >
> > >
> > >
> >
> >
> > --
> > -----李平
> >
>

Re: namenode doesn't start after reboot

Posted by rahul patodi <pa...@gmail.com>.
Hi,
If you want to reboot the server:
1. stop mapred
2. stop dfs
3. then reboot
When you want to restart Hadoop again, first start dfs, then mapred.
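
For example, with the stock scripts from the Hadoop bin directory, run on the
master as the user that owns the daemons (on CDH the packaged init scripts can
be used instead):

# Before the reboot:
stop-mapred.sh        # stops the JobTracker and TaskTrackers
stop-dfs.sh           # stops the NameNode, DataNodes and SecondaryNameNode
# ... reboot the machine ...
# After the reboot:
start-dfs.sh          # start HDFS first
start-mapred.sh       # then start MapReduce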

-- 
*Regards*,
Rahul Patodi
Software Engineer,
Impetus Infotech (India) Pvt Ltd,
www.impetus.com
Mob:09907074413


On Thu, Dec 23, 2010 at 6:15 PM, li ping <li...@gmail.com> wrote:

> As far as I know, setting up a backup namenode dir is enough.
>
> I haven't used Hadoop in a production environment, so I can't tell you
> what the right way to reboot the server would be.
>
> On Thu, Dec 23, 2010 at 6:50 PM, Bjoern Schiessle <bjoern@schiessle.org
> >wrote:
>
> > Hi,
> >
> > On Thu, 23 Dec 2010 09:30:17 +0800 li ping wrote:
> > > It seems the exception occurs during NameNode loads the editlog.
> > > make sure the editlog file exists. or you can debug the application to
> > > see what's wrong.
> >
> > last night I tried to fix the problem and did a big mistake. Instead of
> > copying /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/edits and
> > edits.new to a backup I moved them and later delete the only version
> > hence I thought I have a copy.
> >
> > The good thing: The namenode starts again.
> > The bad thing: My file system is now in an inconsistent state.
> >
> > Probably the only solution is to reformat the hdfs and start from
> > scratch. Thankfully there wasn't that much data stored at the hdfs until
> > now but I definitely have to make sure that this doesn't happens again:
> >
> > 1. I have set up a second dfs.name.dir which is stored at another
> > computer (mounted by sshfs)
> > 2. I will install a backup script similar to:
> > http://blog.milford.io/2010/10/simple-hadoop-namenode-backup-script
> >
> > Do you think this should be enough to overcome such situations in the
> > future? Any additional ideas how to make it more safe?
> >
> > I'm still a little bit afraid if I think about the next time I will have
> > to reboot the server. Shouldn't a reboot safely stop and restart all
> > Hadoop services? Is there any thing I can do to make sure that the next
> > reboot will not cause the same problems?
> >
> > Thanks a lot!
> > Björn
> >
> >
> >
>
>
> --
> -----李平
>

Re: namenode doesn't start after reboot

Posted by li ping <li...@gmail.com>.
As far as I know, setting up a backup namenode dir is enough.

I haven't used Hadoop in a production environment, so I can't tell you
what the right way to reboot the server would be.

On Thu, Dec 23, 2010 at 6:50 PM, Bjoern Schiessle <bj...@schiessle.org>wrote:

> Hi,
>
> On Thu, 23 Dec 2010 09:30:17 +0800 li ping wrote:
> > It seems the exception occurs during NameNode loads the editlog.
> > make sure the editlog file exists. or you can debug the application to
> > see what's wrong.
>
> last night I tried to fix the problem and did a big mistake. Instead of
> copying /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/edits and
> edits.new to a backup I moved them and later delete the only version
> hence I thought I have a copy.
>
> The good thing: The namenode starts again.
> The bad thing: My file system is now in an inconsistent state.
>
> Probably the only solution is to reformat the hdfs and start from
> scratch. Thankfully there wasn't that much data stored at the hdfs until
> now but I definitely have to make sure that this doesn't happens again:
>
> 1. I have set up a second dfs.name.dir which is stored at another
> computer (mounted by sshfs)
> 2. I will install a backup script similar to:
> http://blog.milford.io/2010/10/simple-hadoop-namenode-backup-script
>
> Do you think this should be enough to overcome such situations in the
> future? Any additional ideas how to make it more safe?
>
> I'm still a little bit afraid if I think about the next time I will have
> to reboot the server. Shouldn't a reboot safely stop and restart all
> Hadoop services? Is there any thing I can do to make sure that the next
> reboot will not cause the same problems?
>
> Thanks a lot!
> Björn
>
>
>


-- 
-----李平

Re: namenode doesn't start after reboot

Posted by Bjoern Schiessle <bj...@schiessle.org>.
Hi,

On Thu, 23 Dec 2010 09:30:17 +0800 li ping wrote:
> It seems the exception occurs during NameNode loads the editlog.
> make sure the editlog file exists. or you can debug the application to
> see what's wrong.

Last night I tried to fix the problem and made a big mistake: instead of
copying /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/edits and
edits.new to a backup I moved them, and later deleted the only version
because I thought I had a copy.

The good thing: The namenode starts again.
The bad thing: My file system is now in an inconsistent state.

Probably the only solution is to reformat HDFS and start from scratch.
Thankfully there wasn't much data stored in HDFS yet, but I definitely
have to make sure that this doesn't happen again:

1. I have set up a second dfs.name.dir which is stored at another
computer (mounted by sshfs)
2. I will install a backup script similar to:
http://blog.milford.io/2010/10/simple-hadoop-namenode-backup-script
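
For point 1, dfs.name.dir accepts a comma-separated list of directories in
hdfs-site.xml, so the mounted directory can simply be listed as a second
entry. For point 2, what I have in mind is a small cron job along these
lines; hostname, port, and backup directory are placeholders for my setup,
and I'm assuming the default NameNode web port of 50070 (the /getimage
servlet is the one the SecondaryNameNode uses to pull checkpoints):

#!/bin/sh
# Rough sketch of a NameNode metadata backup (not the actual script from
# the link above): fetch the current image and edit log over HTTP.
NN_HTTP=http://localhost:50070          # placeholder: NameNode web UI address
BACKUP_DIR=/backup/namenode/$(date +%Y%m%d-%H%M)
mkdir -p "$BACKUP_DIR"
curl -s -o "$BACKUP_DIR/fsimage" "$NN_HTTP/getimage?getimage=1"
curl -s -o "$BACKUP_DIR/edits"   "$NN_HTTP/getimage?getedit=1"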

Do you think this should be enough to avoid such situations in the
future? Any additional ideas on how to make it safer?

I'm still a little worried when I think about the next time I will have
to reboot the server. Shouldn't a reboot safely stop and restart all
Hadoop services? Is there anything I can do to make sure that the next
reboot will not cause the same problems?

Thanks a lot!
Björn



Re: namenode doesn't start after reboot

Posted by li ping <li...@gmail.com>.
It seems the exception occurs while the NameNode loads the edit log.
Make sure the edit log file exists, or debug the application to see
what's wrong.
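
For example, you can first check what is actually in the name directory
(the path below is just the CDH3 default; use whatever dfs.name.dir points
to on your system):

ls -l /var/lib/hadoop-0.20/cache/hadoop/dfs/name/current/
# A healthy 0.20 name directory normally contains fsimage, edits,
# fstime and VERSION; edits.new may also appear while a checkpoint
# is in progress.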

On Thu, Dec 23, 2010 at 2:01 AM, daniel sikar <ds...@gmail.com> wrote:

> I can't help but with hindsight - it's advisable to snapshot your
> namenodes as HDFS dies with them.
>
> On 22 December 2010 15:03, Bjoern Schiessle <bj...@schiessle.org> wrote:
> > Hi,
> >
> > After a Kernel update and a reboot the namenode doesn't start. I run the
> > Cloudera cdh3 Hadoop distribution. I have already searched for a
> solution.
> > It looks like I'm not the only one with such a problem. Sadly I could
> only
> > find descriptions of similar problems, but no solutions...
> >
> > This is the error message from the namenode log file:
> >
> >
> > 2010-12-22 16:13:04,830 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
> > /************************************************************
> > STARTUP_MSG: Starting NameNode
> > STARTUP_MSG:   host = pcube/129.69.216.24
> > STARTUP_MSG:   args = []
> > STARTUP_MSG:   version = 0.20.2+737
> > STARTUP_MSG:   build =  -r 98c55c28258aa6f42250569bd7fa431ac657bdbd;
> compiled by 'root' on Mon Oct 11 17:21:30 UTC 2010
> > ************************************************************/
> > 2010-12-22 16:13:05,001 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> Initializing JVM Metrics with processName=NameNode, sessionId=null
> > 2010-12-22 16:13:05,007 INFO
> org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing
> NameNodeMeterics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> > 2010-12-22 16:13:05,036 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hdfs
> > 2010-12-22 16:13:05,036 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
> > 2010-12-22 16:13:05,036 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> isPermissionEnabled=false
> > 2010-12-22 16:13:05,040 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s),
> accessTokenLifetime=0 min(s)
> > 2010-12-22 16:13:05,335 INFO
> org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
> Initializing FSNamesystemMetrics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> > 2010-12-22 16:13:05,336 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
> FSNamesystemStatusMBean
> > 2010-12-22 16:13:05,361 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Number of files = 72
> > 2010-12-22 16:13:05,374 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Number of files under
> construction = 3
> > 2010-12-22 16:13:05,375 INFO
> org.apache.hadoop.hdfs.server.common.Storage: Image file of size 8822 loaded
> in 0 seconds.
> > 2010-12-22 16:13:05,377 ERROR
> org.apache.hadoop.hdfs.server.namenode.NameNode:
> java.lang.NullPointerException
> >        at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1088)
> >        at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1100)
> >        at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1003)
> >        at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:206)
> >        at
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:637)
> >        at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1034)
> >        at
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:845)
> >        at
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:379)
> >        at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:99)
> >        at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:343)
> >        at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:317)
> >        at
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:214)
> >        at
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:394)
> >        at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1148)
> >        at
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1157)
> >
> > 2010-12-22 16:13:05,377 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> > /************************************************************
> > SHUTDOWN_MSG: Shutting down NameNode at pcube/129.69.216.24
> > ************************************************************/
> >
> > Any idea what could be wrong and how I can get my namenode up running
> again?
> >
> > Thanks a lot!
> > Björn
> >
>



-- 
-----李平

Re: namenode doesn't start after reboot

Posted by daniel sikar <ds...@gmail.com>.
I can't help here, but with hindsight it's advisable to snapshot your
namenodes, as HDFS dies with them.

On 22 December 2010 15:03, Bjoern Schiessle <bj...@schiessle.org> wrote:
> Hi,
>
> After a Kernel update and a reboot the namenode doesn't start. I run the
> Cloudera cdh3 Hadoop distribution. I have already searched for a solution.
> It looks like I'm not the only one with such a problem. Sadly I could only
> find descriptions of similar problems, but no solutions...
>
> This is the error message from the namenode log file:
>
>
> 2010-12-22 16:13:04,830 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = pcube/129.69.216.24
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.2+737
> STARTUP_MSG:   build =  -r 98c55c28258aa6f42250569bd7fa431ac657bdbd; compiled by 'root' on Mon Oct 11 17:21:30 UTC 2010
> ************************************************************/
> 2010-12-22 16:13:05,001 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=NameNode, sessionId=null
> 2010-12-22 16:13:05,007 INFO org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NullContext
> 2010-12-22 16:13:05,036 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: fsOwner=hdfs
> 2010-12-22 16:13:05,036 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
> 2010-12-22 16:13:05,036 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isPermissionEnabled=false
> 2010-12-22 16:13:05,040 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
> 2010-12-22 16:13:05,335 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: Initializing FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NullContext
> 2010-12-22 16:13:05,336 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered FSNamesystemStatusMBean
> 2010-12-22 16:13:05,361 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 72
> 2010-12-22 16:13:05,374 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files under construction = 3
> 2010-12-22 16:13:05,375 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size 8822 loaded in 0 seconds.
> 2010-12-22 16:13:05,377 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
>        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1088)
>        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1100)
>        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1003)
>        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:206)
>        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:637)
>        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1034)
>        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:845)
>        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:379)
>        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:99)
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:343)
>        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:317)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:214)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:394)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1148)
>        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1157)
>
> 2010-12-22 16:13:05,377 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at pcube/129.69.216.24
> ************************************************************/
>
> Any idea what could be wrong and how I can get my namenode up running again?
>
> Thanks a lot!
> Björn
>