Posted to common-user@hadoop.apache.org by Ankur Sethi <as...@i411.com> on 2007/07/14 20:53:54 UTC

New user question

I am about to attempt setting up a hadoop file system for an application.

The Hadoop filesystem has a single point of failure: the namenode.  Can you
explain the steps necessary for bringing HDFS back up after a namenode failure?

Before asking this question I went through these pages:
http://wiki.apache.org/lucene-hadoop-data/attachments/HadoopPresentations/attachments/HDFSDescription.pdf

and http://lucene.apache.org/hadoop/hdfs_design.html

These describe the overall architecture and the fact that one can have
secondary namenodes.

Let's say the machine just died.

From the documentation: "The Namenode machine is a single point of failure
for an HDFS cluster. If the Namenode machine fails, manual intervention is
necessary. Currently, automatic restart and failover of the Namenode software
to another machine is not supported."

So what is this manual intervention?  I am confused about this.  All the nodes
have a configuration file with the master namenode set, so one should bring
up a machine with the same hostname/IP address.
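
For reference, the setting in question is fs.default.name in each node's
hadoop-site.xml; a minimal sketch (the host name and port here are
hypothetical):

   <property>
     <name>fs.default.name</name>
     <value>namenode-host:9000</value>
   </property>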

Then what?  Can one bring up the new machine, start a namenode server, and
have it repopulate on its own?  Please explain.

Sorry if this has been asked before.  I searched the mailing list, the FAQ
page, and the documentation before asking this.

Thanks,
Ankur

RE: New user question

Posted by Ankur Sethi <as...@i411.com>.
I see that no one has responded, so I take this as confirmation that the data
of the entire cluster is lost when the namenode data is lost.

I suppose we will have a secondary namenode as backup, but we can see that
Hadoop has a long way to go.  Wow, it looks like the guys at Google have put
in a lot of hard work.

Ankur
-----Original Message-----
From: amalagaura@gmail.com [mailto:amalagaura@gmail.com] On Behalf Of Ankur
Sethi
Sent: Saturday, July 14, 2007 6:51 PM
To: hadoop-user@lucene.apache.org
Subject: Re: New user question

If the namenode data is lost, is the data of the entire cluster lost?

On 7/14/07, Raghu Angadi <ra...@yahoo-inc.com> wrote:
>
>
> You can specify multiple directories for Namenode data, in which case
> the image is written to all of them. You can also use an NFS mount,
> RAID, or a similar approach.
>
> Raghu.
>
> Ankur Sethi wrote:
> > Thank you for the information.
> >
> > I want to consider the worst-case scenario where the namenode fails.  So
> > you are suggesting copying the dfs.name.dir directory.  Can we take
> > regular backups of this?  Shouldn't HDFS be truly fault tolerant in this
> > regard?  If you have 500 machines, shouldn't it replicate the essential
> > data in case of failure?
> >
> > The Google File System replicates server-critical information as well.
> >
> > Let's say one did not have dfs.name.dir backed up; what would happen?
> >
> > Thanks,
> > Ankur
>

Re: New user question

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
You can specify multiple directories for Namenode data, in which case
the image is written to all of them. You can also use an NFS mount,
RAID, or a similar approach.
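
A minimal hadoop-site.xml sketch of that (both paths here are hypothetical):

   <property>
     <name>dfs.name.dir</name>
     <value>/disk1/dfs/name,/mnt/nfs/namenode/dfs/name</value>
   </property>

With a comma-separated list like this, the image is written to every listed
directory, so a copy survives on the NFS mount even if the local disk is lost.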

Raghu.

Ankur Sethi wrote:
> Thank you for the information.
> 
> I want to consider the worst-case scenario where the namenode fails.  So you
> are suggesting copying the dfs.name.dir directory.  Can we take regular
> backups of this?  Shouldn't HDFS be truly fault tolerant in this regard?  If
> you have 500 machines, shouldn't it replicate the essential data in case of
> failure?
> 
> The Google File System replicates server-critical information as well.
> 
> Let's say one did not have dfs.name.dir backed up; what would happen?
> 
> Thanks,
> Ankur

Re: New user question

Posted by Ankur Sethi <an...@gmail.com>.
Thank you for the information.

I want to consider the worst-case scenario where the namenode fails.  So you
are suggesting copying the dfs.name.dir directory.  Can we take regular
backups of this?  Shouldn't HDFS be truly fault tolerant in this regard?  If
you have 500 machines, shouldn't it replicate the essential data in case of
failure?

The Google File System replicates server-critical information as well.

Let's say one did not have dfs.name.dir backed up; what would happen?

Thanks,
Ankur

On 7/14/07, Raghu Angadi <ra...@yahoo-inc.com> wrote:
>
> Ankur Sethi wrote:
>
> > Then what?  Can one bring up the new machine, start a namenode server,
> > and have it repopulate on its own?  Please explain.
>
> If you bring up the new Namenode with the same hostname and IP, then you
> don't need to restart the Datanodes. If the hostname changes, then you
> need to edit the configuration, distribute it to the other nodes, and
> restart the whole cluster.
>
> Before bringing up the new Namenode, you need to copy the ${dfs.name.dir}
> directory from the original Namenode. By default, this is set to
> ${hadoop.tmp.dir}/dfs/name. This directory holds the filesystem image for
> the cluster.
>
> Raghu.
>
> > Sorry if this has been asked before.  I searched the mailing list, the
> > FAQ page, and the documentation before asking this.
> >
> > Thanks,
> > Ankur
>
>

Re: New user question

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Ankur Sethi wrote:

> Then what?  Can one bring up the new machine, start a namenode server, and
> have it repopulate on its own?  Please explain.

If you bring up the new Namenode with the same hostname and IP, then you 
don't need to restart the Datanodes. If the hostname changes, then you 
need to edit the configuration, distribute it to the other nodes, and 
restart the whole cluster.
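
For the changed-hostname case, a rough sketch assuming the stock control
scripts (the entry to edit is fs.default.name in hadoop-site.xml; restore the
name directory first, as described below):

   # on every node: point fs.default.name in hadoop-site.xml at the new host
   # then, from the master node:
   bin/stop-dfs.sh
   bin/start-dfs.sh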

Before bringing up the new Namenode, you need to copy the ${dfs.name.dir} 
directory from the original Namenode. By default, this is set to 
${hadoop.tmp.dir}/dfs/name. This directory holds the filesystem image for the 
cluster.
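
Putting that together for the same-hostname case, a recovery sketch (the host
names and paths here are hypothetical; it assumes a surviving copy of the
${dfs.name.dir} contents, e.g. on an NFS mount or a backup host):

   # on the replacement machine, carrying the old hostname/IP:
   scp -r backup-host:/backup/dfs/name /tmp/hadoop-user/dfs/name
   # start only the namenode; the Datanodes reconnect on their own:
   bin/hadoop-daemon.sh start namenode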

Raghu.

> Sorry if this has been asked before.  I searched the mailing list, the FAQ
> page, and the documentation before asking this.
> 
> Thanks,
> Ankur