Posted to common-user@hadoop.apache.org by Ben Kucinich <be...@gmail.com> on 2008/02/03 08:28:20 UTC

how to recover if master node goes down?

I am new to Hadoop. I want to know a few things.

I have a Hadoop cluster of 1 master node and N - 1 slave nodes. I am putting
files into the DFS. If one of the slave nodes goes down, the data is still
accessible due to proper replication.

What will happen if the master node goes down? Will some slave node take
over as the master node automatically? Or would all the data become
inaccessible?

If all the data would become inaccessible, what recovery action can be taken?
Say I boot the master node again and run start-all.sh again; would that be
enough to get things going again? Will I be able to access the old data
from the DFS again?

Re: how to recover if master node goes down?

Posted by Amar Kamat <am...@yahoo-inc.com>.
Ben Kucinich wrote:
> I am new to Hadoop. I want to know a few things.
>
> I have a Hadoop cluster of 1 master node and N - 1 slave nodes. I am putting
> files into the DFS. If one of the slave nodes goes down, the data is still
> accessible due to proper replication.
>
There are two masters in Hadoop: one for DFS (the namenode) and one for
MapReduce (the JobTracker). I guess your question is about DFS.
> What will happen if the master node goes down? Will some slave node take
> over as the master node automatically? Or all the data would become
> inaccessible?
A copy of the namenode's file-system image (fsimage) is periodically
checkpointed on the secondary namenode. The secondary namenode does not
take over automatically. The data will not be lost, since the fsimage is
backed up, but it will become temporarily inaccessible.
> If all the data would become inaccessible, what recovery action can be taken?
> Say, I boot the master node system again and run start-all.sh again, would
> it be enough to get things going fine again? Will I be able to access the
> old data from the DFS again?
>
You just need to log on to the secondary namenode and start the namenode
daemon there. Yes, this would be enough to get things started, and you
would be able to access the data again.
Amar.

Re: how to recover if master node goes down?

Posted by Ted Dunning <td...@veoh.com>.


On 2/2/08 11:28 PM, "Ben Kucinich" <be...@gmail.com> wrote:

> What will happen if the master node goes down? Will some slave node take
> over as the master node automatically? Or all the data would become
> inaccessible?

There are two kinds of master node: one is the name node (what you are
talking about), and the other is the jobtracker, which manages the worker
nodes doing map-reduce stuff.

Anyway, if the name node goes down, your files are inaccessible.

It is conventional to have the name node store backups of its information so
that you would be able to recover (nearly all of) your data.  Writing to the
journal and to the backup is not instantaneous, so you would likely lose a
small amount of data if you were actively writing to the cluster when the
name node failed, but the loss should be limited to the files being written
at the time.  Since files cannot be updated once written, existing files are
not at risk in this scenario.
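
To make the journal-and-backup point concrete, here is a minimal toy model in
Python. It is not Hadoop code: the class ToyNameNode and all of its methods
are made up for illustration, and it assumes only that the name node keeps a
checkpointed image plus a journal of later edits. It shows why a crash can
lose the file being written at that moment but not files already recorded.

```python
class ToyNameNode:
    """Hypothetical stand-in for name node metadata; not Hadoop's real design."""

    def __init__(self):
        self.image = set()   # last checkpointed set of file names (durable)
        self.journal = []    # edits flushed since the checkpoint (durable)
        self.pending = []    # edits accepted but not yet flushed (lost on crash)

    def create(self, path):
        """Accept a new file; the edit is only in memory until flush()."""
        self.pending.append(path)

    def flush(self):
        """Make pending edits durable by appending them to the journal."""
        self.journal.extend(self.pending)
        self.pending.clear()

    def checkpoint(self):
        """Fold the journal into the image and start a fresh journal."""
        self.image.update(self.journal)
        self.journal.clear()

    def recover(self):
        """Rebuild the namespace after a crash from durable state only."""
        return self.image | set(self.journal)


nn = ToyNameNode()
nn.create("/logs/day1")
nn.flush()
nn.checkpoint()           # /logs/day1 is now in the image
nn.create("/logs/day2")
nn.flush()                # /logs/day2 is in the journal only
nn.create("/logs/day3")   # still being written when the crash happens

recovered = nn.recover()
print(sorted(recovered))  # ['/logs/day1', '/logs/day2']
```

In this sketch only /logs/day3, the file in flight at the time of the crash,
is missing after recovery; everything in the image or the journal survives.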

Failing over to a secondary name node is not at all an automated process.

> If all the data would become inacessible, what recovery action can be taken.
> Say, I boot the master node system again and run start-all.sh again, would
> it be enough to get things going fine again? Will I be able to access the
> old data from the DFS again?

If you are able to restart the name node, then everything will be fine.
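
After a restart, one quick sanity check is whether the name node's port
accepts connections again. The Python sketch below is only an illustration:
the helper wait_for_port is made up, and the host and port you pass must
match whatever your own configuration uses for the name node address. A
successful connection only shows that something is listening; the name node
may still be in safe mode while the data nodes report their blocks.

```python
import socket
import time


def wait_for_port(host, port, timeout_s=60.0, interval_s=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; it checks reachability only and
    knows nothing about Hadoop itself.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # Open and immediately close a connection to probe the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            # Refused, unreachable, or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

For example, wait_for_port("namenode.example.com", 9000) would poll a
placeholder host and port for up to a minute; substitute your own values.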