Posted to common-user@hadoop.apache.org by "Goel, Ankur" <an...@corp.aol.com> on 2008/11/10 09:24:04 UTC

Best way to handle namespace host failures

Hi Folks, 

I am looking for some advice on some of the ways / techniques that
people are using to get around namenode failures (both disk and host).

We have a small cluster with several jobs scheduled for periodic
execution on the same host where the name server runs. What we would
like to have is an automatic failover mechanism in Hadoop so that a
secondary namenode automatically takes the role of the master.

 

I can move this discussion to a JIRA if people are interested.

 

Thanks

-Ankur


Re: Best way to handle namespace host failures

Posted by Allen Wittenauer <aw...@yahoo-inc.com>.


On 11/10/08 10:42 PM, "Dhruba Borthakur" <dh...@gmail.com> wrote:
> 2. Create a virtual IP, say name.xx.com that points to the real
> machine name of the machine on which the namenode runs.

    Everyone doing this should be aware of the discussion happening in

https://issues.apache.org/jira/browse/HADOOP-3988

    though.


Re: Best way to handle namespace host failures

Posted by Dhruba Borthakur <dh...@gmail.com>.
A couple of things that one can do:

1. dfs.name.dir should have at least two locations, one on the local
disk and one on NFS. This means that all transactions are
synchronously logged into two places.

2. Create a virtual IP, say name.xx.com that points to the real
machine name of the machine on which the namenode runs.

If the namenode machine burns, then change the virtual IP to point to
a new machine. Copy the namenode metadata from the NFS location to the
local disk on this new machine. Then start namenode on this new
machine.
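
The copy step above could be scripted roughly as follows. This is only a
minimal sketch, not part of Hadoop: the function name and both directory
paths are hypothetical placeholders for the two dfs.name.dir locations
(dfs.name.dir takes a comma-separated list of directories), and it
assumes the namenode is stopped while the copy runs. Repointing the
virtual IP and starting the namenode are left as manual steps.

```python
import shutil
from pathlib import Path


def restore_namenode_metadata(nfs_dir, local_dir):
    """Copy the namenode metadata tree from the NFS copy to a local directory.

    nfs_dir and local_dir are hypothetical placeholders for the NFS and
    local entries of dfs.name.dir. Returns the relative paths of the
    files now present under local_dir.
    """
    src = Path(nfs_dir)
    dst = Path(local_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"NFS metadata directory not found: {src}")
    if dst.exists() and any(dst.iterdir()):
        # Refuse to overwrite metadata already on the replacement machine.
        raise FileExistsError(f"Local directory is not empty: {dst}")
    # dirs_exist_ok requires Python 3.8 or later.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(
        p.relative_to(dst).as_posix() for p in dst.rglob("*") if p.is_file()
    )
```

For example, restore_namenode_metadata("/mnt/nfs/dfs/name",
"/data/dfs/name") would be run on the new machine before the namenode is
started there; both paths are made up for illustration. Refusing to copy
into a non-empty directory is a deliberate choice, so that an existing
local image is never silently mixed with the NFS copy.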

Done!
-dhruba



RE: Best way to handle namespace host failures

Posted by "Goel, Ankur" <an...@corp.aol.com>.
If we start the namenode on a different host, the configuration on all
the cluster nodes will need to be updated before a cluster restart,
right?


-----Original Message-----
From: Alex Loddengaard [mailto:alex@cloudera.com] 
Sent: Tuesday, November 11, 2008 12:07 AM
To: core-user@hadoop.apache.org
Cc: Ian Holsman
Subject: Re: Best way to handle namespace host failures

There has been a lot of discussion on this list about handling namenode
failover.  Generally the most common approach is to back up the namenode
to an NFS mount and manually instantiate a new namenode when your
current namenode fails.
As Hadoop exists today, the namenode is a single point of failure.

Alex


Re: Best way to handle namespace host failures

Posted by Alex Loddengaard <al...@cloudera.com>.
There has been a lot of discussion on this list about handling namenode
failover.  Generally the most common approach is to back up the namenode
to an NFS mount and manually instantiate a new namenode when your
current namenode fails.
As Hadoop exists today, the namenode is a single point of failure.

Alex

On Mon, Nov 10, 2008 at 3:12 AM, Goel, Ankur <an...@corp.aol.com> wrote:

> Thanks for the replies, folks. We are not seeing this frequently, but
> we just want to avoid a single point of failure and keep manual
> intervention to a minimum, or ideally none. This is to ensure that the
> system runs smoothly in production without abrupt failures.
>
> Thanks
> -Ankur

RE: Best way to handle namespace host failures

Posted by "Goel, Ankur" <an...@corp.aol.com>.
Thanks for the replies, folks. We are not seeing this frequently, but we
just want to avoid a single point of failure and keep manual
intervention to a minimum, or ideally none. This is to ensure that the
system runs smoothly in production without abrupt failures.

Thanks
-Ankur

-----Original Message-----
From: Amar Kamat [mailto:amarrk@yahoo-inc.com] 
Sent: Monday, November 10, 2008 3:53 PM
To: core-user@hadoop.apache.org
Subject: Re: Best way to handle namespace host failures

Goel, Ankur wrote:
> Hi Folks,
>
> I am looking for some advice on some of the ways / techniques that
> people are using to get around namenode failures (both disk and host).
>
> We have a small cluster with several jobs scheduled for periodic
> execution on the same host where the name server runs. What we would
> like to have is an automatic failover mechanism in Hadoop so that a
> secondary namenode automatically takes the role of the master.
>
Are you seeing this frequently? If yes, then you should find out why
it's happening. As far as I know, namenode failure is not expected to be
frequent.
Amar
>  
>
> I can move this discussion to a JIRA if people are interested.
>
>  
>
> Thanks
>
> -Ankur
>
>
>   


Re: Best way to handle namespace host failures

Posted by Amar Kamat <am...@yahoo-inc.com>.
Goel, Ankur wrote:
> Hi Folks,
>
> I am looking for some advice on some of the ways / techniques that
> people are using to get around namenode failures (both disk and host).
>
> We have a small cluster with several jobs scheduled for periodic
> execution on the same host where the name server runs. What we would
> like to have is an automatic failover mechanism in Hadoop so that a
> secondary namenode automatically takes the role of the master.
>
Are you seeing this frequently? If yes, then you should find out why
it's happening. As far as I know, namenode failure is not expected to be
frequent.
Amar
>  
>
> I can move this discussion to a JIRA if people are interested.
>
>  
>
> Thanks
>
> -Ankur
>
>
>   


Re: Best way to handle namespace host failures

Posted by Sharad Agarwal <sh...@yahoo-inc.com>.
Goel, Ankur wrote:
> Hi Folks,
>
> I am looking for some advice on some of the ways / techniques that
> people are using to get around namenode failures (both disk and host).
>
> We have a small cluster with several jobs scheduled for periodic
> execution on the same host where the name server runs. What we would
> like to have is an automatic failover mechanism in Hadoop so that a
> secondary namenode automatically takes the role of the master.
>
"Secondary namenode" is a misnomer. The description at
http://wiki.apache.org/hadoop/FAQ#7 should help.
>  
>
> I can move this discussion to a JIRA if people are interested.
>
>  
>
> Thanks
>
> -Ankur
>
>
>