Posted to common-user@hadoop.apache.org by Jason Venner <ja...@attributor.com> on 2008/07/29 18:01:22 UTC

Question about fault tolerance and fail over for name nodes

What are people doing?

For jobs that have a long enough SLA, just shutting down the cluster and 
bringing up the secondary as the master works for us.
We have some jobs where that doesn't work well, because the recovery 
time is not acceptable.

There has been internal discussion of using DRBD to hot-fail a namenode 
to a backup so that the running job can continue.
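
For reference, DRBD mirrors a block device between two hosts; applied 
here it would keep the device backing the namenode's metadata directory 
(dfs.name.dir) replicated on the backup, so the fsimage and edits log 
are always current there. A minimal resource definition might look like 
the sketch below -- hostnames, devices and addresses are invented for 
illustration:

```
# /etc/drbd.conf fragment (sketch): mirror the device backing dfs.name.dir
resource nn-meta {
  protocol C;                     # synchronous replication: no lost writes
  on namenode1 {
    device    /dev/drbd0;
    disk      /dev/sdb1;          # device holding the fsimage and edits log
    address   10.0.0.1:7788;
    meta-disk internal;
  }
  on namenode2 {
    device    /dev/drbd0;
    disk      /dev/sdb1;
    address   10.0.0.2:7788;
    meta-disk internal;
  }
}
```

On failover the backup promotes itself (drbdadm primary nn-meta), mounts 
the device and starts a namenode against the replicated metadata; making 
that promotion automatic, and fencing the old primary so two namenodes 
never run at once, is the hard part.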


Re: Question about fault tolerance and fail over for name nodes

Posted by Steve Loughran <st...@apache.org>.
Andreas Kostyrka wrote:
> On Tuesday 29 July 2008 18:22:07 Paco NATHAN wrote:
>> Jason,
>>
>> FWIW -- based on a daily batch process, requiring 9 Hadoop jobs in
>> sequence -- 100+2 EC2 nodes, 2 TB data, 6 hrs run time.
>>
>> We tend to see a namenode failing early, e.g., the "problem advancing"
>> exception in the values iterator, particularly during a reduce phase.
>>
>> Hot-fail would be great. Otherwise, given the duration of our batch
>> job overall, we use what you describe: shut down cluster, etc.
>>
>> Would prefer to observe this kind of failure sooner than later. We've
>> discussed internally how to craft an initial job which could stress
>> test the namenode.  Think of a "unit test" for the cluster.
> 
> ssh namenode 'pkill -9 -f "java.*NameNode"'
> 
> Here goes your namenode failure, if you just want to do the exercise for a 
> failover ;)

Simulating network partitioning can be more interesting, as then your 
failover tools have to deal with the risk that there are now two 
machines that think they are in charge. This is why building 
High-Availability and fault-tolerant systems is tricky.
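
One way to rehearse that partition scenario is to drop traffic between 
the two would-be masters with iptables. A sketch, not Hadoop-specific -- 
the peer address is invented, and the rules need root to apply:

```shell
# Simulate a network partition between the two would-be masters by
# dropping all traffic to and from the peer. PEER is hypothetical;
# substitute the other master's address.
PEER=${PEER:-10.0.0.2}

partition() {   # rules that start dropping traffic to/from the peer
  echo iptables -A INPUT  -s "$PEER" -j DROP
  echo iptables -A OUTPUT -d "$PEER" -j DROP
}

heal() {        # matching rules that remove the block again
  echo iptables -D INPUT  -s "$PEER" -j DROP
  echo iptables -D OUTPUT -d "$PEER" -j DROP
}

# Commands are echoed rather than executed so they can be reviewed first;
# pipe through sh as root to apply, e.g.:  partition | sh
partition
heal
```

Healing the partition afterwards is the interesting half of the test: 
that is when both sides discover the other still thinks it is in charge.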

-- 
Steve Loughran                  http://www.1060.org/blogxter/publish/5
Author: Ant in Action           http://antbook.org/

Re: Question about fault tolerance and fail over for name nodes

Posted by Andreas Kostyrka <an...@kostyrka.org>.
On Tuesday 29 July 2008 18:22:07 Paco NATHAN wrote:
> Jason,
>
> FWIW -- based on a daily batch process, requiring 9 Hadoop jobs in
> sequence -- 100+2 EC2 nodes, 2 TB data, 6 hrs run time.
>
> We tend to see a namenode failing early, e.g., the "problem advancing"
> exception in the values iterator, particularly during a reduce phase.
>
> Hot-fail would be great. Otherwise, given the duration of our batch
> job overall, we use what you describe: shut down cluster, etc.
>
> Would prefer to observe this kind of failure sooner than later. We've
> discussed internally how to craft an initial job which could stress
> test the namenode.  Think of a "unit test" for the cluster.

ssh namenode 'pkill -9 -f "java.*NameNode"'

Here goes your namenode failure, if you just want to do the exercise for a 
failover ;)

Andreas

>
> The business case for this becomes especially important when you need
> to automate the Hadoop cluster launch, e.g. with RightScale or another
> "cloud enabler" service.
>
> Anybody else heading in this direction?
>
> Paco
>
> On Tue, Jul 29, 2008 at 11:01 AM, Jason Venner <ja...@attributor.com> wrote:
> > What are people doing?
> >
> > For jobs that have a long enough SLA, just shutting down the cluster and
> > bringing up the secondary as the master works for us.
> > We have some jobs where that doesn't work well, because the recovery time
> > is not acceptable.
> >
> > There has been internal discussion of using DRBD to hot-fail a namenode to
> > a backup so that the running job can continue.



Re: Question about fault tolerance and fail over for name nodes

Posted by Paco NATHAN <ce...@gmail.com>.
Jason,

FWIW -- based on a daily batch process, requiring 9 Hadoop jobs in
sequence -- 100+2 EC2 nodes, 2 TB data, 6 hrs run time.

We tend to see a namenode failing early, e.g., the "problem advancing"
exception in the values iterator, particularly during a reduce phase.

Hot-fail would be great. Otherwise, given the duration of our batch
job overall, we use what you describe: shut down cluster, etc.

Would prefer to observe this kind of failure sooner than later. We've
discussed internally how to craft an initial job which could stress
test the namenode.  Think of a "unit test" for the cluster.
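
That stress "unit test" could be as simple as a burst of pure-metadata 
operations, since creating, listing and deleting empty files exercises 
only the namenode. A sketch, with invented paths and counts, assuming 
the stock hadoop CLI:

```shell
# Namenode stress sketch: file creates and deletes are pure metadata
# operations, so nearly all of the load lands on the namenode.
stress_namenode() {
  n=$1 dir=$2
  echo hadoop fs -mkdir "$dir"
  i=0
  while [ "$i" -lt "$n" ]; do
    echo hadoop fs -touchz "$dir/f$i"   # touchz: create a zero-length file
    i=$((i + 1))
  done
  echo hadoop fs -rmr "$dir"            # recursive delete to clean up
}

# Emits the command list; pipe it through sh on a box with hadoop installed:
#   stress_namenode 10000 /tmp/nn-stress | sh
stress_namenode 5 /tmp/nn-stress
```

Run with a count large enough to matter and time it; a namenode that is 
going to fall over early tends to show it here, before the real job starts.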

The business case for this becomes especially important when you need
to automate the Hadoop cluster launch, e.g. with RightScale or another
"cloud enabler" service.

Anybody else heading in this direction?

Paco


On Tue, Jul 29, 2008 at 11:01 AM, Jason Venner <ja...@attributor.com> wrote:
> What are people doing?
>
> For jobs that have a long enough SLA, just shutting down the cluster and
> bringing up the secondary as the master works for us.
> We have some jobs where that doesn't work well, because the recovery time is
> not acceptable.
>
> There has been internal discussion of using DRBD to hot-fail a namenode to a
> backup so that the running job can continue.