Posted to hdfs-user@hadoop.apache.org by Aaron Kimball <aa...@cloudera.com> on 2009/09/22 22:21:21 UTC

Re: Hadoop Failover and Recovery

Allen,

Here's an example of an HA solution being used for the namenode:
http://www.cloudera.com/blog/2009/07/22/hadoop-ha-configuration/

- Aaron

On Mon, Aug 31, 2009 at 10:01 AM, Allen Wittenauer <awittenauer@linkedin.com> wrote:

> On 8/28/09 8:58 PM, "sagar_shukla" <sa...@persistent.co.in> wrote:
> > What are the failover and recovery mechanisms available for Hadoop? I
> > searched over the internet but could not find any good documentation for
> > different scenarios like datanode going down or namenode going down.
>
> In most cases, the documentation for "fixing" Hadoop is:
>
> A) fix hardware
> B) clean out tmp files, etc
> C) restart processes for that node
>
> Name node is a bit of a special case. I'm amused that
> http://wiki.apache.org/hadoop/NameNodeFailover is empty. :)
>
> For name node, you have some preventative things to do first:
>
> A) have matching hardware available
> B) make sure the fsimage and edits files are being written to, or are at
> least available to, that machine via NFS, SMB, whatever it takes
>
> On failure, use that backup image to bring the name node back up on your
> spare box.
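
As a rough illustration of checking step B before you need it: in the
0.20-era releases, dfs.name.dir can list several directories (for example
one local and one on NFS), and the name node writes its metadata to each.
The sketch below only verifies that every listed directory actually holds
the files you would restore from. It assumes the files are named fsimage
and edits and sit under a current/ subdirectory, which may differ by
version; the function name and example paths are made up for illustration.

```python
import os

# Assumption: each name directory keeps its metadata in <dir>/current/
# as files called "fsimage" and "edits". Adjust for your Hadoop version.
REQUIRED_FILES = ("fsimage", "edits")


def missing_metadata(name_dirs, required=REQUIRED_FILES):
    """Return {directory: [missing file names]} for incomplete copies.

    Directories that contain every required file are left out of the
    result, so an empty dict means all copies look usable.
    """
    problems = {}
    for name_dir in name_dirs:
        current = os.path.join(name_dir, "current")
        missing = [
            name for name in required
            if not os.path.isfile(os.path.join(current, name))
        ]
        if missing:
            problems[name_dir] = missing
    return problems


# Hypothetical usage; these paths are placeholders, not Hadoop defaults:
# for path, files in missing_metadata(["/data/nn", "/mnt/nfs/nn"]).items():
#     print("%s is missing: %s" % (path, ", ".join(files)))
```

Run from cron on the spare box, something like this would at least tell
you the NFS copy has gone stale or missing before the primary dies. It
does not check that the copies are current or consistent with each other.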
>
> Note that the NN isn't HA.  I suspect something like SunCluster or VCS could
> be used here to make it less susceptible to issues, but I don't know if
> anyone has tried it.