Posted to common-user@hadoop.apache.org by Brian Long <br...@dotspots.com> on 2009/02/27 02:30:15 UTC

How to deal with HDFS failures properly

I'm wondering what actions an application that holds a reference to a
FileSystem object should take when a NameNode or DataNode fails.
* Does the FileSystem handle all of this itself (e.g. reconnect logic)?
* Do I need to get a new FileSystem using .get(Configuration)?
* Does the FileSystem need to be closed before re-getting?
* Do the answers to these questions depend on whether it's a NameNode or
DataNode that's failed?

In short, how does an application (not a Hadoop job -- just an app using
HDFS) properly recover from a NameNode or DataNode failure? I haven't
figured out the magic juju yet and my applications are not handling DFS
outages gracefully.

Thanks,
Brian

Re: How to deal with HDFS failures properly

Posted by Ariel Rabkin <as...@gmail.com>.
DataNode failures should be transparent to the application. NameNode
failures will bring down the whole HDFS and result in a noticeable
outage. Replicating the NameNode is on the long-term roadmap, but my
impression is that it won't be happening very soon.
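
Since a NameNode outage is visible to the client, one common way to make an application degrade gracefully is to wrap each HDFS call in a bounded retry loop with exponential backoff. The sketch below is a minimal, stdlib-only illustration of that technique, not something proposed in this thread. It assumes the outage surfaces as an IOException (Hadoop's FileSystem methods declare IOException). The class HdfsRetry and the method withRetry are hypothetical names. In a real application the Callable would wrap a call such as fs.exists(path). Whether to re-obtain the FileSystem between attempts is left open, because the thread does not settle that question.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class HdfsRetry {
    /**
     * Hypothetical helper: runs op, retrying on IOException with exponential backoff.
     * Assumption: a NameNode outage is reported to the caller as an IOException.
     * Any other exception propagates immediately without a retry.
     */
    public static <T> T withRetry(Callable<T> op, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMs;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts: rethrow the last failure below
                }
                Thread.sleep(delay); // wait before trying again
                delay *= 2;          // double the wait each time (exponential backoff)
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated operation that fails twice (standing in for a NameNode outage),
        // then succeeds on the third attempt.
        final int[] calls = {0};
        String result = withRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IOException("simulated NameNode outage");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "ok after 3 attempts"
    }
}
```

The backoff keeps the client from hammering a NameNode that is restarting. The attempt cap ensures that a long outage eventually surfaces as an error instead of hanging the application.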

--Ari

-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department