Posted to common-user@hadoop.apache.org by Ariel Rabkin <as...@gmail.com> on 2009/03/02 06:48:02 UTC

Re: How to deal with HDFS failures properly

DataNode failures should be transparent to your application: the HDFS
client automatically retries reads against the other replicas of a block.
A NameNode failure, on the other hand, will bring down the whole HDFS and
result in a noticeable outage. Replicating the NameNode is on the
long-term roadmap, but my impression is that it won't be happening very
soon.
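
For the "how do I recover" part of your question, here is a minimal
sketch (untested; the class name, probe path, retry count, and sleep
interval are illustrative assumptions, not anything Hadoop requires) of
one way an application could re-acquire a working FileSystem handle once
the NameNode comes back:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsReconnect {
        // Illustrative values, not Hadoop defaults.
        private static final int MAX_TRIES = 5;
        private static final long SLEEP_MS = 10000L;

        // Returns a FileSystem that has answered at least one RPC,
        // retrying while HDFS is unreachable; rethrows the last
        // error if every attempt fails.
        public static FileSystem getWithRetry(Configuration conf)
                throws IOException, InterruptedException {
            IOException last = null;
            for (int i = 0; i < MAX_TRIES; i++) {
                FileSystem fs = null;
                try {
                    fs = FileSystem.get(conf);
                    // Cheap probe; throws if the NameNode is down.
                    fs.getFileStatus(new Path("/"));
                    return fs;
                } catch (IOException e) {
                    last = e;
                    // Drop the (possibly cached) dead handle so the
                    // next get() builds a fresh connection instead of
                    // returning the same broken instance.
                    if (fs != null) {
                        try { fs.close(); } catch (IOException ignored) { }
                    }
                    Thread.sleep(SLEEP_MS);
                }
            }
            throw last;
        }
    }

Whether you strictly need to close the old handle before calling
FileSystem.get() again depends on whether your Hadoop version caches
FileSystem instances per scheme and authority; closing the dead handle
first is the safe choice either way.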

--Ari

On Thu, Feb 26, 2009 at 5:30 PM, Brian Long <br...@dotspots.com> wrote:
> I'm wondering what the proper actions to take in light of a NameNode or
> DataNode failure are in an application which is holding a reference to a
> FileSystem object.
> * Does the FileSystem handle all of this itself (e.g. reconnect logic)?
> * Do I need to get a new FileSystem using .get(Configuration)?
> * Does the FileSystem need to be closed before re-getting?
> * Do the answers to these questions depend on whether it's a NameNode or
> DataNode that's failed?
>
> In short, how does an application (not a Hadoop job -- just an app using
> HDFS) properly recover from a NameNode or DataNode failure? I haven't
> figured out the magic juju yet and my applications are not handling DFS
> outages gracefully.
>
> Thanks,
> Brian
>



-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department