Posted to common-user@hadoop.apache.org by jiang licht <li...@yahoo.com> on 2010/08/20 22:56:36 UTC

what will happen if a backup name node folder becomes inaccessible?

Using an NFS folder to back up DFS meta information as follows,

<property>
    <name>dfs.name.dir</name>
    <value>/hadoop/dfs/name,/hadoop-backup/dfs/name</value>
</property>

where /hadoop-backup is on a backup machine and mounted on the master node.
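A quick sanity check along these lines (a sketch only; check_name_dirs is a made-up helper, not a Hadoop tool, and the paths are the ones from the config above) can confirm each dfs.name.dir entry is present and writable before the NameNode starts:

```shell
# Sketch: verify every dfs.name.dir path exists and accepts writes.
# An unmounted or stale NFS path will typically fail one of these tests.
check_name_dirs() {
    for d in "$@"; do
        if [ ! -d "$d" ]; then
            echo "MISSING: $d"
            return 1
        fi
        # Probe writability with a throwaway file.
        probe="$d/.nn_write_probe.$$"
        if ! touch "$probe" 2>/dev/null; then
            echo "NOT WRITABLE: $d"
            return 1
        fi
        rm -f "$probe"
    done
    echo "OK"
}

# Example (paths from the config above):
# check_name_dirs /hadoop/dfs/name /hadoop-backup/dfs/name
```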

I have a question: if, somehow, the backup folder becomes unavailable, will it freeze the master node? That is, will write operations simply hang on the master node under this condition? Or will the master node log the problem and continue to work?

Thanks,

Michael



Re: what will happen if a backup name node folder becomes inaccessible?

Posted by Edward Capriolo <ed...@gmail.com>.
On Mon, Aug 23, 2010 at 3:05 PM, Michael Segel
<mi...@hotmail.com> wrote:
>
> Ok...
>
> Now you have me confused.
> Everything we've seen says that writing to both a local disk and to an NFS mounted disk would be the best way to prevent a problem.
>
> Now you and Harsh J say that this could actually be problematic.
>
> Which is it?
> Is this now a defect that should be addressed, or should we just not use an NFS mounted drive?
>
> Thx
>
> -Mike

The best method to store it is whatever you believe: two disks, three
disks, one disk + NFS.

Practically speaking, your NameNode data for 300 TB of HDFS data
might be less than a GB. Everyone gets super panicked about SPOFs, but
practically speaking you could fit all of your NameNode data on a
RAMDisk! (Not that I am suggesting that.)

This is not like 150 GB of data that is hard to back up, store, and
replicate. It is a few MB; if you lose it and do not have a good
backup, you were NOT trying that hard.

In any case, the reason the NameNode does not continue writing in the
case of a single failure is that the goal is to keep all the
directories consistent. For example, if the two directories have
different content, which one is the authoritative copy?
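If divergence between the copies is the worry, it is at least cheap to detect. A sketch (name_dirs_consistent is a made-up helper, not a Hadoop tool) that compares two name directories tree-for-tree:

```shell
# Sketch: report whether two copies of the name directory match.
# diff -r compares both file names and file contents across the trees.
name_dirs_consistent() {
    if diff -r "$1" "$2" >/dev/null 2>&1; then
        echo "consistent"
    else
        echo "DIVERGED"
    fi
}

# Example:
# name_dirs_consistent /hadoop/dfs/name /hadoop-backup/dfs/name
```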

RE: what will happen if a backup name node folder becomes inaccessible?

Posted by Michael Segel <mi...@hotmail.com>.
Ok... 

Now you have me confused.
Everything we've seen says that writing to both a local disk and to an NFS mounted disk would be the best way to prevent a problem.

Now you and Harsh J say that this could actually be problematic. 

Which is it?
Is this now a defect that should be addressed, or should we just not use an NFS mounted drive?

Thx

-Mike


> Date: Mon, 23 Aug 2010 11:42:59 -0700
> From: licht_jiang@yahoo.com
> Subject: Re: what will happen if a backup name node folder becomes inaccessible?
> To: common-user@hadoop.apache.org
> 
> This makes a good argument. Actually, after seeing the previous reply, I am kind of convinced that I should go back to "sync"-ing the metadata to a backup location instead of using this feature, which, as David mentioned, introduces a second single point of failure to Hadoop and degrades its availability. BTW, we are using the Cloudera package hadoop-0.20.2+228. Can someone confirm whether a name node in this version will shut down if a backup folder listed in "dfs.name.dir" becomes unavailable?
> 
> Thanks,
> 
> Michael

Re: what will happen if a backup name node folder becomes inaccessible?

Posted by jiang licht <li...@yahoo.com>.
This makes a good argument. Actually, after seeing the previous reply, I am kind of convinced that I should go back to "sync"-ing the metadata to a backup location instead of using this feature, which, as David mentioned, introduces a second single point of failure to Hadoop and degrades its availability. BTW, we are using the Cloudera package hadoop-0.20.2+228. Can someone confirm whether a name node in this version will shut down if a backup folder listed in "dfs.name.dir" becomes unavailable?

Thanks,

Michael
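The "sync" approach mentioned above could be as simple as a periodic timestamped copy of the name directory, e.g. from cron on the master. A sketch (snapshot_name_dir is a made-up helper; rsync -a would serve equally well in place of cp -a):

```shell
# Sketch: copy the name directory to a timestamped backup location,
# as an alternative to listing an NFS path in dfs.name.dir.
snapshot_name_dir() {
    src="$1"        # e.g. /hadoop/dfs/name
    dest_root="$2"  # e.g. /hadoop-backup/dfs
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$dest_root"
    # cp -a preserves permissions and timestamps; prints the snapshot path.
    cp -a "$src" "$dest_root/name-$stamp" && echo "$dest_root/name-$stamp"
}

# Example (run from cron on the master):
# snapshot_name_dir /hadoop/dfs/name /hadoop-backup/dfs
```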

--- On Sun, 8/22/10, David B. Ritch <da...@gmail.com> wrote:

From: David B. Ritch <da...@gmail.com>
Subject: Re: what will happen if a backup name node folder becomes inaccessible?
To: common-user@hadoop.apache.org
Date: Sunday, August 22, 2010, 11:34 PM

 Which version of Hadoop was this?  The folks at Cloudera have assured
me that the namenode in CDH2 will continue as long as one of the
directories is still writable.

It *does* seem a bit of a waste if an availability feature - the ability
to write to multiple directories - actually reduces availability by
providing an additional single point of failure.

Thanks!

dbr


Re: what will happen if a backup name node folder becomes inaccessible?

Posted by "David B. Ritch" <da...@gmail.com>.
 Which version of Hadoop was this?  The folks at Cloudera have assured
me that the namenode in CDH2 will continue as long as one of the
directories is still writable.

It *does* seem a bit of a waste if an availability feature - the ability
to write to multiple directories - actually reduces availability by
providing an additional single point of failure.

Thanks!

dbr



Re: what will happen if a backup name node folder becomes inaccessible?

Posted by jiang licht <li...@yahoo.com>.
Haha, vivid tutorial, thanks!

Best regards,

Michael


Re: what will happen if a backup name node folder becomes inaccessible?

Posted by Harsh J <qw...@gmail.com>.
Whee, let's try it out:

Start with both paths available. ... Starts fine.
Store some files. ... Works.
rm -r the second path. ... Ouch.
Store some more files. ... Still Works. [Cuz the SNN hasn't sent us
stuff back yet]
Wait for checkpoint to hit.
And ...
Boom!

2010-08-21 02:42:00,385 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log
from 127.0.0.1
2010-08-21 02:42:00,385 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
transactions: 37 Total time for transactions(ms): 6Number of
transactions batched in Syncs: 0 Number of syncs: 26 SyncTimes(ms):
307 277
2010-08-21 02:42:00,439 FATAL
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Fatal Error : All
storage directories are inaccessible.
2010-08-21 02:42:00,440 INFO
org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
************************************************************/

So yes, as Edward says - never let this happen!
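Since the failure announces itself with that FATAL line, a trivial watchdog can grep for it. A sketch (nn_storage_fatal is a made-up helper, not part of Hadoop; the message text is taken from the log above):

```shell
# Sketch: print 1 if the fatal storage message appears in the given
# NameNode log file, else 0. Suitable for a cron-driven alert.
nn_storage_fatal() {
    if grep -q "All storage directories are inaccessible" "$1"; then
        echo 1
    else
        echo 0
    fi
}

# Example (log path is an assumption; adjust for your install):
# nn_storage_fatal /var/log/hadoop/hadoop-hadoop-namenode-master.log
```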




-- 
Harsh J
www.harshj.com

Re: what will happen if a backup name node folder becomes inaccessible?

Posted by Edward Capriolo <ed...@gmail.com>.

Hadoop MUST be able to write to and read from both directories. If it
cannot, the NameNode will shut down and refuse to start again.