Posted to common-user@hadoop.apache.org by Sudhir Vallamkondu <Su...@icrossing.com> on 2010/08/24 07:19:14 UTC

Re: common-user Digest 23 Aug 2010 21:21:26 -0000 Issue 1518

Looking at the codebase, it seems that the namenode ignores an edit log
storage directory if it encounters an error on it:

http://www.google.com/codesearch/p?hl=en#GLh8vwsjDqs/trunk/src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSEditLog.java&q=namenode%20editlog&sa=N&cd=20&ct=rc

Check these lines:
code: line 334
comments: lines 387-390, 411-414, and 433-436

The processIOError method is called throughout the code whenever an
IOException is encountered.

A fatal error is thrown only if none of the storage directories is
accessible (lines 394 and 420).
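
In other words, the error handling boils down to something like the
following. This is a paraphrased, self-contained sketch of what the code
appears to do, not the actual FSEditLog source; only the names editStreams
and processIOError are taken from the real code.

    import java.util.ArrayList;
    import java.util.List;

    class EditLogErrorHandlingSketch {
        // One entry per edit log stream, i.e. per dfs.name.dir directory.
        private final List<Object> editStreams = new ArrayList<Object>();

        // Called wherever writing an edit record throws an IOException.
        void processIOError(int index) {
            if (editStreams.size() <= 1) {
                // The last usable storage directory just failed.
                System.err.println(
                    "Fatal Error : All storage directories are inaccessible.");
                Runtime.getRuntime().exit(-1);
            }
            // Otherwise drop only the failing directory and keep running.
            editStreams.remove(index);
        }
    }

So on paper, losing one of two directories should just drop that directory
and continue.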

- Sudhir



On Aug/23/ 2:21 PM, "common-user-digest-help@hadoop.apache.org"
<co...@hadoop.apache.org> wrote:

> From: Michael Segel <mi...@hotmail.com>
> Date: Mon, 23 Aug 2010 14:05:05 -0500
> To: <co...@hadoop.apache.org>
> Subject: RE: what will happen if a backup name node folder becomes
> unaccessible?
> 
> 
> Ok... 
> 
> Now you have me confused.
> Everything we've seen says that writing to both a local disk and to an NFS
> mounted disk would be the best way to prevent a problem.
> 
> Now you and Harsh J say that this could actually be problematic.
> 
> Which is it?
> Is this now a defect that should be addressed, or should we just not use an
> NFS mounted drive?
> 
> Thx
> 
> -Mike
> 
> 
>> Date: Mon, 23 Aug 2010 11:42:59 -0700
>> From: licht_jiang@yahoo.com
>> Subject: Re: what will happen if a backup name node folder becomes
>> unaccessible?
>> To: common-user@hadoop.apache.org
>> 
>> This makes a good argument. Actually, after seeing the previous reply, I'm
>> kind of convinced that I should go back to "sync"ing the metadata to a
>> backup location instead of using this feature, which, as David mentioned,
>> introduces a 2nd single point of failure to hadoop and degrades its
>> availability. BTW, we are using the Cloudera package hadoop-0.20.2+228. Can
>> someone confirm whether a name node will shut down when a backup folder
>> listed in "dfs.name.dir" becomes unavailable in this version?
>> 
>> Thanks,
>> 
>> Michael
>> 
>> --- On Sun, 8/22/10, David B. Ritch <da...@gmail.com> wrote:
>> 
>> From: David B. Ritch <da...@gmail.com>
>> Subject: Re: what will happen if a backup name node folder becomes
>> unaccessible?
>> To: common-user@hadoop.apache.org
>> Date: Sunday, August 22, 2010, 11:34 PM
>> 
>>  Which version of Hadoop was this?  The folks at Cloudera have assured
>> me that the namenode in CDH2 will continue as long as one of the
>> directories is still writable.
>> 
>> It *does* seem a bit of a waste if an availability feature - the ability
>> to write to multiple directories - actually reduces availability by
>> providing an additional single point of failure.
>> 
>> Thanks!
>> 
>> dbr
>> 
>> On 8/20/2010 5:27 PM, Harsh J wrote:
>>> Whee, let's try it out:
>>> 
>>> Start with both paths available. ... Starts fine.
>>> Store some files. ... Works.
>>> rm -r the second path. ... Ouch.
>>> Store some more files. ... Still Works. [Cuz the SNN hasn't sent us
>>> stuff back yet]
>>> Wait for checkpoint to hit.
>>> And ...
>>> Boom!
>>> 
>>> 2010-08-21 02:42:00,385 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log
>>> from 127.0.0.1
>>> 2010-08-21 02:42:00,385 INFO
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of
>>> transactions: 37 Total time for transactions(ms): 6Number of
>>> transactions batched in Syncs: 0 Number of syncs: 26 SyncTimes(ms):
>>> 307 277
>>> 2010-08-21 02:42:00,439 FATAL
>>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Fatal Error : All
>>> storage directories are inaccessible.
>>> 2010-08-21 02:42:00,440 INFO
>>> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
>>> /************************************************************
>>> SHUTDOWN_MSG: Shutting down NameNode at localhost/127.0.0.1
>>> ************************************************************/
>>> 
>>> So yes, as Edward says - never let this happen!
>>> 
>>> On Sat, Aug 21, 2010 at 2:26 AM, jiang licht <li...@yahoo.com> wrote:
>>>> Using an NFS folder to back up the dfs meta information as follows,
>>>> 
>>>> <property>
>>>>   <name>dfs.name.dir</name>
>>>>   <value>/hadoop/dfs/name,/hadoop-backup/dfs/name</value>
>>>> </property>
>>>> 
>>>> where /hadoop-backup is on a backup machine and mounted on the master node.
>>>> 
>>>> I have a question: if somehow the backup folder becomes unavailable, will
>>>> it freeze the master node? That is, will write operations simply hang on
>>>> this condition on the master node? Or will the master node log the problem
>>>> and continue to work?
>>>> 
>>>> Thanks,
>>>> 
>>>> Michael
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>> 
>> 
>>       





Re: common-user Digest 23 Aug 2010 21:21:26 -0000 Issue 1518

Posted by Harsh J <qw...@gmail.com>.
Hello Sudhir,

You're right about this, but I don't seem to be getting the warning for the
edit log IOException at all in the first place. Here are my steps to get to
what I described earlier (note that I'm just using two directories on the
same disk, not two different devices, NFS, etc.). It's my personal computer,
so I don't mind doing this again for now (as the other directory remains
untouched).

hadoop 11:13:00 ~/.hadoop $ jps
4954 SecondaryNameNode
5911 Jps
5158 TaskTracker
4592 NameNode
5650 JobTracker
4768 DataNode
hadoop 11:13:02 ~/.hadoop $ hadoop dfs -ls
Found 2 items
-rw-r--r--   1 hadoop supergroup     411536 2010-08-18 15:50 /user/hadoop/data
drwxr-xr-x   - hadoop supergroup          0 2010-08-18 16:02 /user/hadoop/dataout
hadoop 11:13:07 ~/.hadoop $ tail -n 10 conf/hdfs-site.xml
 <property>
   <name>dfs.name.dir</name>
   <value>/home/hadoop/.dfs/name,/home/hadoop/.dfs/testdir</value>
   <final>true</final>
 </property>
 <property>
   <name>dfs.datanode.max.xcievers</name>
   <value>2047</value>
 </property>
</configuration>
hadoop 11:13:25 ~/.hadoop $ ls ~/.dfs/
data  name  testdir
hadoop 11:13:36 ~/.hadoop $ rm -r ~/.dfs/testdir
hadoop 11:13:49 ~/.hadoop $ jps
6135 Jps
4954 SecondaryNameNode
5158 TaskTracker
4592 NameNode
5650 JobTracker
4768 DataNode
hadoop 11:13:56 ~/.hadoop $ hadoop dfs -put /etc/profile profile1
hadoop 11:14:10 ~/.hadoop $ hadoop dfs -put /etc/profile profile2
hadoop 11:14:12 ~/.hadoop $ hadoop dfs -put /etc/profile profile3
hadoop 11:14:15 ~/.hadoop $ hadoop dfs -put /etc/profile profile4
hadoop 11:17:21 ~/.hadoop $ jps
4954 SecondaryNameNode
5158 TaskTracker
4592 NameNode
5650 JobTracker
4768 DataNode
6954 Jps
hadoop 11:17:23 ~/.hadoop $ tail -f hadoop-0.20.2/logs/hadoop-hadoop-namenode-hadoop.log
2010-08-24 11:14:17,632 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.allocateBlock: /user/hadoop/profile4. blk_28644972299224370_1019
2010-08-24 11:14:17,709 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 192.168.1.8:50010 is added to blk_28644972299224370_1019 size 497
2010-08-24 11:14:17,713 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /user/hadoop/profile4 is closed by DFSClient_-2054565417
2010-08-24 11:17:31,187 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 192.168.1.8
2010-08-24 11:17:31,187 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions: 19 Total time for transactions(ms): 4Number of transactions batched in Syncs: 0 Number of syncs: 14 SyncTimes(ms): 183 174
2010-08-24 11:17:31,281 FATAL org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Fatal Error : All storage directories are inaccessible.
2010-08-24 11:17:31,283 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop.cf.net/127.0.0.1
************************************************************/
^C
hadoop 11:17:51 ~/.hadoop $ ls /home/hadoop/.dfs/
data  name
hadoop 11:21:14 ~/.hadoop $ jps
8259 Jps
4954 SecondaryNameNode
5158 TaskTracker
5650 JobTracker
4768 DataNode
hadoop 11:36:03 ~/.hadoop $ mkdir ~/.dfs/testdir
hadoop 11:36:04 ~/.hadoop $ stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
no namenode to stop
localhost: stopping datanode
localhost: stopping secondarynamenode
hadoop 11:37:01 ~/.hadoop $ start-all.sh
starting namenode, logging to /home/hadoop/.hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-namenode-hadoop.out
localhost: starting datanode, logging to /home/hadoop/.hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-datanode-hadoop.out
localhost: starting secondarynamenode, logging to /home/hadoop/.hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-secondarynamenode-hadoop.out
starting jobtracker, logging to /home/hadoop/.hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-jobtracker-hadoop.out
localhost: starting tasktracker, logging to /home/hadoop/.hadoop/hadoop-0.20.2/bin/../logs/hadoop-hadoop-tasktracker-hadoop.out
hadoop 11:39:30 ~/.hadoop $ hadoop dfs -ls
Found 6 items
-rw-r--r--   1 hadoop supergroup     411536 2010-08-18 15:50 /user/hadoop/data
drwxr-xr-x   - hadoop supergroup          0 2010-08-18 16:02 /user/hadoop/dataout
-rw-r--r--   1 hadoop supergroup        497 2010-08-24 11:14 /user/hadoop/profile1
-rw-r--r--   1 hadoop supergroup        497 2010-08-24 11:14 /user/hadoop/profile2
-rw-r--r--   1 hadoop supergroup        497 2010-08-24 11:14 /user/hadoop/profile3
-rw-r--r--   1 hadoop supergroup        497 2010-08-24 11:14 /user/hadoop/profile4
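
A guess at why the -put commands above still went through after the rm -r:
the namenode's edit log file was already open, and on Linux an unlinked file
stays writable through its open descriptor; only the roll, which has to
create a fresh edits file under each configured path, touches the missing
directory again. Here's a tiny standalone demo of that unlink behavior --
my own illustration, unrelated to the Hadoop source, and the /tmp/unlinkdemo
path is made up:

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class UnlinkDemo {
        public static void main(String[] args) throws IOException {
            File dir = new File("/tmp/unlinkdemo");
            dir.mkdirs();
            File edits = new File(dir, "edits");
            FileOutputStream out = new FileOutputStream(edits);

            edits.delete();  // like 'rm -r' removing the file...
            dir.delete();    // ...and then the (now empty) directory

            out.write(42);   // still succeeds: the open descriptor is valid
            out.close();

            // Creating a new file under the deleted directory is what
            // fails, just as the edit log roll did above.
            new FileOutputStream(new File(dir, "edits.new"));
        }
    }

The last line throws a FileNotFoundException, which is roughly the point at
which the namenode gave up.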



On Tue, Aug 24, 2010 at 10:49 AM, Sudhir Vallamkondu
<Sudhir.Vallamkondu@icrossing.com> wrote:
> [quoted message trimmed -- identical to Sudhir's post above]

The above steps were performed using Apache Hadoop 0.20.2, not Cloudera's
version of it, if that helps.

-- 
Harsh J
www.harshj.com