Posted to common-user@hadoop.apache.org by Brian Bockelman <bb...@cse.unl.edu> on 2008/12/06 01:59:49 UTC

File loss at Nebraska

We are continuing to see a small, consistent amount of block  
corruption leading to file loss.  We have been upgrading our cluster  
lately, which means we've been doing a rolling de-commissioning of our  
nodes (and then adding them back with more disks!).

Previously, when I've had time to investigate this very deeply, I've  
found issues like these:

https://issues.apache.org/jira/browse/HADOOP-4692
https://issues.apache.org/jira/browse/HADOOP-4543

I suspect that these issues cause some or all of our problems.

I also saw that one of our nodes was at 100.2% full; I think this is
due to the same issue: Hadoop's actual usage of the file system is
greater than the maximum capacity because some of the blocks were truncated.

I'd have to check with our sysadmins, but I think we've lost about
200-300 files during the upgrade process.  Right now, there are about
900 chronically under-replicated blocks; in the past, that's meant the
only replica is actually corrupt, and Hadoop relentlessly tries to
retransfer it, fails each time, and never realizes the source is corrupt.
To some extent, this whole issue is caused because we only have enough  
space for 2 replicas; I'd imagine that at 3 replicas, the issue would  
be much harder to trigger.

Any suggestions?  For us, file loss is something we can deal with (not
necessarily fun to deal with, of course), but that might not be the case
in the future.

Brian

Re: Reducing Hadoop Logs

Posted by Amareshwari Sriramadasu <am...@yahoo-inc.com>.
Arv Mistry wrote:
>  
> I'm using Hadoop 0.17.0. Unfortunately I can't upgrade to 0.19.0 just
> yet.
>
> I'm trying to control the number of extraneous files. I noticed the
> following log files are produced by Hadoop:
>
> 	On Slave
> 		- userlogs (for each map/reduce job)
> 			- stderr
> 			- stdout
> 			- syslog
> 		- datanode .log file
> 		- datanode .out file
> 		- tasktracker .log file
> 		- tasktracker .out file
>
> 	On Master
> 		- jobtracker .log file
> 		- jobtracker .out file
> 		- namenode   .log file
> 		- namenode   .out file
> 		- secondarynamenode .log file
> 		- secondarynamenode .out file		
> 		- job .xml file
> 		- history
> 			- xml file for job
>
> 	
> Does anybody know how to configure Hadoop so I don't have to delete
> these files manually? Or, even better, so that they don't get created at all?
>
> For the history files, I set hadoop.job.history.user.location to none in
> the hadoop-site.xml file, but the history files are still created.
>   
Setting hadoop.job.history.user.location to "none" only affects the
history written to the user-specified location. The JobTracker still has
its own history location, and that history is cleaned up after a month.

Userlogs are cleaned up after "mapred.userlog.retain.hours", which
defaults to 24 hours.
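
For illustration, here is a minimal sketch of setting these properties
from a job driver using the old mapred API (the class name and the
placeholder steps are hypothetical, and whether a given release honors
these properties should be checked against the defaults for your release):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class LogQuietJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(LogQuietJob.class);  // hypothetical driver class

        // Skip the per-user copy of the job history; the JobTracker still
        // keeps its own copy until its periodic (monthly) cleanup.
        conf.set("hadoop.job.history.user.location", "none");

        // Userlog retention is normally a tasktracker-side setting in
        // hadoop-site.xml; the line below only documents the property name
        // and its default of 24 hours.
        conf.set("mapred.userlog.retain.hours", "24");

        // ... set mapper/reducer classes and input/output paths here ...
        JobClient.runJob(conf);
      }
    }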

Thanks
Amareshwari
> Also, I set hadoop.root.logger=WARN in log4j.properties, but I still
> see INFO messages in the datanode, jobtracker, etc. logs.
>
> Thanks, in advance
>
> Cheers Arv
>   


Reducing Hadoop Logs

Posted by Arv Mistry <ar...@kindsight.net>.
 
I'm using Hadoop 0.17.0. Unfortunately I can't upgrade to 0.19.0 just
yet.

I'm trying to control the number of extraneous files. I noticed the
following log files are produced by Hadoop:

	On Slave
		- userlogs (for each map/reduce job)
			- stderr
			- stdout
			- syslog
		- datanode .log file
		- datanode .out file
		- tasktracker .log file
		- tasktracker .out file

	On Master
		- jobtracker .log file
		- jobtracker .out file
		- namenode   .log file
		- namenode   .out file
		- secondarynamenode .log file
		- secondarynamenode .out file		
		- job .xml file
		- history
			- xml file for job

	
Does anybody know how to configure Hadoop so I don't have to delete
these files manually? Or, even better, so that they don't get created at all?

For the history files, I set hadoop.job.history.user.location to none in
the hadoop-site.xml file, but the history files are still created.
Also, I set hadoop.root.logger=WARN in log4j.properties, but I still
see INFO messages in the datanode, jobtracker, etc. logs.

Thanks, in advance

Cheers Arv

Re: File loss at Nebraska

Posted by Steve Loughran <st...@apache.org>.
Doug Cutting wrote:
> Steve Loughran wrote:
>> Alternatively, "why we should be exploring the configuration space 
>> more widely"
> 
> Are you volunteering?
> 
> Doug

Not yet. I think we have a fair bit of what is needed, and it would make
for some interesting uses of the Yahoo!-HP-Intel Cirrus testbed.

There was some good work done at U Maryland on Skoll, a tool for
efficiently exploring the configuration space of an application so as to
find defects:

http://www.cs.umd.edu/~aporter/Docs/skollJournal.pdf
http://www.cs.umd.edu/~aporter/MemonPorterSkoll.ppt

What you need is something like that applied to the Hadoop config space
and a set of tests that will show up problems in a timely manner.

-steve

Re: File loss at Nebraska

Posted by Doug Cutting <cu...@apache.org>.
Steve Loughran wrote:
> Alternatively, "why we should be exploring the configuration space more 
> widely"

Are you volunteering?

Doug

Re: File loss at Nebraska

Posted by Brian Bockelman <bb...@cse.unl.edu>.
On Dec 9, 2008, at 5:31 PM, Raghu Angadi wrote:

> Brian Bockelman wrote:
>> On Dec 9, 2008, at 4:58 PM, Edward Capriolo wrote:
>>> Also it might be useful to strongly word hadoop-default.conf as many
>>> people might not know a downside exists for using 2 rather than 3 as
>>> the replication factor. Before reading this thread I would have
>>> thought 2 to be sufficient.
>> I think 2 should be sufficient, but running with 2 replicas instead  
>> of 3 exposes some namenode bugs which are harder to trigger.
>
> Whether 2 is sufficient or not, I completely agree with the latter
> part. We should treat this as what I think it fundamentally is:
> fixing the Namenode.
>
> I guess lately some of these bugs either got more likely or some
> similar bugs crept in.
>
> Sticking with 3 is very good advice for maximizing reliability, but
> from an opportunistic developer's point of view, a big cluster
> running with replication of 2 is a great test case :-).  Overall, I
> think this is a good thing for Hadoop.
>

Well, we're most likely here to stay: this is the secondary site for
most of these files.  As long as we can indeed identify lost files,
retransferring them is fairly automated.  The number of unique files on
this site is around 0.1% or less of the total, and we plan on setting
only those to 3 replicas.

So, we'll be happy to provide whatever logs or debugging info is  
needed, as long as someone cares to keep on fixing bugs.

Brian

Re: End of block/file for Map

Posted by Owen O'Malley <om...@apache.org>.
On Dec 9, 2008, at 7:34 PM, Aaron Kimball wrote:

> That's true, but you should be aware that you no longer have an
> OutputCollector available in the close() method.

True, but in practice you can keep a handle to it from the map method
and it will work perfectly. This is required for both streaming and
pipes to work. (Both of them do their processing asynchronously, so the
close needs to wait for the subprocess to finish. Because of this, the
contract with the Mapper and Reducer is very loose and the collect
method may be called in between calls to the map method.)  In the
context-object API (HADOOP-1230), the API will include the context
object in cleanup, to make it clear that cleanup can also write records.

-- Owen
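
As a concrete illustration of the pattern described above, here is a
minimal sketch against the old org.apache.hadoop.mapred API (the class
name and the "end" marker record are made up for the example):

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Sketch only: remember the OutputCollector handed to map() and reuse
    // it in close() to emit one final marker record.
    public class EndMarkerMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      private OutputCollector<Text, Text> out;  // saved from map()

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        out = output;                                 // keep the handle
        output.collect(new Text("record"), value);    // normal per-record output
      }

      @Override
      public void close() throws IOException {
        if (out != null) {                            // map() ran at least once
          out.collect(new Text("end"), new Text("")); // final marker record
        }
      }
    }

This relies on the loose Mapper contract described above; as noted
elsewhere in the thread, it is not an officially guaranteed behavior, so
treat it as a sketch rather than a documented API.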

Re: End of block/file for Map

Posted by Aaron Kimball <aa...@cloudera.com>.
That's true, but you should be aware that you no longer have an
OutputCollector available in the close() method.  So if you are planning to
have each mapper pass some sort of "end" record along to the reducer, you
can't do so there. In general, there is no good solution to that; if
possible, you should rethink your algorithm so that you don't need to do it.


(I am not sure what happens if you memoize the OutputCollector you got as a
parameter to your map() method and try to use it. Probably nothing good.)

- Aaron


On Tue, Dec 9, 2008 at 11:42 AM, Owen O'Malley <om...@apache.org> wrote:

>
> On Dec 9, 2008, at 11:35 AM, Songting Chen wrote:
>
>> Is there a way for the Map process to know it's the end of records?
>>
>> I need to flush some additional data at the end of the Map process, but
>> wondering where I should put that code.
>>
>
> The close() method is called at the end of the map.
>
> -- Owen
>

Re: End of block/file for Map

Posted by Owen O'Malley <om...@apache.org>.
On Dec 9, 2008, at 11:35 AM, Songting Chen wrote:

> Is there a way for the Map process to know it's the end of records?
>
> I need to flush some additional data at the end of the Map process,  
> but wondering where I should put that code.

The close() method is called at the end of the map.

-- Owen

End of block/file for Map

Posted by Songting Chen <ke...@yahoo.com>.
Is there a way for the Map process to know it's the end of records?

I need to flush some additional data at the end of the Map process, but wondering where I should put that code.

Thanks,
-Songting

Re: File loss at Nebraska

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Brian Bockelman wrote:
> 
> On Dec 9, 2008, at 4:58 PM, Edward Capriolo wrote:
> 
>> Also it might be useful to strongly word hadoop-default.conf as many
>> people might not know a downside exists for using 2 rather than 3 as
>> the replication factor. Before reading this thread I would have
>> thought 2 to be sufficient.
> 
> I think 2 should be sufficient, but running with 2 replicas instead of 3 
> exposes some namenode bugs which are harder to trigger.

Whether 2 is sufficient or not, I completely agree with the latter part.
We should treat this as what I think it fundamentally is: fixing the
Namenode.

I guess lately some of these bugs either got more likely or some similar
bugs crept in.

Sticking with 3 is very good advice for maximizing reliability, but from
an opportunistic developer's point of view, a big cluster running with
replication of 2 is a great test case :-).  Overall, I think this is a
good thing for Hadoop.

Raghu.

Re: File loss at Nebraska

Posted by Brian Bockelman <bb...@cse.unl.edu>.
On Dec 9, 2008, at 4:58 PM, Edward Capriolo wrote:

> Also it might be useful to strongly word hadoop-default.conf as many
> people might not know a downside exists for using 2 rather than 3 as
> the replication factor. Before reading this thread I would have
> thought 2 to be sufficient.

I think 2 should be sufficient, but running with 2 replicas instead of  
3 exposes some namenode bugs which are harder to trigger.

For example, let's say your system has 100 nodes and 1M blocks.  Let's
say a namenode bug affects a replica of block X on node Y and the
namenode doesn't realize it.  Then, there is a 1% chance that when
another node goes down, the block becomes missing.  If this bug is
cumulative or affects many blocks (I suspect about 500-1000 blocks are
problematic out of 1M), you're almost guaranteed to lose data whenever
a single node goes down.

On the other hand, if you have 1000 block replica problems on the same
cluster with 3 replicas, then in order to lose files, two of the block
replica problems must hit the same block and the node which goes down
must hold the third replica.  The probability of this happening is
(1e-6) * (1e-6) * (1/100) = 1e-14, or 0.000000000001%.
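
As a quick, illustrative back-of-envelope check, the sketch below just
restates the assumptions above (100 nodes, 1M blocks, roughly 1000 bad
replicas); it is rough arithmetic, not a model of HDFS:

    // Rough arithmetic only; mirrors the assumptions stated above.
    public class ReplicaOdds {
      public static void main(String[] args) {
        double nodes = 100;
        double blocks = 1e6;
        double badReplicas = 1000;

        // 2 replicas: each affected block has one good copy left, and a
        // random single-node failure removes a given copy with
        // probability 1/nodes.
        double loseOnNodeFailure = 1 - Math.pow(1 - 1 / nodes, badReplicas);

        // 3 replicas: per the rough argument above, two bad replicas must
        // land on the same block and the failed node must hold the third.
        double perPair = (1 / blocks) * (1 / blocks) * (1 / nodes);

        System.out.printf("2 replicas: P(lose data on one node failure) ~ %.5f%n",
            loseOnNodeFailure);
        System.out.printf("3 replicas: probability per pair of bad replicas ~ %.0e%n",
            perPair);
      }
    }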

So, even if some of my probability calculations are off, a site running
with 2 replicas is more than 10 orders of magnitude more likely to
discover inconsistencies or other bugs in the namenode than a site with
3 replicas.

Accordingly, these sites are the "canaries in the coal mine" to  
discover NameNode bugs.

Brian


Re: File loss at Nebraska

Posted by Edward Capriolo <ed...@gmail.com>.
Also it might be useful to strongly word hadoop-default.conf as many
people might not know a downside exists for using 2 rather than 3 as
the replication factor. Before reading this thread I would have
thought 2 to be sufficient.

Re: File loss at Nebraska

Posted by Steve Loughran <st...@apache.org>.
Doug Cutting wrote:
> Brian Bockelman wrote:
>> To some extent, this whole issue is caused because we only have enough 
>> space for 2 replicas; I'd imagine that at 3 replicas, the issue would 
>> be much harder to trigger.
> 
> The unfortunate reality is that if you run a configuration that's 
> different than most you'll likely run into bugs that others do not.  (As 
> a side note, this is why we should try to minimize configuration 
> options, so that everyone is running much the same system.) 

Alternatively, "why we should be exploring the configuration space more 
widely"

Re: File loss at Nebraska

Posted by Doug Cutting <cu...@apache.org>.
Brian Bockelman wrote:
> To some extent, this whole issue is caused because we only have enough 
> space for 2 replicas; I'd imagine that at 3 replicas, the issue would be 
> much harder to trigger.

The unfortunate reality is that if you run a configuration that's 
different than most you'll likely run into bugs that others do not.  (As 
a side note, this is why we should try to minimize configuration 
options, so that everyone is running much the same system.)  Hopefully 
squashing the two bugs you've filed will substantially help things.

Doug