Posted to common-dev@hadoop.apache.org by Stefan Groschupf <sg...@media-style.com> on 2006/03/17 23:05:44 UTC

Map outputs from killed jobs are reduced.

Hi,

I noticed some strange behavior, but I cannot find the code that is
responsible for it.
I had many killed jobs whose map tasks completed successfully but
whose reduce tasks never finished.
As a result I had many leftover files like
/hadoop/mapred/local/task_m_5zta35.out. Now I notice that when a new
job runs its reduce tasks, all boxes try to download these stale map
files from one specific box.
First of all, it makes no sense for a new job to process old map data
together with its own fresh map output data.
Second, that one box comes under so much load that the other reduce
tasks crash with a connection timeout.

Wouldn't it be a good idea to clean up /hadoop/mapred/local/ when the
tasktracker starts up? A sketch of what I have in mind is below.
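
Something like the following, plain Java and just a sketch: the path
is hard-coded to what I see on my boxes, and I'm assuming the
tasktracker keeps no other state it still needs under that directory.

import java.io.File;

// Sketch of a startup cleanup: recursively delete stale task files
// under the local mapred directory before the tasktracker accepts
// any new work. The path is an assumption based on my installation.
public class LocalDirCleanup {

    private static final File LOCAL_DIR = new File("/hadoop/mapred/local");

    public static void main(String[] args) {
        deleteContents(LOCAL_DIR);
    }

    // Delete everything below dir but keep dir itself, so a freshly
    // started tasktracker begins with an empty local directory.
    static void deleteContents(File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return; // not a directory, or not readable
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                deleteContents(entry);
            }
            if (!entry.delete()) {
                System.err.println("could not delete " + entry);
            }
        }
    }
}

This would at least guarantee that a restarted tasktracker can never
serve map output left over from before the restart.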
Also, I do not fully understand how the map output files are
collected, but maybe this could be improved so that a newer job can
never pick up the map output of an older job, e.g. by checking a job
id during the copy phase, as in the sketch below.
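
For example, if the map output file names carried the job id, the
copy phase could filter on it. The naming scheme below is made up;
the current names like task_m_5zta35.out don't seem to carry a job
id at all, which is presumably why old outputs get picked up.

// Hypothetical guard for the copy phase: only accept map output
// files whose name carries the id of the currently running job.
// The scheme "task_<jobid>_m_<taskid>.out" is an assumption, not
// how the tasktracker names files today.
public class MapOutputFilter {

    // Return true if this map output file belongs to the given job.
    static boolean belongsToJob(String fileName, String jobId) {
        return fileName.startsWith("task_" + jobId + "_m_")
            && fileName.endsWith(".out");
    }

    public static void main(String[] args) {
        String jobId = "0001";
        System.out.println(belongsToJob("task_0001_m_5zta35.out", jobId)); // true
        System.out.println(belongsToJob("task_0000_m_7kq201.out", jobId)); // false
    }
}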

Any hints on where to look for the problem?

Thanks.
Stefan


---------------------------------------------
blog: http://www.find23.org
company: http://www.media-style.com