Posted to common-dev@hadoop.apache.org by Stefan Groschupf <sg...@media-style.com> on 2006/03/17 23:05:44 UTC
Map outputs from killed jobs are reduced.
Hi,
I noticed some strange behavior, but I cannot find the code responsible
for it.
I had many killed jobs that mapped successfully but never reduced
completely.
So I had many leftover files like /hadoop/mapred/local/task_m_5zta35.out.
Now I notice that when a reduce task runs, all boxes try to
download such map files from one specific box.
First of all, it makes no sense to process old map data in a new job
alongside the new map output data.
Second, that one box comes under so much load that the
other reduce tasks crash with a connection timeout.
Wouldn't it be a good idea to clean up /hadoop/mapred/local/ at
tasktracker startup?
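A minimal sketch of what such a startup cleanup could look like, purely as an illustration: the class name `LocalDirCleanup`, the method `purgeStaleMapOutputs`, and the `task_*.out` file pattern (taken from the example path above) are my own assumptions, not the actual TaskTracker API.

```java
import java.io.File;

public class LocalDirCleanup {
    // Hypothetical sketch: remove stale map output files left behind by
    // earlier (possibly killed) jobs before the tasktracker starts
    // serving map outputs. The "task_*.out" naming follows the example
    // file /hadoop/mapred/local/task_m_5zta35.out mentioned above.
    public static int purgeStaleMapOutputs(File localDir) {
        int removed = 0;
        File[] files = localDir.listFiles();
        if (files == null) {
            return 0; // directory missing or unreadable
        }
        for (File f : files) {
            String name = f.getName();
            if (f.isFile() && name.startsWith("task_") && name.endsWith(".out")) {
                if (f.delete()) {
                    removed++;
                }
            }
        }
        return removed;
    }

    // Small demo against a temp directory instead of the real
    // /hadoop/mapred/local/ path.
    public static void main(String[] args) throws Exception {
        File dir = new File(System.getProperty("java.io.tmpdir"), "mapred-local-demo");
        dir.mkdirs();
        new File(dir, "task_m_5zta35.out").createNewFile();
        new File(dir, "task_m_abc123.out").createNewFile();
        new File(dir, "unrelated.txt").createNewFile();
        int n = purgeStaleMapOutputs(dir);
        System.out.println("removed " + n + " stale map outputs");
    }
}
```

Running this once would report that the two `task_*.out` files were removed while `unrelated.txt` is left alone; a real fix would of course have to be careful not to delete outputs that a currently running job still needs.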
I also do not clearly understand how the map output files are
collected, but maybe this could be improved so that it can never
happen that a newer job also processes the map output of an
older job.
Any hints where to search for the problem?
Thanks.
Stefan
---------------------------------------------
blog: http://www.find23.org
company: http://www.media-style.com