Posted to common-commits@hadoop.apache.org by cu...@apache.org on 2006/10/04 19:25:11 UTC

svn commit: r452945 - in /lucene/hadoop/trunk: CHANGES.txt src/java/org/apache/hadoop/mapred/ReduceTaskRunner.java

Author: cutting
Date: Wed Oct  4 10:25:10 2006
New Revision: 452945

URL: http://svn.apache.org/viewvc?view=rev&rev=452945
Log:
HADOOP-343.  Fix mapred copying so that a failed tasktracker does not slow other copies.  Contributed by Sameer.

Modified:
    lucene/hadoop/trunk/CHANGES.txt
    lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/ReduceTaskRunner.java

Modified: lucene/hadoop/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/CHANGES.txt?view=diff&rev=452945&r1=452944&r2=452945
==============================================================================
--- lucene/hadoop/trunk/CHANGES.txt (original)
+++ lucene/hadoop/trunk/CHANGES.txt Wed Oct  4 10:25:10 2006
@@ -132,6 +132,9 @@
     permits, e.g., TextInputFormat to again operate on non-UTF-8 data.
     (Hairong and Mahadev via cutting)
 
+32. HADOOP-343.  Fix mapred copying so that a failed tasktracker
+    doesn't cause other copies to slow.  (Sameer Paranjpye via cutting)
+
 
 Release 0.6.2 - 2006-09-18
 

Modified: lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/ReduceTaskRunner.java
URL: http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/ReduceTaskRunner.java?view=diff&rev=452945&r1=452944&r2=452945
==============================================================================
--- lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/ReduceTaskRunner.java (original)
+++ lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/ReduceTaskRunner.java Wed Oct  4 10:25:10 2006
@@ -455,6 +455,20 @@
             LOG.warn(reduceTask.getTaskId() + " adding host " +
                      cr.getHost() + " to penalty box, next contact in " +
                      ((nextContact-currentTime)/1000) + " seconds");
+
+            // other outputs from the failed host may be present in the
+            // knownOutputs cache, purge them. This is important in case
+            // the failure is due to a lost tasktracker (causes many
+            // unnecessary backoffs). If not, we only take a small hit
+            // polling the jobtracker a few more times
+            ListIterator locIt = knownOutputs.listIterator();
+            while (locIt.hasNext()) {
+              MapOutputLocation loc = (MapOutputLocation)locIt.next();
+              if (cr.getHost().equals(loc.getHost())) {
+                locIt.remove();
+                neededOutputs.add(new Integer(loc.getMapId()));
+              }
+            }
           }
           uniqueHosts.remove(cr.getHost());
           numInFlight--;
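
The purge step added in this hunk can be read in isolation as the sketch below. It is only an illustration under stated assumptions, not the committed code: PurgeSketch, Location and purgeHost are hypothetical names, Location stands in for MapOutputLocation, and the collections are plain java.util types with generics rather than the raw types used in ReduceTaskRunner.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class PurgeSketch {

    /** Hypothetical stand-in for MapOutputLocation: a map id and the host serving it. */
    static final class Location {
        final int mapId;
        final String host;

        Location(int mapId, String host) {
            this.mapId = mapId;
            this.host = host;
        }
    }

    /**
     * Removes every cached location served by failedHost and puts its map id
     * back into neededOutputs, so the location is looked up again later
     * instead of being retried against the failed host.
     * Returns the number of purged entries.
     */
    static int purgeHost(String failedHost,
                         List<Location> knownOutputs,
                         Set<Integer> neededOutputs) {
        int purged = 0;
        // Iterator.remove() is the safe way to delete while iterating.
        Iterator<Location> it = knownOutputs.iterator();
        while (it.hasNext()) {
            Location loc = it.next();
            if (failedHost.equals(loc.host)) {
                it.remove();
                neededOutputs.add(loc.mapId);
                purged++;
            }
        }
        return purged;
    }

    public static void main(String[] args) {
        List<Location> known = new ArrayList<>();
        known.add(new Location(0, "hostA"));
        known.add(new Location(1, "hostB"));
        known.add(new Location(2, "hostA"));
        Set<Integer> needed = new TreeSet<>();

        int purged = purgeHost("hostA", known, needed);
        System.out.println(purged + " purged, needed=" + needed
            + ", remaining=" + known.size());
    }
}
```

As the comment in the patch notes, the trade-off is a few extra jobtracker polls when the host turns out to be healthy, in exchange for avoiding a backoff per cached output when the tasktracker is really lost.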