Posted to common-commits@hadoop.apache.org by om...@apache.org on 2008/11/21 01:10:48 UTC

svn commit: r719431 - in /hadoop/core/trunk: CHANGES.txt src/mapred/org/apache/hadoop/mapred/JobConf.java

Author: omalley
Date: Thu Nov 20 16:10:47 2008
New Revision: 719431

URL: http://svn.apache.org/viewvc?rev=719431&view=rev
Log:
HADOOP-4668. Improve documentation for setCombinerClass to clarify the
restrictions on combiners. (omalley)

Modified:
    hadoop/core/trunk/CHANGES.txt
    hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/JobConf.java

Modified: hadoop/core/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/CHANGES.txt?rev=719431&r1=719430&r2=719431&view=diff
==============================================================================
--- hadoop/core/trunk/CHANGES.txt (original)
+++ hadoop/core/trunk/CHANGES.txt Thu Nov 20 16:10:47 2008
@@ -121,6 +121,9 @@
     it down by monitoring for cumulative memory usage across tasks.
     (Vinod Kumar Vavilapalli via yhemanth)
 
+    HADOOP-4668. Improve documentation for setCombinerClass to clarify the
+    restrictions on combiners. (omalley)
+
   OPTIMIZATIONS
 
     HADOOP-3293. Fixes FileInputFormat to do provide locations for splits

Modified: hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/JobConf.java
URL: http://svn.apache.org/viewvc/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/JobConf.java?rev=719431&r1=719430&r2=719431&view=diff
==============================================================================
--- hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/JobConf.java (original)
+++ hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/JobConf.java Thu Nov 20 16:10:47 2008
@@ -775,11 +775,20 @@
    * Set the user-defined <i>combiner</i> class used to combine map-outputs 
    * before being sent to the reducers. 
    * 
-   * <p>The combiner is a task-level aggregation operation which, in some cases,
-   * helps to cut down the amount of data transferred from the {@link Mapper} to
-   * the {@link Reducer}, leading to better performance.</p>
-   *  
-   * <p>Typically the combiner is same as the the <code>Reducer</code> for the  
+   * <p>The combiner is an application-specified aggregation operation, which
+   * can help cut down the amount of data transferred between the 
+   * {@link Mapper} and the {@link Reducer}, leading to better performance.</p>
+   * 
+   * <p>The framework may invoke the combiner 0, 1, or multiple times, in both
+   * the mapper and reducer tasks. In general, the combiner is called as the
+   * sort/merge result is written to disk. The combiner must:
+   * <ul>
+   *   <li> be side-effect free</li>
+   *   <li> have the same input and output key types and the same input and 
+   *        output value types</li>
+   * </ul></p>
+   * 
+   * <p>Typically the combiner is same as the <code>Reducer</code> for the  
    * job i.e. {@link #setReducerClass(Class)}.</p>
    * 
    * @param theClass the user-defined combiner class used to combine 
@@ -1155,7 +1164,7 @@
 
   /**
    * Set whether the system should collect profiler information for some of 
-   * the tasks in this job? The information is stored in the the user log 
+   * the tasks in this job? The information is stored in the user log 
    * directory.
    * @param newValue true means it should be gathered
    */
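
The restrictions added to the setCombinerClass Javadoc above (the combiner may run 0, 1, or multiple times, must be side-effect free, and must have matching input and output types) can be illustrated with a small standalone sketch. This is not the Hadoop API: the class CombinerSketch and its methods sum and reduceAfterCombining are hypothetical names made up for this illustration, and the "framework" is simulated with plain lists. It only shows why an operation such as summation gives the same final result no matter how many combiner passes happen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a combiner/reducer pair; this is NOT the Hadoop API.
class CombinerSketch {

    // Sums the values for one key. Input and output are both integers and the
    // method touches no external state, so it is safe to reapply to its own output.
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Simulates a framework that runs the combiner `passes` times over the map
    // output, in chunks of `chunkSize` (assumed positive), before the final reduce.
    static int reduceAfterCombining(List<Integer> mapOutput, int passes, int chunkSize) {
        List<Integer> current = new ArrayList<>(mapOutput);
        for (int p = 0; p < passes; p++) {
            List<Integer> combined = new ArrayList<>();
            for (int i = 0; i < current.size(); i += chunkSize) {
                int end = Math.min(i + chunkSize, current.size());
                combined.add(sum(current.subList(i, end)));
            }
            current = combined;
        }
        // The reducer sees whatever the combiner passes left behind.
        return sum(current);
    }

    public static void main(String[] args) {
        // Seven map outputs of 1 for a single key, as in a word count.
        List<Integer> counts = Arrays.asList(1, 1, 1, 1, 1, 1, 1);
        System.out.println(reduceAfterCombining(counts, 0, 3)); // combiner never runs
        System.out.println(reduceAfterCombining(counts, 1, 3)); // combiner runs once
        System.out.println(reduceAfterCombining(counts, 2, 3)); // combiner runs twice
    }
}
```

All three calls print the same total. An operation whose output type differs from its input type, or one that updates external state on each call, would not tolerate being skipped or repeated in this way.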