Posted to common-user@hadoop.apache.org by zsongbo <zs...@gmail.com> on 2009/04/21 19:00:15 UTC

Additional Combiner before reduce

Hi all,
I am running Hadoop from SVN branch-0.19, and I found that the additional
combiner run on the reduce side (before the reduce phase) has appeared again.

In the 0.17.2->0.18.0 release notes, I found:
HADOOP-3226 <https://issues.apache.org/jira/browse/HADOOP-3226> Changed
policy for running combiner. The combiner may be run multiple times as the
map's output is sorted and merged. Additionally, it may be run on the reduce
side as data is merged. The old semantics are available in Hadoop 0.18 if
the user calls:
job.setCombineOnceOnly(true);
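
To make the consequence of this policy concrete, here is a minimal sketch in
plain Java. It does not use any Hadoop classes, and the class and method names
are hypothetical, for illustration only. It assumes a wordcount-style combiner
that sums counts, and checks that such a combiner gives the same records
whether it runs once over all map output or runs per spill and then again on
the already-combined output during a merge.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: plain Java, not the Hadoop API. All names are hypothetical.
public class CombinerRerunSketch {

    // Build one (word, count) record.
    static Map.Entry<String, Integer> rec(String word, int count) {
        return new AbstractMap.SimpleEntry<>(word, count);
    }

    // Stand-in for a wordcount combiner: sum the counts of each word.
    // The output is sorted by word and holds one record per distinct word.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> records) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> r : records) {
            sums.merge(r.getKey(), r.getValue(), Integer::sum);
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            out.add(rec(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Two made-up spills of map output, as (word, 1) records.
        List<Map.Entry<String, Integer>> spill1 =
                Arrays.asList(rec("a", 1), rec("b", 1), rec("a", 1));
        List<Map.Entry<String, Integer>> spill2 =
                Arrays.asList(rec("a", 1), rec("c", 1));

        // Case 1: the combiner runs once over all records.
        List<Map.Entry<String, Integer>> all = new ArrayList<>(spill1);
        all.addAll(spill2);
        List<Map.Entry<String, Integer>> once = combine(all);

        // Case 2: the combiner runs per spill, then again while merging.
        List<Map.Entry<String, Integer>> merged = new ArrayList<>(combine(spill1));
        merged.addAll(combine(spill2));
        List<Map.Entry<String, Integer>> twice = combine(merged);

        System.out.println("once:  " + once);
        System.out.println("twice: " + twice);
        System.out.println("same result: " + once.equals(twice));
    }
}
```

A combiner whose output depends on how many times it is applied (for example,
one that counts its input records instead of summing their values) would not
pass this check, which is why the number of combiner runs matters to a job.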

And in my experience with 0.18.1, this feature was there. (The following are
wordcount statistics.)
0.18.1:
09/02/02 00:20:28 INFO mapred.JobClient:     Map output bytes=434077
09/02/02 00:20:28 INFO mapred.JobClient:     Map input bytes=299871

09/02/02 00:20:28 INFO mapred.JobClient:     Map input records=7363
09/02/02 00:20:28 INFO mapred.JobClient:     Map output records=39193

09/02/02 00:20:28 INFO mapred.JobClient:     Combine input records=50053
09/02/02 00:20:28 INFO mapred.JobClient:     Combine output records=19857

09/02/02 00:20:28 INFO mapred.JobClient:     Reduce input groups=8997
09/02/02 00:20:28 INFO mapred.JobClient:     Reduce input records=8997
09/02/02 00:20:28 INFO mapred.JobClient:     Reduce output records=8997


And in the 0.18.2->0.19.0 release notes, I found:
HADOOP-3595 <https://issues.apache.org/jira/browse/HADOOP-3595> Removed
deprecated methods for mapred.combine.once functionality

And this feature was removed. (The following are the wordcount statistics.)
0.19.0:
09/02/01 13:44:51 INFO mapred.JobClient:     Map input bytes=299871
09/02/01 13:44:51 INFO mapred.JobClient:     Map output bytes=434077

09/02/01 13:44:51 INFO mapred.JobClient:     Map input records=7363
09/02/01 13:44:51 INFO mapred.JobClient:     Map output records=39193

09/02/01 13:44:51 INFO mapred.JobClient:     Combine input records=39193
09/02/01 13:44:51 INFO mapred.JobClient:     Combine output records=10860

09/02/01 13:44:51 INFO mapred.JobClient:     Reduce input groups=8997
09/02/01 13:44:51 INFO mapred.JobClient:     Reduce input records=10860
09/02/01 13:44:51 INFO mapred.JobClient:     Reduce output records=8997
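
The difference between the two runs can be read off the counters with a little
arithmetic. The sketch below is plain Java with hypothetical class and method
names. The counter values are copied from the two logs above, and the reading
of them as "one combiner pass" versus "a second pass over already-combined
records" is my interpretation, not something the logs state directly.

```java
// Illustration only: hypothetical helper, counter values copied from the logs above.
public class CombinerCounterCheck {

    // Records that entered the combiner beyond a single pass over the map output.
    static long extraCombineInput(long combineInputRecords, long mapOutputRecords) {
        return combineInputRecords - mapOutputRecords;
    }

    // True if each key group reaches the reducer as exactly one record.
    static boolean fullyCombined(long reduceInputRecords, long reduceInputGroups) {
        return reduceInputRecords == reduceInputGroups;
    }

    public static void main(String[] args) {
        // 0.18.1: Map output records=39193, Combine input records=50053,
        //         Reduce input records=8997, Reduce input groups=8997
        System.out.println("0.18.1 extra combine input: " + extraCombineInput(50053, 39193));
        System.out.println("0.18.1 fully combined:      " + fullyCombined(8997, 8997));

        // 0.19.0: Map output records=39193, Combine input records=39193,
        //         Reduce input records=10860, Reduce input groups=8997
        System.out.println("0.19.0 extra combine input: " + extraCombineInput(39193, 39193));
        System.out.println("0.19.0 fully combined:      " + fullyCombined(10860, 8997));

        // Consistency check across the two runs: a first pass producing 10860
        // records plus a second pass producing 8997 gives the 0.18.1 total.
        System.out.println("10860 + 8997 = " + (10860 + 8997)
                + " (0.18.1 Combine output records=19857)");
    }
}
```

In 0.18.1 the combiner saw 10860 more records than the maps produced, which is
exactly the Combine output of the 0.19.0 run, and the reducers received one
record per group. In 0.19.0 the combiner saw each map output record once, and
the reducers received all 10860 combined records.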


But I found that it appears again in branch-0.19 now.


Schubert