Posted to mapreduce-issues@hadoop.apache.org by "Eli Collins (JIRA)" <ji...@apache.org> on 2011/08/11 20:12:31 UTC

[jira] [Moved] (MAPREDUCE-2812) Combiner that aggregates all the mappers from a machine

     [ https://issues.apache.org/jira/browse/MAPREDUCE-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins moved HADOOP-5340 to MAPREDUCE-2812:
------------------------------------------------

    Affects Version/s:     (was: 0.19.1)
                  Key: MAPREDUCE-2812  (was: HADOOP-5340)
              Project: Hadoop Map/Reduce  (was: Hadoop Common)

> Combiner that aggregates all the mappers from a machine
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-2812
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2812
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Nathan Marz
>
> From what I can tell, the Combiner only aggregates data from a single map task. It would be useful, especially for map-only jobs, to have a combiner that aggregates data from all the map tasks on a given machine. My use case is vertically partitioning a set of records that start out in the same files. Doing this in a map-only job creates far too many files (about 50 per input split), while pumping all the data through a reducer adds a lot of unnecessary overhead. With the proposed feature, I would get 50 * (number of machines) files rather than 50 * (number of input splits) files for this use case.
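
To make the file-count argument in the description concrete, here is a minimal Python sketch. It is not Hadoop code and uses no Hadoop API. The function names, host names, and the assumption that every map task writes to all 50 partitions are hypothetical, chosen only to illustrate the arithmetic and the idea of grouping map output by machine.

```python
from collections import defaultdict


def count_output_files(split_hosts, partitions_per_task=50):
    """Return (files_without_feature, files_with_feature) for a map-only job.

    split_hosts holds one host name per input split (hypothetical placement).
    Assumes every map task emits records for every partition.
    """
    per_split = partitions_per_task * len(split_hosts)
    per_machine = partitions_per_task * len(set(split_hosts))
    return per_split, per_machine


def combine_per_machine(map_outputs):
    """Group (host, partition, record) triples by (host, partition).

    Each key of the result stands for one output file, so records from
    different map tasks on the same host end up in the same file.
    """
    files = defaultdict(list)
    for host, partition, record in map_outputs:
        files[(host, partition)].append(record)
    return dict(files)


if __name__ == "__main__":
    # 1000 input splits spread evenly over 20 machines (made-up numbers).
    hosts = [f"node{i % 20}" for i in range(1000)]
    print(count_output_files(hosts))  # (50000, 1000)
```

Under these made-up numbers the per-machine variant writes 1,000 files instead of 50,000. The saving grows with the number of map tasks that run on each machine.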

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira