Posted to mapreduce-issues@hadoop.apache.org by "Eli Collins (JIRA)" <ji...@apache.org> on 2011/08/11 20:12:31 UTC
[jira] [Moved] (MAPREDUCE-2812) Combiner that aggregates all the mappers from a machine
[ https://issues.apache.org/jira/browse/MAPREDUCE-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eli Collins moved HADOOP-5340 to MAPREDUCE-2812:
------------------------------------------------
Affects Version/s: (was: 0.19.1)
Key: MAPREDUCE-2812 (was: HADOOP-5340)
Project: Hadoop Map/Reduce (was: Hadoop Common)
> Combiner that aggregates all the mappers from a machine
> -------------------------------------------------------
>
> Key: MAPREDUCE-2812
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2812
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Reporter: Nathan Marz
>
> From what I can tell, the Combiner only aggregates data from a single map task. It would be useful, especially for map-only jobs, to have a combiner that aggregates data from all the map tasks on a given machine. My use case is vertically partitioning a set of records that start out in the same files. Doing this in a map-only job creates far too many files (about 50 per input split), while pumping all the data through a reducer adds a lot of unnecessary overhead. With the proposed feature, this use case would produce 50 * (number of machines) files rather than 50 * (number of input splits) files.
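
The file-count arithmetic in the description can be illustrated with a small simulation. This is only a sketch in plain Python, not Hadoop code: the cluster shape (2 machines, 3 input splits each), the record tuples, and the function name output_files are all hypothetical, and it assumes every split contains records for all 50 partitions.

```python
from collections import defaultdict

def output_files(records, per_machine):
    """Group records into output files, one file per (writer, partition).

    per_machine=False: each map task (input split) writes its own files,
    as in a map-only job.
    per_machine=True: all map tasks on a machine share one writer, as the
    proposed machine-level combiner would allow.
    """
    files = defaultdict(list)
    for machine, split, partition, value in records:
        writer = machine if per_machine else (machine, split)
        files[(writer, partition)].append(value)
    return files

# Hypothetical cluster: 2 machines, 3 input splits per machine,
# and 50 vertical partitions present in every split.
MACHINES, SPLITS_PER_MACHINE, PARTITIONS = 2, 3, 50
records = [
    (m, s, p, f"rec-{m}-{s}-{p}")
    for m in range(MACHINES)
    for s in range(SPLITS_PER_MACHINE)
    for p in range(PARTITIONS)
]

per_split = output_files(records, per_machine=False)
per_machine = output_files(records, per_machine=True)

print(len(per_split))    # 50 * number of input splits
print(len(per_machine))  # 50 * number of machines
```

With these assumed numbers the per-split grouping yields 50 * 6 files and the per-machine grouping yields 50 * 2, while the total number of records written is the same in both cases.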
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira