
Posted to mapreduce-issues@hadoop.apache.org by "Harsh J (Resolved) (JIRA)" <ji...@apache.org> on 2012/01/16 11:38:40 UTC

[jira] [Resolved] (MAPREDUCE-201) Map directly to HDFS or reduce()

     [ https://issues.apache.org/jira/browse/MAPREDUCE-201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved MAPREDUCE-201.
-------------------------------

    Resolution: Not A Problem

This should've been closed out before but was not. Closing out now.
                
> Map directly to HDFS or reduce()
> --------------------------------
>
>                 Key: MAPREDUCE-201
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-201
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>         Environment: all
>            Reporter: Doug Judd
>
> For situations where the output of the map phase is known to be already aggregated (e.g. the input is the output of another MapReduce job and map() preserves the aggregation), there should be a way to tell the framework so that it can pipe the map() output directly to the reduce() function, or to HDFS in the case of IdentityReducer. This would probably require forcing the number of map tasks to equal the number of reduce tasks. It would save the disk I/O required to generate intermediate files.
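
For the IdentityReducer case described above, Hadoop already lets a job run with zero reduce tasks (setNumReduceTasks(0)), in which case map output is written straight to the job's output format with no sort or shuffle. The more general idea of feeding map() output directly into reduce() can be illustrated with a small framework-independent sketch. It is not Hadoop code: map_fn, reduce_fn, and run_partition_direct are hypothetical names, and the sketch assumes that all records for a key are adjacent within one partition and that map() leaves keys unchanged.

```python
from itertools import groupby
from operator import itemgetter


def map_fn(key, value):
    # Hypothetical map(): transforms the value but keeps the key,
    # so the existing grouping of the input is preserved.
    yield key, value * 2


def reduce_fn(key, values):
    # Hypothetical reduce(): sums all values seen for one key.
    return key, sum(values)


def run_partition_direct(records):
    """Pipe one pre-aggregated partition through map() straight into reduce().

    Assumes records for the same key are adjacent and map() does not change
    keys, so no intermediate files, sort, or shuffle are needed.
    """
    mapped = (pair for k, v in records for pair in map_fn(k, v))
    # groupby only merges *consecutive* equal keys, which is exactly the
    # "already aggregated" precondition from the issue description.
    return [
        reduce_fn(k, (v for _, v in group))
        for k, group in groupby(mapped, key=itemgetter(0))
    ]


partition = [("a", 1), ("a", 2), ("b", 3)]
result = run_partition_direct(partition)
print(result)
```

Each partition is handled by a single map-then-reduce pipeline, which mirrors the one-map-task-per-reduce-task constraint mentioned in the description. If the adjacency assumption does not hold, the same key would be reduced more than once and the result would be wrong, which is why the framework normally sorts and shuffles.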
