Posted to mapreduce-issues@hadoop.apache.org by "Tsuyoshi OZAWA (JIRA)" <ji...@apache.org> on 2013/07/26 07:43:49 UTC

[jira] [Commented] (MAPREDUCE-5153) Support for running combiners without reducers

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720426#comment-13720426 ] 

Tsuyoshi OZAWA commented on MAPREDUCE-5153:
-------------------------------------------

This discussion is essentially "in-mapper combining vs. disk-based combining". If a user program (including Scalding and Cascading) does in-mapper combining and emits its values based on memory usage, a similar effect can be obtained, although only partially. In most cases, this partial approach is enough to improve performance. What do you think?
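A minimal sketch of in-mapper combining with a memory-bounded flush, written in plain Java so it runs without a Hadoop cluster. The class name `InMapperCombiner`, the `maxEntries` threshold (a crude stand-in for "memory usage") and the `emit` callback (standing in for `context.write`) are hypothetical; `map` and `cleanup` only mirror the names of `Mapper.map()` and `Mapper.cleanup()`. Because the buffer is flushed whenever it grows past the threshold, the same key can be emitted more than once, which is why the combining is only partial.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch of in-mapper combining: partial sums are buffered in a
// HashMap and flushed downstream whenever the buffer holds more than
// maxEntries distinct keys (a crude stand-in for a memory limit).
class InMapperCombiner {
    private final Map<String, Long> buffer = new HashMap<>();
    private final int maxEntries;
    private final BiConsumer<String, Long> emit; // stands in for context.write(...)

    InMapperCombiner(int maxEntries, BiConsumer<String, Long> emit) {
        this.maxEntries = maxEntries;
        this.emit = emit;
    }

    // Called once per input record, like Mapper.map().
    void map(String key, long value) {
        buffer.merge(key, value, Long::sum);
        if (buffer.size() > maxEntries) {
            flush();
        }
    }

    // Called once at the end of the input split, like Mapper.cleanup().
    void cleanup() {
        flush();
    }

    // Emit every buffered partial sum and start over with an empty buffer.
    private void flush() {
        buffer.forEach(emit);
        buffer.clear();
    }
}
```

For example, with `maxEntries = 2` and the input a, b, a, c, a (each with value 1), the key a is emitted twice (as 2 and then 1) rather than once as 3.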
                
> Support for running combiners without reducers
> ----------------------------------------------
>
>                 Key: MAPREDUCE-5153
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5153
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Radim Kolar
>
> Scenario: workflow mapper -> sort -> combiner -> HDFS
> No API change is needed: if the user sets a combiner class and sets the number of reducers to 0, run the combiner and send its output to HDFS.
> Popular libraries such as Scalding and Cascading offer this functionality, but they do it by caching the entire mapper output in memory.
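The requested mapper -> sort -> combiner -> HDFS path for a single map task can be sketched as a spill-and-merge: output is buffered, each full buffer is sorted into a run, and the runs are merged so that every key reaches the combiner exactly once. This is a hypothetical plain-Java illustration, not Hadoop's map-task code; `SortThenCombine`, `spillSize` and the `out` callback (standing in for the HDFS writer) are made-up names, the spills stay in memory here where a real implementation would write them to local disk, and the combiner is hard-coded to summing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.function.BiConsumer;

// Hypothetical sketch of "mapper -> sort -> combiner -> output" for one map
// task with zero reducers. Full buffers are sorted into spills (kept in memory
// here; a real implementation would write them to local disk), then the
// sorted spills are merged and a summing combiner runs once per key.
class SortThenCombine {
    private final int spillSize;
    private final List<Map.Entry<String, Long>> buffer = new ArrayList<>();
    private final List<List<Map.Entry<String, Long>>> spills = new ArrayList<>();

    SortThenCombine(int spillSize) {
        this.spillSize = spillSize;
    }

    // Collect one mapper output record; spill when the buffer is full.
    void collect(String key, long value) {
        buffer.add(Map.entry(key, value));
        if (buffer.size() >= spillSize) {
            spill();
        }
    }

    // Sort the current buffer by key and keep it as one sorted run.
    private void spill() {
        if (buffer.isEmpty()) return;
        List<Map.Entry<String, Long>> run = new ArrayList<>(buffer);
        run.sort(Map.Entry.comparingByKey());
        spills.add(run);
        buffer.clear();
    }

    // k-way merge of all sorted spills; consecutive equal keys are summed,
    // and each combined (key, sum) pair is passed to 'out' in key order.
    void finish(BiConsumer<String, Long> out) {
        spill();
        // Heap entries are {spill index, position within that spill}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            Comparator.comparing((int[] p) -> spills.get(p[0]).get(p[1]).getKey()));
        for (int i = 0; i < spills.size(); i++) {
            heap.add(new int[] {i, 0});
        }
        String currentKey = null;
        long sum = 0;
        while (!heap.isEmpty()) {
            int[] p = heap.poll();
            Map.Entry<String, Long> e = spills.get(p[0]).get(p[1]);
            if (currentKey != null && !currentKey.equals(e.getKey())) {
                out.accept(currentKey, sum); // key group finished: run combiner output
                sum = 0;
            }
            currentKey = e.getKey();
            sum += e.getValue();
            if (p[1] + 1 < spills.get(p[0]).size()) {
                heap.add(new int[] {p[0], p[1] + 1});
            }
        }
        if (currentKey != null) {
            out.accept(currentKey, sum);
        }
    }
}
```

With `spillSize = 2` and the input a, b, a, c, a (each with value 1), `finish` emits a=3, b=1, c=1 in key order: each key appears once even though its values were spread across three spills, while memory use stays bounded by the spill size.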
