Posted to mapreduce-issues@hadoop.apache.org by "Jeff Bean (JIRA)" <ji...@apache.org> on 2013/06/15 01:43:20 UTC

[jira] [Resolved] (MAPREDUCE-5323) Min Spills For Combine Ignored

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Bean resolved MAPREDUCE-5323.
----------------------------------

    Resolution: Not A Problem

I misunderstood the config mapreduce.map.combine.minspills as the number of spills required before the first combine. Instead, it is the number of spills required for the second and subsequent combines, which run on merge.
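
To make the distinction concrete, here is a minimal sketch of the two decision points as I understand them from this resolution. It is a simplified model, not the Hadoop source: the class and method names are hypothetical, and the default of 3 for mapreduce.map.combine.minspills is stated from memory of the shipped defaults. The merge-time check mirrors the condition quoted in the report. Note that with ||, the second operand is evaluated whenever the first is false, so a non-null combiner does not bypass the spill-count test.

```java
// Simplified model of when a combiner runs in a map task.
// NOT the actual Hadoop code; all names here are hypothetical.
final class MinSpillsSketch {

    // Assumed default for mapreduce.map.combine.minspills.
    static final int DEFAULT_MIN_SPILLS = 3;

    // First combine: applied to each spill whenever a combiner is configured.
    // The minspills setting is not consulted at this point.
    static boolean combinesOnSpill(boolean hasCombiner) {
        return hasCombiner;
    }

    // Second and subsequent combines: applied while merging spill files,
    // and only if enough spills were produced.
    static boolean combinesOnMerge(boolean hasCombiner, int numSpills, int minSpills) {
        // Same shape as the quoted condition:
        //   combinerRunner == null || numSpills < minSpillsForCombine
        // When hasCombiner is true, the left operand is false, so the
        // right operand IS evaluated and decides the outcome.
        boolean writeWithoutCombining = !hasCombiner || numSpills < minSpills;
        return !writeWithoutCombining;
    }

    public static void main(String[] args) {
        for (int numSpills = 1; numSpills <= 4; numSpills++) {
            System.out.println("spills=" + numSpills
                    + " combineOnSpill=" + combinesOnSpill(true)
                    + " combineOnMerge="
                    + combinesOnMerge(true, numSpills, DEFAULT_MIN_SPILLS));
        }
    }
}
```

Under this model a configured combiner always runs at spill time, which matches the observation that "combiners always run when specified", while the setting only gates the extra combine pass during the merge.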
                
> Min Spills For Combine Ignored
> ------------------------------
>
>                 Key: MAPREDUCE-5323
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5323
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>            Reporter: Jeff Bean
>            Priority: Minor
>
> We've observed for some time that combiners always run when specified. However, there is a config called mapreduce.map.combine.minspills, which implies that the developer or administrator ought to be able to control when combiners are invoked.
> I spelunked into the code and found this gem in MapTask.java:
> if (combinerRunner == null || numSpills < minSpillsForCombine) {
>   Merger.writeFile(kvIter, writer, reporter, job);
> } else {
>   combineCollector.setWriter(writer);
>   combinerRunner.combine(kvIter, combineCollector);
> }
> That looks way buggy to me. If ( A || B ) is made false by A then B is never executed. I spelunked around the code some more and it looks like combinerRunner is never null except on reflection failure. So it looks like the intention is for minSpillsForCombine to be respected, but due to this logic error it's totally ignored.
