Posted to mapreduce-issues@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2014/07/17 23:22:04 UTC

[jira] [Resolved] (MAPREDUCE-286) Optimize the last merge of the map output files

     [ https://issues.apache.org/jira/browse/MAPREDUCE-286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved MAPREDUCE-286.
----------------------------------------

    Resolution: Fixed

Stale.

> Optimize the last merge of the map output files
> -----------------------------------------------
>
>                 Key: MAPREDUCE-286
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-286
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Devaraj Das
>
> In ReduceTask, we currently merge io.sort.factor files at a time and write each result back to disk. The last merge can probably be done better. For example, with io.sort.factor = 100 and io.sort.factor + 10 files left at the end, today we merge 100 files into one and then return an iterator over the remaining 11 files. This can be improved (in terms of disk I/O) by merging the smallest 11 files instead and then returning an iterator over the 100 remaining files. Another option is to skip the single-level merge entirely when io.sort.factor + n files remain (where n << io.sort.factor) and just return the iterator directly. Thoughts?
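
The arithmetic in the description can be checked with a small model. The sketch below is not Hadoop code; the function names and the choice to count only the bytes written by the intermediate merge are assumptions made for illustration. It compares the behaviour described as current (merge io.sort.factor files, iterate over the rest) with the first proposal (merge only the smallest files so that exactly io.sort.factor remain).

```python
# Illustrative model only: hypothetical helpers, not taken from Hadoop.
# Each function returns (bytes_rewritten, remaining_file_sizes), where
# bytes_rewritten is the size of the intermediate merge output written to disk.

def current_last_merge(sizes, factor):
    """Merge the first `factor` files into one; the rest are left as-is."""
    if len(sizes) <= factor:
        return 0, list(sizes)  # everything fits in one final merge pass
    merged, rest = sizes[:factor], sizes[factor:]
    return sum(merged), [sum(merged)] + list(rest)


def proposed_last_merge(sizes, factor):
    """Merge only the smallest files so that exactly `factor` files remain."""
    if len(sizes) <= factor:
        return 0, list(sizes)
    # Merging k files into one reduces the count by k - 1, so
    # k = len(sizes) - factor + 1 leaves exactly `factor` files.
    k = len(sizes) - factor + 1
    ordered = sorted(sizes)
    merged, rest = ordered[:k], ordered[k:]
    return sum(merged), [sum(merged)] + rest
```

With 110 files of 10 units each and a factor of 100, `current_last_merge` rewrites 1000 units and leaves 11 files, while `proposed_last_merge` rewrites 110 units and leaves 100 files. The total data handed to the final iterator is the same in both cases; only the intermediate disk I/O differs.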



--
This message was sent by Atlassian JIRA
(v6.2#6252)