Posted to mapreduce-issues@hadoop.apache.org by "jaehoon ko (JIRA)" <ji...@apache.org> on 2014/06/27 02:45:26 UTC
[jira] [Created] (MAPREDUCE-5946) Last spill of map task is not necessary for final merge
jaehoon ko created MAPREDUCE-5946:
-------------------------------------
Summary: Last spill of map task is not necessary for final merge
Key: MAPREDUCE-5946
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5946
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: performance, security
Affects Versions: 2.4.0
Reporter: jaehoon ko
Assignee: jaehoon ko
In a map task, the final merge starts only after the last spill has been completely written to disk. This is neither necessary nor efficient, because the last spill has to be reloaded soon for the merge. The reload probably happens immediately, since spills are merged in order of size and the last spill is likely to be the smallest. Relying on the OS page cache is not the answer, because caching is opportunistic and there is no guarantee the spill is still in memory when the merge reads it.
I'm starting to work on it. Please give me your thoughts.
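
To make the idea concrete, here is a minimal Java sketch of a final merge that takes the last spill straight from memory instead of writing it to disk and reading it back. It is not Hadoop's MapTask code: the class SpillMergeSketch, the methods merge and finalMerge, and the modeling of spills as sorted lists of integers are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only; none of these names exist in Hadoop.
class SpillMergeSketch {

    /** A sorted run plus its current head record, so runs can be ordered in a heap. */
    private static final class Cursor implements Comparable<Cursor> {
        final Iterator<Integer> rest;
        int head;

        Cursor(Iterator<Integer> it) {
            this.rest = it;
            this.head = it.next(); // callers only pass non-empty runs
        }

        @Override
        public int compareTo(Cursor other) {
            return Integer.compare(head, other.head);
        }
    }

    /** Plain k-way merge of already-sorted runs using a min-heap of cursors. */
    public static List<Integer> merge(List<List<Integer>> runs) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        for (List<Integer> run : runs) {
            if (!run.isEmpty()) {
                heap.add(new Cursor(run.iterator()));
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();
            out.add(c.head);
            if (c.rest.hasNext()) {
                c.head = c.rest.next();
                heap.add(c); // re-insert with its new head
            }
        }
        return out;
    }

    /**
     * Assumed flow of the proposal: earlier spills stand in for on-disk
     * segments, while the last spill is merged directly from the in-memory
     * buffer, skipping the write-then-reload round trip.
     */
    public static List<Integer> finalMerge(List<List<Integer>> spillsOnDisk,
                                           List<Integer> lastSpillInMemory) {
        List<List<Integer>> runs = new ArrayList<>(spillsOnDisk);
        runs.add(lastSpillInMemory);
        return merge(runs);
    }

    public static void main(String[] args) {
        List<List<Integer>> onDisk = Arrays.asList(
                Arrays.asList(1, 4, 9),
                Arrays.asList(2, 3, 8));
        List<Integer> last = Arrays.asList(5, 6); // small final spill, still in memory
        System.out.println(finalMerge(onDisk, last));
    }
}
```

The merged output is the same as if the last spill had been written to disk first; the only difference in this sketch is where the last run is read from.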
--
This message was sent by Atlassian JIRA
(v6.2#6252)