Posted to mapreduce-issues@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2014/07/17 22:27:04 UTC

[jira] [Resolved] (MAPREDUCE-427) Earlier key-value buffer from MapTask.java is still referenced even though it's not required anymore.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved MAPREDUCE-427.
----------------------------------------

    Resolution: Fixed

Resolved!

> Earlier key-value buffer from MapTask.java is still referenced even though it's not required anymore.
> -----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-427
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-427
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Amar Kamat
>            Priority: Minor
>
> Consider the following sequence of events for a map task.
> Before HADOOP-1965:
> || Stage || Description || Buffers used || Memory used ||
> | Stage-1 | MapOutputBuffer simply collects | KeyVal1 (by collect) | io.sort.mb |
> | Stage-2 | KeyVal1 buffer is full and needs spilling, so Sort-Spill starts | KeyVal1 (by Sort-Spill) | io.sort.mb |
> | Stage-3 | Sort-Spill finished | KeyVal1 (referenced by comparator) | io.sort.mb |
> | Stage-4 | MapOutputBuffer starts collecting | KeyVal2 (by collect) + KeyVal1 (by comparator) | 2*io.sort.mb |
> | Stage-5 | KeyVal2 buffer is full and needs spilling, so Sort-Spill starts | KeyVal2 (by Sort-Spill) | io.sort.mb |
> So between Stage-4 and Stage-5 the memory used becomes {{2 * io.sort.mb}}, because the comparator still references the KeyVal1 buffer while KeyVal2 is being filled. This can be avoided entirely by removing the comparator's reference to the earlier key-value buffer, which clamps the maximum memory usage to {{io.sort.mb}}.
> After HADOOP-1965:
> || Stage || Description || Buffers used || Memory used ||
> | Stage-1 | MapOutputBuffer simply collects | KeyVal1 (by collect) | io.sort.mb/2 |
> | Stage-2 | KeyVal1 buffer is full and needs spilling, so Sort-Spill starts in parallel | KeyVal1 (by Sort-Spill) | io.sort.mb/2 |
> | Stage-3 | MapOutputBuffer simply collects + Sort-Spill | KeyVal2 (by collect) + KeyVal1 (by Sort-Spill) | io.sort.mb |
> | Stage-4 | MapOutputBuffer simply collects + Sort-Spill finishes; the Sort-Impls are closed, but the comparators still hold a reference to the KeyVal1 buffer | KeyVal2 (by collect) + KeyVal1 (referenced by comparator) | io.sort.mb |
> | Stage-5 | KeyVal2 buffer is full and needs spilling, so Sort-Spill starts in parallel | KeyVal2 (by Sort-Spill) | io.sort.mb/2 |
> So between Stage-4 and Stage-5 there is an unwanted reference to the KeyVal1 buffer, which prevents the GC from reclaiming it. However, the maximum memory usage will still be {{io.sort.mb}}.
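
The lingering reference described in the report can be illustrated with a minimal Java sketch. This is not the actual MapTask.java code: the names BufferComparator, SpillDemo, sortAndSpill, release and holdsBuffer are hypothetical and exist only for this illustration. It assumes a comparator that sorts record offsets by looking into a shared key-value buffer, and shows that the comparator keeps the buffer strongly reachable after the sort unless the reference is cleared explicitly.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for the comparator used during Sort-Spill.
 * It compares records by offset into a key-value buffer, so it has to
 * hold a reference to that buffer while sorting.
 */
final class BufferComparator {
    // Strong reference: while this is non-null, the GC cannot reclaim the buffer.
    private byte[] keyValBuffer;

    void setBuffer(byte[] buffer) {
        this.keyValBuffer = buffer;
    }

    /** Compares the bytes stored at two offsets of the current buffer. */
    int compare(int offsetA, int offsetB) {
        return Byte.compare(keyValBuffer[offsetA], keyValBuffer[offsetB]);
    }

    /** Drops the reference so the old buffer becomes eligible for collection. */
    void release() {
        this.keyValBuffer = null;
    }

    boolean holdsBuffer() {
        return keyValBuffer != null;
    }
}

final class SpillDemo {
    /** Sorts record offsets via the comparator, then optionally releases the buffer. */
    static Integer[] sortAndSpill(BufferComparator cmp, byte[] buffer, boolean releaseAfter) {
        cmp.setBuffer(buffer);
        Integer[] offsets = new Integer[buffer.length];
        for (int i = 0; i < offsets.length; i++) {
            offsets[i] = i;
        }
        Arrays.sort(offsets, cmp::compare);
        if (releaseAfter) {
            cmp.release(); // the step the report suggests is missing
        }
        return offsets;
    }

    public static void main(String[] args) {
        byte[] keyVal1 = {3, 1, 2};
        BufferComparator cmp = new BufferComparator();

        sortAndSpill(cmp, keyVal1, false);
        System.out.println("no release, comparator holds KeyVal1: " + cmp.holdsBuffer());

        sortAndSpill(cmp, keyVal1, true);
        System.out.println("released, comparator holds KeyVal1: " + cmp.holdsBuffer());
    }
}
```

In this sketch, clearing the field in release() corresponds to "removing the comparator's reference to the earlier key-val buffer" from the description; how and where the real fix does this is not stated in the message.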


