Posted to issues@tez.apache.org by "Jonathan Eagles (JIRA)" <ji...@apache.org> on 2016/04/27 22:46:12 UTC

[jira] [Comment Edited] (TEZ-3195) TezMerger OOM: unreserve called while memory still held

    [ https://issues.apache.org/jira/browse/TEZ-3195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15260890#comment-15260890 ] 

Jonathan Eagles edited comment on TEZ-3195 at 4/27/16 8:45 PM:
---------------------------------------------------------------

I can shed a little more light on this. This is actually a race condition in the memory accounting. However, the GC shouldn't give up so easily on recovering memory. To work around the issue, the job needs more young-generation (new gen) space so that the GC can more easily collect the unreachable memory.
In this case, the user needed to change NewRatio from 8 to 4 for a 1 GB heap.
{noformat}
-XX:NewRatio=4
{noformat}
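To make the tuning concrete: in HotSpot, -XX:NewRatio is the ratio of old-generation size to young-generation size, so the young generation gets roughly heap / (NewRatio + 1). The sketch below only does that arithmetic for the 1 GB heap mentioned above. The class and method names are made up for illustration, and real sizes also depend on the collector, alignment, and any -XX:NewSize/-XX:MaxNewSize settings.

```java
import java.util.Locale;

// Approximate young-generation sizing under -XX:NewRatio (illustrative only).
// NewRatio = old gen size / young gen size, so young gen ~ heap / (NewRatio + 1).
// Ignores alignment, survivor-space splits and explicit NewSize/MaxNewSize flags.
public class NewRatioSizing {

    /** Approximate young-generation size in MiB for a heap of heapMiB and the given NewRatio. */
    public static double youngGenMiB(double heapMiB, int newRatio) {
        if (newRatio < 1) {
            throw new IllegalArgumentException("NewRatio must be >= 1");
        }
        return heapMiB / (newRatio + 1);
    }

    public static void main(String[] args) {
        double heapMiB = 1024.0; // 1 GB heap, as in the comment
        for (int ratio : new int[] {8, 4}) {
            System.out.printf(Locale.ROOT, "NewRatio=%d -> young gen ~ %.1f MiB%n",
                    ratio, youngGenMiB(heapMiB, ratio));
        }
    }
}
```

Under these assumptions, going from NewRatio=8 to NewRatio=4 grows the young generation from about 114 MiB to about 205 MiB on a 1 GB heap.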

I still think we should pursue this JIRA, since we do unreserve the memory while still holding references to it.


> TezMerger OOM: unreserve called while memory still held
> -------------------------------------------------------
>
>                 Key: TEZ-3195
>                 URL: https://issues.apache.org/jira/browse/TEZ-3195
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: TEZ-3195.1-branch-0.7.patch, TEZ-3195.1.patch, TEZ-3195.2-branch-0.7.patch, TEZ-3195.2.patch
>
>
> When the reader is closed in MergeQueue#adjustPriorityQueue, the underlying byte buffer is still referenced in several places in the code at the point where unreserve is called. In the case below, the Fetcher was fetching a map output of nearly 100 MB, which exposed this race condition.
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
> 	at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
> 	at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
> 	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput.<init>(MapOutput.java:75)
> 	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput.createMemoryMapOutput(MapOutput.java:124)
> 	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.unconditionalReserve(MergeManager.java:437)
> 	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.reserve(MergeManager.java:427)
> 	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyMapOutput(FetcherOrderedGrouped.java:481)
> 	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.copyFromHost(FetcherOrderedGrouped.java:286)
> 	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.fetchNext(FetcherOrderedGrouped.java:176)
> 	at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetcherOrderedGrouped.run(FetcherOrderedGrouped.java:191)
> {noformat}
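The accounting problem described in this issue can be modelled in a few lines. This is a hypothetical, single-threaded sketch, not Tez's MergeManager code: MemoryAccountant, Segment, reserve/unreserve and simulate are invented stand-ins for the shuffle memory budget and an in-memory merge segment. In the real code the window comes from fetcher and merge threads running concurrently; here it is reduced to the order of two steps. If the budget is credited back before the last reference to a buffer is dropped, a second reservation is granted while the first buffer is still reachable, so reachable memory can exceed the budget. The OOM in the trace above was thrown at that kind of allocation, inside MergeManager.reserve.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of shuffle memory accounting; NOT Tez's actual MergeManager. */
public class UnreserveRaceSketch {

    /** Hypothetical accountant: tracks reserved bytes against a fixed limit. */
    static final class MemoryAccountant {
        private final long memoryLimit;
        private long usedMemory;

        MemoryAccountant(long memoryLimit) { this.memoryLimit = memoryLimit; }

        /** Returns a new buffer, or null if the request would exceed the budget. */
        synchronized byte[] reserve(int size) {
            if (usedMemory + size > memoryLimit) {
                return null; // over budget: caller must wait
            }
            usedMemory += size;
            return new byte[size];
        }

        synchronized void unreserve(int size) { usedMemory -= size; }
    }

    /** Hypothetical in-memory segment owning one reserved buffer. */
    static final class Segment {
        byte[] data;
        Segment(byte[] data) { this.data = data; }
    }

    /** Bytes still reachable through the given segments. */
    static long heldBytes(List<Segment> live) {
        long total = 0;
        for (Segment s : live) {
            if (s.data != null) total += s.data.length;
        }
        return total;
    }

    /** Closes one segment, then reserves another; returns peak reachable bytes. */
    public static long simulate(boolean dropReferenceFirst, int size, long limit) {
        MemoryAccountant accountant = new MemoryAccountant(limit);
        List<Segment> live = new ArrayList<>();
        live.add(new Segment(accountant.reserve(size)));

        Segment closing = live.get(0);
        if (dropReferenceFirst) {
            // Safer ordering: drop the reference, then credit the budget.
            closing.data = null;
            accountant.unreserve(size);
        } else {
            // Racy ordering: budget says "free", but the buffer is still reachable.
            accountant.unreserve(size);
        }

        // A fetcher now asks for another buffer of the same size.
        byte[] next = accountant.reserve(size);
        if (next != null) live.add(new Segment(next));
        long peak = heldBytes(live);

        // The racy path only releases its reference afterwards.
        closing.data = null;
        return peak;
    }

    public static void main(String[] args) {
        int size = 1 << 20; // 1 MiB stand-in for the ~100 MB map output
        long limit = size;  // budget allows exactly one buffer
        System.out.println("racy  peak = " + simulate(false, size, limit));
        System.out.println("fixed peak = " + simulate(true, size, limit));
    }
}
```

With a 1 MiB buffer and a 1 MiB budget, the racy ordering peaks at 2 MiB reachable while the reordered one stays at 1 MiB. Dropping references before unreserving is one possible fix under these assumptions; it is not necessarily what the attached patches do.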



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)