Posted to dev@pig.apache.org by "Koji Noguchi (JIRA)" <ji...@apache.org> on 2013/02/13 18:20:13 UTC

[jira] [Commented] (PIG-3148) OutOfMemory exception while spilling stale DefaultDataBag. Extra option to gc() before spilling large bag.

    [ https://issues.apache.org/jira/browse/PIG-3148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577743#comment-13577743 ] 

Koji Noguchi commented on PIG-3148:
-----------------------------------

Rohini asked me to clarify why I'm adding an extra param instead of simply calling gc() at the top of handleNotification().

The reasons I added the extra param are:
* When I tried just adding gc() at the top, all of my mappers suddenly got stuck, spending 99% of CPU time on gc. I then learned that handleNotification is called much more frequently than I first anticipated when the application is using more than the threshold and has nothing much to spill. That convinced me to add more conditions to reduce the gc() calls.
* The motivation for my patch is to avoid OutOfMemory when the application is holding a reference to a large stale bag while spilling unnecessarily. For the spill to cause OOM, the bag being spilled has to be large in proportion to the application's heap size.
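
To illustrate the kind of condition described in the two points above, here is a minimal sketch. It is not the code from pig-3148-v01.patch: the class name, method names, and the heapFraction parameter are all hypothetical, and the choice of fraction is an assumption made only for the example. The idea is to call gc() only when the bag about to be spilled is large relative to the maximum heap, so that frequent notifications with little to spill do not trigger gc() every time.

```java
// Hypothetical sketch; names are illustrative and not taken from the actual patch.
class GcBeforeSpillSketch {

    /**
     * Returns true when the bag is large enough, relative to the max heap,
     * to justify an explicit gc() before spilling it.
     * heapFraction is an assumed tuning knob; 0 or less disables the check.
     */
    static boolean shouldGcBeforeSpill(long bagSizeBytes, long maxHeapBytes, double heapFraction) {
        if (maxHeapBytes <= 0 || heapFraction <= 0.0) {
            return false; // option disabled or heap size unknown
        }
        return bagSizeBytes >= (long) (maxHeapBytes * heapFraction);
    }

    /**
     * Intended to be called on the spill path. Requests a gc() only for large
     * bags, giving the JVM a chance to reclaim a stale bag instead of spilling it.
     * Returns true if gc() was requested.
     */
    static boolean maybeGc(long bagSizeBytes, double heapFraction) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        if (shouldGcBeforeSpill(bagSizeBytes, maxHeap, heapFraction)) {
            System.gc(); // only a hint; the JVM may ignore it
            return true;
        }
        return false;
    }
}
```

With an assumed fraction of 0.25 and a 1G heap, a 400 MByte bag would trigger the gc() request, while a 10 MByte bag would be spilled without one.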

                
> OutOfMemory exception while spilling stale DefaultDataBag. Extra option to gc() before spilling large bag.
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-3148
>                 URL: https://issues.apache.org/jira/browse/PIG-3148
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>         Attachments: pig-3148-v01.patch
>
>
> Our user reported that one of their jobs in pig 0.10 occasionally failed with 
> 'Error: GC overhead limit exceeded' or 'Error: Java heap space', but rerunning it sometimes succeeded.
> For a reducer with a 1G heap, the heap dump showed two huge DefaultDataBags of 300-400 MBytes each at the time of the OOM failure.
> Jstack at the time of OOM always showed that a spill was running.
> {noformat}
> "Low Memory Detector" daemon prio=10 tid=0xb9c11800 nid=0xa52 runnable [0xb9afc000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.io.FileOutputStream.writeBytes(Native Method)
> 	at java.io.FileOutputStream.write(FileOutputStream.java:260)
> 	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> 	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
> 	- locked <0xe57c6390> (a java.io.BufferedOutputStream)
> 	at java.io.DataOutputStream.write(DataOutputStream.java:90)
> 	- locked <0xe57c60b8> (a java.io.DataOutputStream)
> 	at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
> 	at org.apache.pig.data.utils.SedesHelper.writeBytes(SedesHelper.java:46)
> 	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:537)
> 	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:435)
> 	at org.apache.pig.data.utils.SedesHelper.writeGenericTuple(SedesHelper.java:135)
> 	at org.apache.pig.data.BinInterSedes.writeTuple(BinInterSedes.java:613)
> 	at org.apache.pig.data.BinInterSedes.writeDatum(BinInterSedes.java:443)
> 	at org.apache.pig.data.DefaultDataBag.spill(DefaultDataBag.java:106)
> 	- locked <0xceb16190> (a java.util.ArrayList)
> 	at org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:243)
> 	- locked <0xbeb86318> (a java.util.LinkedList)
> 	at sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
> 	at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
> 	at sun.management.MemoryPoolImpl$PoolSensor.triggerAction(MemoryPoolImpl.java:272)
> 	at sun.management.Sensor.trigger(Sensor.java:120)
> {noformat}
