Posted to dev@pig.apache.org by "Pradeep Kamath (JIRA)" <ji...@apache.org> on 2008/06/11 23:50:45 UTC

[jira] Updated: (PIG-235) Performance issues with memory spills.

     [ https://issues.apache.org/jira/browse/PIG-235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-235:
-------------------------------

    Attachment: gcoverhead.patch

Proposal for memory allocation overhead issue:
==============================================

Currently, in org.apache.pig.impl.util.SpillableMemoryManager:

1) We use the java.lang.management memory notification mechanism to get notified when memory usage after a garbage collection exceeds the "collection threshold" (we set this to biggest_heap*0.5). Even with this in place, we still see "GC overhead limit" errors when running large dataset operations. From observing some runs, the notification does not seem to arrive often enough or early enough to prevent memory problems, possibly because it is only sent after a GC.

2) We only attempt to free up to:
long toFree = info.getUsage().getUsed() - (long)(info.getUsage().getMax()*.5);
This is only the excess over the threshold that triggered the notification, which is not enough to keep the handler from being called again soon.

3) While iterating over spillables, if the current spillable's memory size is > gcActivationSize, we try to invoke System.gc().

4) We *always* invoke System.gc() after iterating over spillables.
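The effect of item 2 can be checked against the first "low memory handler called" log entry quoted further down (used = 251352480, max = 477233152). This is a minimal Java sketch: the class and method names are hypothetical, and only the formula is taken from the line quoted in item 2.

```java
// Hypothetical helper reproducing the current free target from item 2:
//   toFree = used - (long)(max * .5)
public class CurrentFreeTarget {

    public static long currentToFree(long used, long max) {
        // Only the excess over the 50% collection threshold is targeted.
        return used - (long) (max * .5);
    }

    public static void main(String[] args) {
        // Values from the first "low memory handler called" log line.
        long used = 251352480L;
        long max = 477233152L;
        long toFree = currentToFree(used, max);
        // prints: toFree = 12735904 bytes (12437K)
        System.out.println("toFree = " + toFree + " bytes (" + (toFree / 1024) + "K)");
    }
}
```

Under this formula the handler targets only about 12 MB of a pool whose maximum is about 455 MB, which may help explain why each notification produces only a small spill.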


Proposed changes are:
=====================
1) In addition to the "collection threshold" of biggest_heap*0.5, a "usage threshold" of biggest_heap*0.7 will be used, so we get notified early and often, irrespective of whether garbage collection has occurred.

2) We will attempt to free:
toFree = info.getUsage().getUsed() - threshold + (long)(threshold * 0.5);
where threshold is (info.getUsage().getMax() * 0.7) if the handleNotification() method is handling a "usage threshold exceeded" notification, and (info.getUsage().getMax() * 0.5) otherwise (the "collection threshold exceeded" case). Freeing this much brings usage down to half of the threshold that fired, rather than just below it.

3) While iterating over spillables, if the *accumulated memory freed thus far* is > gcActivationSize OR if we have freed sufficient memory (based on 2) above), we set a flag to invoke System.gc() when we exit the loop. Accumulated freed memory is the memory freed across spills since the last GC invocation.

4) We will invoke System.gc() only if the flag is set in 3) above
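The four proposed changes can be sketched in Java roughly as follows. This is a simplified illustration, not the attached gcoverhead.patch: the class `ProposedSpillPolicy`, its nested `Spillable` stand-in, the `toFree`/`spill`/`register` helpers, and the choice of the biggest heap pool that supports both thresholds are all assumptions, and a plain list replaces the manager's real registry of spillables. The threshold and notification calls are the standard `java.lang.management` API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.List;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

// Hypothetical sketch of the proposed policy; not Pig's actual SpillableMemoryManager.
public class ProposedSpillPolicy implements NotificationListener {

    /** Hypothetical stand-in for a spillable container. */
    public interface Spillable {
        long getMemorySize(); // estimated in-memory size in bytes
        long spill();         // write contents to disk and drop references
    }

    private final List<Spillable> spillables;
    private final long gcActivationSize;
    private long accumulatedFreed = 0; // bytes freed since the last System.gc()

    public ProposedSpillPolicy(List<Spillable> spillables, long gcActivationSize) {
        this.spillables = spillables;
        this.gcActivationSize = gcActivationSize;
    }

    /** Item 2: free enough to bring usage down to half of the threshold that fired. */
    public static long toFree(long used, long max, boolean usageThresholdCase) {
        long threshold = (long) (max * (usageThresholdCase ? 0.7 : 0.5));
        return used - threshold + (long) (threshold * 0.5);
    }

    /** Items 3 and 4: spill, then request a GC only if the flag was set. */
    public synchronized boolean spill(long toFree) {
        long estimatedFreed = 0;
        boolean invokeGC = false;
        for (Spillable s : spillables) {
            if (estimatedFreed >= toFree) {
                break; // item 2 target reached
            }
            long size = s.getMemorySize(); // estimate taken before spilling
            s.spill();
            estimatedFreed += size;
            accumulatedFreed += size;
            // Item 3: set a flag instead of calling System.gc() inside the loop.
            if (accumulatedFreed > gcActivationSize || estimatedFreed >= toFree) {
                invokeGC = true;
            }
        }
        if (invokeGC) {
            System.gc(); // only a request; the JVM may ignore it
            accumulatedFreed = 0;
        }
        return invokeGC;
    }

    @Override
    public void handleNotification(Notification n, Object handback) {
        boolean usageCase = MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(n.getType());
        boolean collectionCase =
                MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED.equals(n.getType());
        if (!usageCase && !collectionCase) {
            return;
        }
        MemoryNotificationInfo info =
                MemoryNotificationInfo.from((CompositeData) n.getUserData());
        spill(toFree(info.getUsage().getUsed(), info.getUsage().getMax(), usageCase));
    }

    /** Item 1: 50% collection threshold plus 70% usage threshold on the biggest heap pool. */
    public void register() {
        MemoryPoolMXBean biggest = null;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            long max = pool.getUsage().getMax(); // -1 if undefined
            if (pool.getType() == MemoryType.HEAP && max > 0
                    && pool.isUsageThresholdSupported()
                    && pool.isCollectionUsageThresholdSupported()
                    && (biggest == null || max > biggest.getUsage().getMax())) {
                biggest = pool;
            }
        }
        if (biggest == null) {
            return;
        }
        long max = biggest.getUsage().getMax();
        biggest.setCollectionUsageThreshold((long) (max * 0.5));
        biggest.setUsageThreshold((long) (max * 0.7));
        ((NotificationEmitter) ManagementFactory.getMemoryMXBean())
                .addNotificationListener(this, null, null);
    }
}
```

For the first log entry quoted below, the collection-threshold case gives toFree = 251352480 - 238616576 + 119308288 = 132044192 bytes (about 126 MB), which would leave usage at roughly a quarter of the pool maximum. Because accumulatedFreed carries over between calls, several small spills can together trigger one GC request instead of one request per spill.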

> Performance issues with memory spills.
> --------------------------------------
>
>                 Key: PIG-235
>                 URL: https://issues.apache.org/jira/browse/PIG-235
>             Project: Pig
>          Issue Type: Bug
>         Environment: Pig + Hadoop
>            Reporter: Amir Youssefi
>            Assignee: Utkarsh Srivastava
>            Priority: Critical
>         Attachments: gcoverhead.patch
>
>
> We are hitting a performance problem with memory spills.
> A reducer gets stuck in the following state for *tens* of hours while
> thousands of small files are spilled. This is separate from the skewed keys issue.
> Note that the spills become smaller as time goes on. We can use this
> and try to address the issue by spilling in larger chunks.
> I tried different subsets of the data, then made it work with all kinds of tricks/hacks, but would like to have this working easily, to say the least:
> cogroup large_data_set by $0, small_date_set by $0
> small_date_set fits in memory.
> Log:
> 2008-05-06 23:24:06,014 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251352480(245461K) committed =
> 437256192(427008K) max = 477233152(466048K)
> 2008-05-06 23:24:06,734 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251401304(245509K) committed =
> 437256192(427008K) max = 477233152(466048K)
> 2008-05-06 23:24:07,455 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251254912(245366K) committed =
> 437256192(427008K) max = 477233152(466048K)
> 2008-05-06 23:24:08,175 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251281808(245392K) committed =
> 437256192(427008K) max = 477233152(466048K)
> 2008-05-06 23:24:08,895 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251309400(245419K) committed =
> 437256192(427008K) max = 477233152(466048K)
> 2008-05-06 23:24:09,615 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251358232(245467K) committed =
> 437256192(427008K) max = 477233152(466048K)
> 2008-05-06 23:24:10,336 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251267696(245378K) committed =
> 437256192(427008K) max = 477233152(466048K)
> 2008-05-06 23:24:11,056 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251307352(245417K) committed =
> 437256192(427008K) max = 477233152(466048K)
> 2008-05-06 23:24:11,776 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251335344(245444K) committed =
> 437256192(427008K) max = 477233152(466048K)
> (used column slowly increasing)
> Actual spill example: 8000 files.
> Sorting by time (new to old), I see small spills:
> -rw-------  1 amiry users     3675 May  6 23:44 pigbag635090.tmp
> -rw-------  1 amiry users        4 May  6 23:44 pigbag635091.tmp
> -rw-------  1 amiry users     3917 May  6 23:44 pigbag635086.tmp
> -rw-------  1 amiry users     3949 May  6 23:44 pigbag635088.tmp
> -rw-------  1 amiry users        4 May  6 23:44 pigbag635089.tmp
> -rw-------  1 amiry users     3969 May  6 23:44 pigbag635084.tmp
> -rw-------  1 amiry users     4073 May  6 23:44 pigbag635068.tmp
> -rw-------  1 amiry users    47634 May  6 23:44 pigbag635070.tmp
> -rw-------  1 amiry users     5101 May  6 23:44 pigbag635065.tmp
> -rw-------  1 amiry users     5722 May  6 23:44 pigbag635059.tmp
> -rw-------  1 amiry users     7570 May  6 23:44 pigbag635062.tmp
> -rw-------  1 amiry users     7802 May  6 23:44 pigbag635056.tmp
> -rw-------  1 amiry users     7514 May  6 23:44 pigbag635051.tmp
> -rw-------  1 amiry users     3929 May  6 23:44 pigbag635054.tmp
> -rw-------  1 amiry users     5342 May  6 23:44 pigbag635045.tmp
> -rw-------  1 amiry users     7361 May  6 23:44 pigbag635048.tmp
> -rw-------  1 amiry users     6663 May  6 23:44 pigbag635042.tmp
> -rw-------  1 amiry users     7511 May  6 23:44 pigbag635036.tmp
> -rw-------  1 amiry users     7520 May  6 23:44 pigbag635039.tmp
> -rw-------  1 amiry users     3873 May  6 23:44 pigbag635034.tmp
> -rw-------  1 amiry users     4029 May  6 23:44 pigbag635032.tmp
> -rw-------  1 amiry users     3823 May  6 23:44 pigbag635028.tmp
> -rw-------  1 amiry users     3726 May  6 23:44 pigbag635030.tmp
> -rw-------  1 amiry users     3934 May  6 23:44 pigbag635024.tmp
> Sorting by time (old to new), I see a few large spills, followed quickly
> (in less than 15 min) by small ones:
> -rw-------  1 amiry users 45221453 May  6 21:23 pigbag59657.tmp
> -rw-------  1 amiry users 56161613 May  6 21:23 pigbag59658.tmp
> -rw-------  1 amiry users 70661942 May  6 21:23 pigbag59659.tmp
> -rw-------  1 amiry users 75308107 May  6 21:23 pigbag59660.tmp
> -rw-------  1 amiry users 76381091 May  6 21:24 pigbag59661.tmp
> -rw-------  1 amiry users 74691366 May  6 21:24 pigbag81914.tmp
> -rw-------  1 amiry users 73133098 May  6 21:24 pigbag103839.tmp
> -rw-------  1 amiry users 72750330 May  6 21:24 pigbag125123.tmp
> -rw-------  1 amiry users 71267460 May  6 21:25 pigbag146472.tmp
> -rw-------  1 amiry users 69638363 May  6 21:25 pigbag167358.tmp
> -rw-------  1 amiry users 68010250 May  6 21:25 pigbag187566.tmp
> -rw-------  1 amiry users 66312739 May  6 21:25 pigbag207447.tmp
> -rw-------  1 amiry users 64601422 May  6 21:26 pigbag226895.tmp
> -rw-------  1 amiry users 62997501 May  6 21:26 pigbag245690.tmp
> -rw-------  1 amiry users 62525926 May  6 21:26 pigbag264154.tmp
> -rw-------  1 amiry users 60940107 May  6 21:26 pigbag282367.tmp
> -rw-------  1 amiry users 59540198 May  6 21:26 pigbag300215.tmp
> -rw-------  1 amiry users 57918140 May  6 21:27 pigbag317750.tmp
> -rw-------  1 amiry users 57728845 May  6 21:27 pigbag334505.tmp
> -rw-------  1 amiry users 55427771 May  6 21:27 pigbag351436.tmp
> -rw-------  1 amiry users 55405942 May  6 21:27 pigbag367615.tmp
> -rw-------  1 amiry users 54600778 May  6 21:28 pigbag383872.tmp
> -rw-------  1 amiry users 52438311 May  6 21:28 pigbag399722.tmp
> -rw-------  1 amiry users 51250459 May  6 21:28 pigbag415094.tmp
> -rw-------  1 amiry users 50489324 May  6 21:28 pigbag430026.tmp
> -rw-------  1 amiry users 48311361 May  6 21:28 pigbag444835.tmp
> -rw-------  1 amiry users 47296555 May  6 21:28 pigbag458869.tmp
> -rw-------  1 amiry users 45703372 May  6 21:29 pigbag472771.tmp
> -rw-------  1 amiry users 46243949 May  6 21:29 pigbag486062.tmp
> -rw-------  1 amiry users 46195603 May  6 21:29 pigbag499549.tmp
> -rw-------  1 amiry users 43916731 May  6 21:29 pigbag513154.tmp
> -rw-------  1 amiry users 42970027 May  6 21:29 pigbag525921.tmp
> -rw-------  1 amiry users     5965 May  6 21:29 pigbag538555.tmp
> -rw-------  1 amiry users 46288099 May  6 21:30 pigbag538558.tmp
> -rw-------  1 amiry users     7735 May  6 21:30 pigbag539078.tmp
> -rw-------  1 amiry users     8058 May  6 21:30 pigbag539075.tmp
> -rw-------  1 amiry users 34034358 May  6 21:30 pigbag539079.tmp
> -rw-------  1 amiry users     8800 May  6 21:30 pigbag549021.tmp
> -rw-------  1 amiry users 45054789 May  6 21:31 pigbag549025.tmp
> -rw-------  1 amiry users     5750 May  6 21:31 pigbag549950.tmp
> -rw-------  1 amiry users     6794 May  6 21:31 pigbag549953.tmp
> -rw-------  1 amiry users 35112392 May  6 21:31 pigbag549956.tmp
> -rw-------  1 amiry users     7330 May  6 21:31 pigbag560146.tmp
> -rw-------  1 amiry users     7856 May  6 21:31 pigbag560143.tmp
> -rw-------  1 amiry users 44039882 May  6 21:32 pigbag560149.tmp
> -rw-------  1 amiry users     7508 May  6 21:32 pigbag561917.tmp
> -rw-------  1 amiry users     4567 May  6 21:32 pigbag561921.tmp
> -rw-------  1 amiry users 33031655 May  6 21:32 pigbag561924.tmp
> -rw-------  1 amiry users     7744 May  6 21:32 pigbag571510.tmp
> -rw-------  1 amiry users     7709 May  6 21:32 pigbag571507.tmp
> -rw-------  1 amiry users 42771860 May  6 21:33 pigbag571513.tmp
> -rw-------  1 amiry users 29912726 May  6 21:33 pigbag572784.tmp
> -rw-------  1 amiry users 42563305 May  6 21:34 pigbag581561.tmp
> -rw-------  1 amiry users 29961190 May  6 21:34 pigbag583024.tmp
> -rw-------  1 amiry users     7704 May  6 21:34 pigbag591783.tmp
> -rw-------  1 amiry users 41448656 May  6 21:35 pigbag591786.tmp
> -rw-------  1 amiry users     7679 May  6 21:35 pigbag592482.tmp
> -rw-------  1 amiry users     7328 May  6 21:35 pigbag592488.tmp
> -rw-------  1 amiry users     7809 May  6 21:35 pigbag592485.tmp
> -rw-------  1 amiry users   128909 May  6 21:35 pigbag592491.tmp
> -rw-------  1 amiry users 33069409 May  6 21:35 pigbag592530.tmp
> -rw-------  1 amiry users     7004 May  6 21:35 pigbag602114.tmp
> -rw-------  1 amiry users     4083 May  6 21:35 pigbag602121.tmp
> -rw-------  1 amiry users    11533 May  6 21:35 pigbag602117.tmp
> -rw-------  1 amiry users     7603 May  6 21:35 pigbag602124.tmp
> -rw-------  1 amiry users     7611 May  6 21:35 pigbag602130.tmp
> -rw-------  1 amiry users     7692 May  6 21:35 pigbag602127.tmp
> -rw-------  1 amiry users     7821 May  6 21:35 pigbag602136.tmp
> -rw-------  1 amiry users     7467 May  6 21:35 pigbag602133.tmp
> -rw-------  1 amiry users     1147 May  6 21:35 pigbag602139.tmp
> -rw-------  1 amiry users     8543 May  6 21:35 pigbag602144.tmp
> -rw-------  1 amiry users     8517 May  6 21:35 pigbag602141.tmp
> -rw-------  1 amiry users     3893 May  6 21:35 pigbag602150.tmp
> -rw-------  1 amiry users     6029 May  6 21:35 pigbag602147.tmp
> -rw-------  1 amiry users     7687 May  6 21:35 pigbag602152.tmp
> -rw-------  1 amiry users     7411 May  6 21:35 pigbag602158.tmp
> -rw-------  1 amiry users     6079 May  6 21:35 pigbag602155.tmp
> -rw-------  1 amiry users     6964 May  6 21:35 pigbag602161.tmp
> -rw-------  1 amiry users     5404 May  6 21:35 pigbag602168.tmp
> -rw-------  1 amiry users     7915 May  6 21:35 pigbag602164.tmp
> -rw-------  1 amiry users     4132 May  6 21:35 pigbag602174.tmp
> -rw-------  1 amiry users     7708 May  6 21:35 pigbag602171.tmp
> -rw-------  1 amiry users     7802 May  6 21:35 pigbag602176.tmp
> -rw-------  1 amiry users     6089 May  6 21:35 pigbag602182.tmp
> -rw-------  1 amiry users     8161 May  6 21:35 pigbag602179.tmp
> -rw-------  1 amiry users     7356 May  6 21:35 pigbag602188.tmp
> -rw-------  1 amiry users     8231 May  6 21:35 pigbag602185.tmp
> -rw-------  1 amiry users     8340 May  6 21:35 pigbag602191.tmp
> -rw-------  1 amiry users     3870 May  6 21:35 pigbag602198.tmp
> -rw-------  1 amiry users     7587 May  6 21:35 pigbag602195.tmp
> -rw-------  1 amiry users     5884 May  6 21:35 pigbag602200.tmp
> -rw-------  1 amiry users     7644 May  6 21:35 pigbag602206.tmp
> -rw-------  1 amiry users     7756 May  6 21:35 pigbag602203.tmp
> -rw-------  1 amiry users     8054 May  6 21:35 pigbag602209.tmp
> -rw-------  1 amiry users     6182 May  6 21:35 pigbag602215.tmp
> -rw-------  1 amiry users     7740 May  6 21:35 pigbag602212.tmp
> -rw-------  1 amiry users     6264 May  6 21:35 pigbag602222.tmp
> -rw-------  1 amiry users     7709 May  6 21:35 pigbag602218.tmp
> -rw-------  1 amiry users    10195 May  6 21:35 pigbag602225.tmp
> -rw-------  1 amiry users     7914 May  6 21:35 pigbag602232.tmp
> -rw-------  1 amiry users     4170 May  6 21:35 pigbag602229.tmp
> -rw-------  1 amiry users     7670 May  6 21:35 pigbag602235.tmp
> -rw-------  1 amiry users     2823 May  6 21:35 pigbag602241.tmp
> -rw-------  1 amiry users     6947 May  6 21:35 pigbag602238.tmp
> -rw-------  1 amiry users     7311 May  6 21:35 pigbag602246.tmp
> -rw-------  1 amiry users     7521 May  6 21:35 pigbag602243.tmp
> -rw-------  1 amiry users     7772 May  6 21:35 pigbag602249.tmp
> -rw-------  1 amiry users     5842 May  6 21:35 pigbag602255.tmp
> -rw-------  1 amiry users     7689 May  6 21:35 pigbag602252.tmp
> -rw-------  1 amiry users     5399 May  6 21:35 pigbag602261.tmp
> -rw-------  1 amiry users     5951 May  6 21:35 pigbag602258.tmp
> -rw-------  1 amiry users     7717 May  6 21:35 pigbag602264.tmp
> -rw-------  1 amiry users     4911 May  6 21:35 pigbag602271.tmp
> -rw-------  1 amiry users    11384 May  6 21:35 pigbag602267.tmp
> -rw-------  1 amiry users     6714 May  6 21:35 pigbag602277.tmp
> -rw-------  1 amiry users     6639 May  6 21:35 pigbag602274.tmp
> -rw-------  1 amiry users     7807 May  6 21:35 pigbag602280.tmp
> -rw-------  1 amiry users     6754 May  6 21:35 pigbag602286.tmp
> -rw-------  1 amiry users     6159 May  6 21:35 pigbag602283.tmp
> -rw-------  1 amiry users     7643 May  6 21:35 pigbag602292.tmp
> -rw-------  1 amiry users     7636 May  6 21:35 pigbag602289.tmp
> -rw-------  1 amiry users     6007 May  6 21:35 pigbag602295.tmp
> -rw-------  1 amiry users     7245 May  6 21:35 pigbag602301.tmp
> -rw-------  1 amiry users     7742 May  6 21:35 pigbag602298.tmp
> -rw-------  1 amiry users     1994 May  6 21:35 pigbag602304.tmp
> -rw-------  1 amiry users     8080 May  6 21:35 pigbag602309.tmp
> -rw-------  1 amiry users     8030 May  6 21:35 pigbag602306.tmp
> -rw-------  1 amiry users     6009 May  6 21:35 pigbag602315.tmp
> -rw-------  1 amiry users     5019 May  6 21:35 pigbag602312.tmp
> -rw-------  1 amiry users     3746 May  6 21:35 pigbag602318.tmp
> -rw-------  1 amiry users     5977 May  6 21:35 pigbag602323.tmp

-- 
This message is automatically generated by JIRA.