Posted to dev@pig.apache.org by "Pradeep Kamath (JIRA)" <ji...@apache.org> on 2008/06/11 23:50:45 UTC
[jira] Updated: (PIG-235) Performance issues with memory spills.
[ https://issues.apache.org/jira/browse/PIG-235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pradeep Kamath updated PIG-235:
-------------------------------
Attachment: gcoverhead.patch
Proposal for memory allocation overhead issue:
==============================================
Currently in org.apache.pig.impl.util.SpillableMemoryManager:
1) We use the MemoryManagement interface to get notified when usage exceeds the "collection threshold" (we set this to biggest_heap*0.5). Even with this in place we still see "GC overhead limit" issues on large dataset operations. From observing some runs, the notification does not appear to arrive often enough or early enough to prevent memory issues, possibly because it is only sent after a GC.
2) We only attempt to free up to:
long toFree = info.getUsage().getUsed() - (long)(info.getUsage().getMax()*.5);
This is only the excess over the threshold that triggered the notification, which is not enough to keep the handler from being called again soon.
3) While iterating over spillables, if the current spillable's memory size is > gcActivationSize, we try to invoke System.gc()
4) We *always* invoke System.gc() after iterating over spillables
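
For illustration, items 1) and 2) can be sketched with the standard java.lang.management API. This is a minimal sketch, not the actual SpillableMemoryManager code: the class name CurrentSpillSketch and the helper biggestHeapPool() are hypothetical, and the sample numbers are the "used" and "max" values from the first log entry in the quoted issue description.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryNotificationInfo;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import javax.management.Notification;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

// Hypothetical sketch of the current behaviour described in items 1) and 2).
final class CurrentSpillSketch {
    // Item 2: only the excess over half of the pool's max is targeted.
    static long toFree(long used, long max) {
        return used - (long) (max * .5);
    }

    // Item 1: pick the largest heap pool that supports a collection threshold.
    static MemoryPoolMXBean biggestHeapPool() {
        MemoryPoolMXBean biggest = null;
        long biggestMax = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null || pool.getType() != MemoryType.HEAP
                    || !pool.isCollectionUsageThresholdSupported()) {
                continue;
            }
            if (usage.getMax() > biggestMax) { // getMax() is -1 when undefined
                biggest = pool;
                biggestMax = usage.getMax();
            }
        }
        return biggest;
    }

    public static void main(String[] args) {
        MemoryPoolMXBean pool = biggestHeapPool();
        if (pool != null) {
            // "collection threshold" of biggest_heap*0.5
            pool.setCollectionUsageThreshold((long) (pool.getUsage().getMax() * .5));
            NotificationListener listener = (Notification n, Object handback) -> {
                if (MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED
                        .equals(n.getType())) {
                    MemoryNotificationInfo info =
                            MemoryNotificationInfo.from((CompositeData) n.getUserData());
                    System.out.println("toFree = "
                            + toFree(info.getUsage().getUsed(), info.getUsage().getMax()));
                }
            };
            ((NotificationEmitter) ManagementFactory.getMemoryMXBean())
                    .addNotificationListener(listener, null, null);
        }
        // used and max taken from the first log entry in the issue description
        System.out.println(toFree(251352480L, 477233152L)); // prints 12735904
    }
}
```

With those logged values the target is about 12 MB, under 3% of the maximum heap, which is consistent with the handler being called again less than a second later in the log.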
Proposed changes are:
=====================
1) In addition to the "collection threshold" of biggest_heap*0.5, a "usage threshold" of biggest_heap*0.7 will be used, so we get notified early and often irrespective of whether garbage collection has occurred.
2) We will attempt to free
toFree = info.getUsage().getUsed() - threshold + (long)(threshold * 0.5);
where threshold is (info.getUsage().getMax() * 0.7) if the handleNotification() method is handling a "usage threshold exceeded" notification, and (info.getUsage().getMax() * 0.5) otherwise (the "collection threshold exceeded" case).
3) While iterating over spillables, if the *accumulated memory freed thus far* is > gcActivationSize OR if we have freed sufficient memory (based on 2) above), then we set a flag to invoke System.gc() when we exit the loop. Accumulated freed memory is the memory freed across spills and between GC invocations.
4) We will invoke System.gc() only if the flag is set in 3) above
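
The proposed items 2) to 4) can be sketched as follows. This is a rough illustration under stated assumptions, not the contents of gcoverhead.patch: the class ProposedSpillSketch, the Spillable interface shown here, and the spill() method are hypothetical stand-ins. The sketch also assumes that the loop stops as soon as enough memory has been freed and that the accumulated counter is reset when a GC is requested. It returns the flag instead of calling System.gc() itself so the decision is easy to inspect.

```java
import java.lang.management.MemoryNotificationInfo;
import java.util.List;

// Hypothetical sketch of the proposed spill target and GC flag.
final class ProposedSpillSketch {
    /** Hypothetical stand-in for a spillable bag; spill() returns the bytes it freed. */
    interface Spillable {
        long spill();
    }

    private final long gcActivationSize;
    // Memory freed across spills since the last requested GC (item 3).
    private long accumulatedFreed = 0;

    ProposedSpillSketch(long gcActivationSize) {
        this.gcActivationSize = gcActivationSize;
    }

    // Item 2: threshold is 0.7 * max for "usage threshold exceeded", 0.5 * max otherwise.
    static long toFree(long used, long max, String notificationType) {
        boolean usage =
                MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED.equals(notificationType);
        long threshold = (long) (max * (usage ? 0.7 : 0.5));
        return used - threshold + (long) (threshold * 0.5);
    }

    // Items 3 and 4: spill, then report whether the caller should invoke System.gc().
    boolean spill(List<Spillable> spillables, long toFree) {
        long freedThisCall = 0;
        boolean invokeGc = false;
        for (Spillable s : spillables) {
            long freed = s.spill();
            freedThisCall += freed;
            accumulatedFreed += freed;
            if (accumulatedFreed > gcActivationSize) {
                invokeGc = true;
            }
            if (freedThisCall >= toFree) {
                invokeGc = true;
                break; // assumption: stop once enough has been freed
            }
        }
        if (invokeGc) {
            accumulatedFreed = 0; // assumption: counter restarts after each requested GC
        }
        return invokeGc;
    }

    public static void main(String[] args) {
        // used and max taken from the first log entry in the issue description
        long used = 251352480L;
        long max = 477233152L;
        System.out.println(toFree(used, max,
                MemoryNotificationInfo.MEMORY_COLLECTION_THRESHOLD_EXCEEDED)); // prints 132044192
    }
}
```

For the logged values, the collection-threshold case asks for roughly 132 MB to be freed, compared with roughly 12 MB under the current formula, so each notification should lead to fewer but larger spills.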
> Performance issues with memory spills.
> --------------------------------------
>
> Key: PIG-235
> URL: https://issues.apache.org/jira/browse/PIG-235
> Project: Pig
> Issue Type: Bug
> Environment: Pig + Hadoop
> Reporter: Amir Youssefi
> Assignee: Utkarsh Srivastava
> Priority: Critical
> Attachments: gcoverhead.patch
>
>
> We are hitting a low-performance issue with memory spills.
> A reducer gets stuck in the following state for *tens* of hours while
> thousands of small files are spilled. This is in addition to the skewed keys issue.
> Note that the size of the spills becomes smaller as time goes on. We can use this
> and try to address the issue by spilling in larger chunks.
> I tried different subsets of the data, then made it work with all kinds of tricks/hacks, but would like to have this working easily, to say the least:
> cogroup large_data_set by $0, small_date_set by $0
> small_date_set fits in memory.
> Log:
> 2008-05-06 23:24:06,014 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251352480(245461K) committed =
> 437256192(427008K) max = 477233152(466048K)
> 2008-05-06 23:24:06,734 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251401304(245509K) committed =
> 437256192(427008K) max = 477233152(466048K)
> 2008-05-06 23:24:07,455 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251254912(245366K) committed =
> 437256192(427008K) max = 477233152(466048K)
> 2008-05-06 23:24:08,175 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251281808(245392K) committed =
> 437256192(427008K) max = 477233152(466048K)
> 2008-05-06 23:24:08,895 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251309400(245419K) committed =
> 437256192(427008K) max = 477233152(466048K)
> 2008-05-06 23:24:09,615 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251358232(245467K) committed =
> 437256192(427008K) max = 477233152(466048K)
> 2008-05-06 23:24:10,336 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251267696(245378K) committed =
> 437256192(427008K) max = 477233152(466048K)
> 2008-05-06 23:24:11,056 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251307352(245417K) committed =
> 437256192(427008K) max = 477233152(466048K)
> 2008-05-06 23:24:11,776 INFO
> org.apache.pig.impl.util.SpillableMemoryManager: low memory handler
> called init = 4194304(4096K) used = 251335344(245444K) committed =
> 437256192(427008K) max = 477233152(466048K)
> (used column slowly increasing)
> Actual spill example:
> 8000 files
> Sorting by time (new to old) I see small spills:
> -rw------- 1 amiry users 3675 May 6 23:44 pigbag635090.tmp
> -rw------- 1 amiry users 4 May 6 23:44 pigbag635091.tmp
> -rw------- 1 amiry users 3917 May 6 23:44 pigbag635086.tmp
> -rw------- 1 amiry users 3949 May 6 23:44 pigbag635088.tmp
> -rw------- 1 amiry users 4 May 6 23:44 pigbag635089.tmp
> -rw------- 1 amiry users 3969 May 6 23:44 pigbag635084.tmp
> -rw------- 1 amiry users 4073 May 6 23:44 pigbag635068.tmp
> -rw------- 1 amiry users 47634 May 6 23:44 pigbag635070.tmp
> -rw------- 1 amiry users 5101 May 6 23:44 pigbag635065.tmp
> -rw------- 1 amiry users 5722 May 6 23:44 pigbag635059.tmp
> -rw------- 1 amiry users 7570 May 6 23:44 pigbag635062.tmp
> -rw------- 1 amiry users 7802 May 6 23:44 pigbag635056.tmp
> -rw------- 1 amiry users 7514 May 6 23:44 pigbag635051.tmp
> -rw------- 1 amiry users 3929 May 6 23:44 pigbag635054.tmp
> -rw------- 1 amiry users 5342 May 6 23:44 pigbag635045.tmp
> -rw------- 1 amiry users 7361 May 6 23:44 pigbag635048.tmp
> -rw------- 1 amiry users 6663 May 6 23:44 pigbag635042.tmp
> -rw------- 1 amiry users 7511 May 6 23:44 pigbag635036.tmp
> -rw------- 1 amiry users 7520 May 6 23:44 pigbag635039.tmp
> -rw------- 1 amiry users 3873 May 6 23:44 pigbag635034.tmp
> -rw------- 1 amiry users 4029 May 6 23:44 pigbag635032.tmp
> -rw------- 1 amiry users 3823 May 6 23:44 pigbag635028.tmp
> -rw------- 1 amiry users 3726 May 6 23:44 pigbag635030.tmp
> -rw------- 1 amiry users 3934 May 6 23:44 pigbag635024.tmp
> Sorting by time (old to new) I see a few large spills then quickly (in
> less than 15min) come small ones:
> -rw------- 1 amiry users 45221453 May 6 21:23 pigbag59657.tmp
> -rw------- 1 amiry users 56161613 May 6 21:23 pigbag59658.tmp
> -rw------- 1 amiry users 70661942 May 6 21:23 pigbag59659.tmp
> -rw------- 1 amiry users 75308107 May 6 21:23 pigbag59660.tmp
> -rw------- 1 amiry users 76381091 May 6 21:24 pigbag59661.tmp
> -rw------- 1 amiry users 74691366 May 6 21:24 pigbag81914.tmp
> -rw------- 1 amiry users 73133098 May 6 21:24 pigbag103839.tmp
> -rw------- 1 amiry users 72750330 May 6 21:24 pigbag125123.tmp
> -rw------- 1 amiry users 71267460 May 6 21:25 pigbag146472.tmp
> -rw------- 1 amiry users 69638363 May 6 21:25 pigbag167358.tmp
> -rw------- 1 amiry users 68010250 May 6 21:25 pigbag187566.tmp
> -rw------- 1 amiry users 66312739 May 6 21:25 pigbag207447.tmp
> -rw------- 1 amiry users 64601422 May 6 21:26 pigbag226895.tmp
> -rw------- 1 amiry users 62997501 May 6 21:26 pigbag245690.tmp
> -rw------- 1 amiry users 62525926 May 6 21:26 pigbag264154.tmp
> -rw------- 1 amiry users 60940107 May 6 21:26 pigbag282367.tmp
> -rw------- 1 amiry users 59540198 May 6 21:26 pigbag300215.tmp
> -rw------- 1 amiry users 57918140 May 6 21:27 pigbag317750.tmp
> -rw------- 1 amiry users 57728845 May 6 21:27 pigbag334505.tmp
> -rw------- 1 amiry users 55427771 May 6 21:27 pigbag351436.tmp
> -rw------- 1 amiry users 55405942 May 6 21:27 pigbag367615.tmp
> -rw------- 1 amiry users 54600778 May 6 21:28 pigbag383872.tmp
> -rw------- 1 amiry users 52438311 May 6 21:28 pigbag399722.tmp
> -rw------- 1 amiry users 51250459 May 6 21:28 pigbag415094.tmp
> -rw------- 1 amiry users 50489324 May 6 21:28 pigbag430026.tmp
> -rw------- 1 amiry users 48311361 May 6 21:28 pigbag444835.tmp
> -rw------- 1 amiry users 47296555 May 6 21:28 pigbag458869.tmp
> -rw------- 1 amiry users 45703372 May 6 21:29 pigbag472771.tmp
> -rw------- 1 amiry users 46243949 May 6 21:29 pigbag486062.tmp
> -rw------- 1 amiry users 46195603 May 6 21:29 pigbag499549.tmp
> -rw------- 1 amiry users 43916731 May 6 21:29 pigbag513154.tmp
> -rw------- 1 amiry users 42970027 May 6 21:29 pigbag525921.tmp
> -rw------- 1 amiry users 5965 May 6 21:29 pigbag538555.tmp
> -rw------- 1 amiry users 46288099 May 6 21:30 pigbag538558.tmp
> -rw------- 1 amiry users 7735 May 6 21:30 pigbag539078.tmp
> -rw------- 1 amiry users 8058 May 6 21:30 pigbag539075.tmp
> -rw------- 1 amiry users 34034358 May 6 21:30 pigbag539079.tmp
> -rw------- 1 amiry users 8800 May 6 21:30 pigbag549021.tmp
> -rw------- 1 amiry users 45054789 May 6 21:31 pigbag549025.tmp
> -rw------- 1 amiry users 5750 May 6 21:31 pigbag549950.tmp
> -rw------- 1 amiry users 6794 May 6 21:31 pigbag549953.tmp
> -rw------- 1 amiry users 35112392 May 6 21:31 pigbag549956.tmp
> -rw------- 1 amiry users 7330 May 6 21:31 pigbag560146.tmp
> -rw------- 1 amiry users 7856 May 6 21:31 pigbag560143.tmp
> -rw------- 1 amiry users 44039882 May 6 21:32 pigbag560149.tmp
> -rw------- 1 amiry users 7508 May 6 21:32 pigbag561917.tmp
> -rw------- 1 amiry users 4567 May 6 21:32 pigbag561921.tmp
> -rw------- 1 amiry users 33031655 May 6 21:32 pigbag561924.tmp
> -rw------- 1 amiry users 7744 May 6 21:32 pigbag571510.tmp
> -rw------- 1 amiry users 7709 May 6 21:32 pigbag571507.tmp
> -rw------- 1 amiry users 42771860 May 6 21:33 pigbag571513.tmp
> -rw------- 1 amiry users 29912726 May 6 21:33 pigbag572784.tmp
> -rw------- 1 amiry users 42563305 May 6 21:34 pigbag581561.tmp
> -rw------- 1 amiry users 29961190 May 6 21:34 pigbag583024.tmp
> -rw------- 1 amiry users 7704 May 6 21:34 pigbag591783.tmp
> -rw------- 1 amiry users 41448656 May 6 21:35 pigbag591786.tmp
> -rw------- 1 amiry users 7679 May 6 21:35 pigbag592482.tmp
> -rw------- 1 amiry users 7328 May 6 21:35 pigbag592488.tmp
> -rw------- 1 amiry users 7809 May 6 21:35 pigbag592485.tmp
> -rw------- 1 amiry users 128909 May 6 21:35 pigbag592491.tmp
> -rw------- 1 amiry users 33069409 May 6 21:35 pigbag592530.tmp
> -rw------- 1 amiry users 7004 May 6 21:35 pigbag602114.tmp
> -rw------- 1 amiry users 4083 May 6 21:35 pigbag602121.tmp
> -rw------- 1 amiry users 11533 May 6 21:35 pigbag602117.tmp
> -rw------- 1 amiry users 7603 May 6 21:35 pigbag602124.tmp
> -rw------- 1 amiry users 7611 May 6 21:35 pigbag602130.tmp
> -rw------- 1 amiry users 7692 May 6 21:35 pigbag602127.tmp
> -rw------- 1 amiry users 7821 May 6 21:35 pigbag602136.tmp
> -rw------- 1 amiry users 7467 May 6 21:35 pigbag602133.tmp
> -rw------- 1 amiry users 1147 May 6 21:35 pigbag602139.tmp
> -rw------- 1 amiry users 8543 May 6 21:35 pigbag602144.tmp
> -rw------- 1 amiry users 8517 May 6 21:35 pigbag602141.tmp
> -rw------- 1 amiry users 3893 May 6 21:35 pigbag602150.tmp
> -rw------- 1 amiry users 6029 May 6 21:35 pigbag602147.tmp
> -rw------- 1 amiry users 7687 May 6 21:35 pigbag602152.tmp
> -rw------- 1 amiry users 7411 May 6 21:35 pigbag602158.tmp
> -rw------- 1 amiry users 6079 May 6 21:35 pigbag602155.tmp
> -rw------- 1 amiry users 6964 May 6 21:35 pigbag602161.tmp
> -rw------- 1 amiry users 5404 May 6 21:35 pigbag602168.tmp
> -rw------- 1 amiry users 7915 May 6 21:35 pigbag602164.tmp
> -rw------- 1 amiry users 4132 May 6 21:35 pigbag602174.tmp
> -rw------- 1 amiry users 7708 May 6 21:35 pigbag602171.tmp
> -rw------- 1 amiry users 7802 May 6 21:35 pigbag602176.tmp
> -rw------- 1 amiry users 6089 May 6 21:35 pigbag602182.tmp
> -rw------- 1 amiry users 8161 May 6 21:35 pigbag602179.tmp
> -rw------- 1 amiry users 7356 May 6 21:35 pigbag602188.tmp
> -rw------- 1 amiry users 8231 May 6 21:35 pigbag602185.tmp
> -rw------- 1 amiry users 8340 May 6 21:35 pigbag602191.tmp
> -rw------- 1 amiry users 3870 May 6 21:35 pigbag602198.tmp
> -rw------- 1 amiry users 7587 May 6 21:35 pigbag602195.tmp
> -rw------- 1 amiry users 5884 May 6 21:35 pigbag602200.tmp
> -rw------- 1 amiry users 7644 May 6 21:35 pigbag602206.tmp
> -rw------- 1 amiry users 7756 May 6 21:35 pigbag602203.tmp
> -rw------- 1 amiry users 8054 May 6 21:35 pigbag602209.tmp
> -rw------- 1 amiry users 6182 May 6 21:35 pigbag602215.tmp
> -rw------- 1 amiry users 7740 May 6 21:35 pigbag602212.tmp
> -rw------- 1 amiry users 6264 May 6 21:35 pigbag602222.tmp
> -rw------- 1 amiry users 7709 May 6 21:35 pigbag602218.tmp
> -rw------- 1 amiry users 10195 May 6 21:35 pigbag602225.tmp
> -rw------- 1 amiry users 7914 May 6 21:35 pigbag602232.tmp
> -rw------- 1 amiry users 4170 May 6 21:35 pigbag602229.tmp
> -rw------- 1 amiry users 7670 May 6 21:35 pigbag602235.tmp
> -rw------- 1 amiry users 2823 May 6 21:35 pigbag602241.tmp
> -rw------- 1 amiry users 6947 May 6 21:35 pigbag602238.tmp
> -rw------- 1 amiry users 7311 May 6 21:35 pigbag602246.tmp
> -rw------- 1 amiry users 7521 May 6 21:35 pigbag602243.tmp
> -rw------- 1 amiry users 7772 May 6 21:35 pigbag602249.tmp
> -rw------- 1 amiry users 5842 May 6 21:35 pigbag602255.tmp
> -rw------- 1 amiry users 7689 May 6 21:35 pigbag602252.tmp
> -rw------- 1 amiry users 5399 May 6 21:35 pigbag602261.tmp
> -rw------- 1 amiry users 5951 May 6 21:35 pigbag602258.tmp
> -rw------- 1 amiry users 7717 May 6 21:35 pigbag602264.tmp
> -rw------- 1 amiry users 4911 May 6 21:35 pigbag602271.tmp
> -rw------- 1 amiry users 11384 May 6 21:35 pigbag602267.tmp
> -rw------- 1 amiry users 6714 May 6 21:35 pigbag602277.tmp
> -rw------- 1 amiry users 6639 May 6 21:35 pigbag602274.tmp
> -rw------- 1 amiry users 7807 May 6 21:35 pigbag602280.tmp
> -rw------- 1 amiry users 6754 May 6 21:35 pigbag602286.tmp
> -rw------- 1 amiry users 6159 May 6 21:35 pigbag602283.tmp
> -rw------- 1 amiry users 7643 May 6 21:35 pigbag602292.tmp
> -rw------- 1 amiry users 7636 May 6 21:35 pigbag602289.tmp
> -rw------- 1 amiry users 6007 May 6 21:35 pigbag602295.tmp
> -rw------- 1 amiry users 7245 May 6 21:35 pigbag602301.tmp
> -rw------- 1 amiry users 7742 May 6 21:35 pigbag602298.tmp
> -rw------- 1 amiry users 1994 May 6 21:35 pigbag602304.tmp
> -rw------- 1 amiry users 8080 May 6 21:35 pigbag602309.tmp
> -rw------- 1 amiry users 8030 May 6 21:35 pigbag602306.tmp
> -rw------- 1 amiry users 6009 May 6 21:35 pigbag602315.tmp
> -rw------- 1 amiry users 5019 May 6 21:35 pigbag602312.tmp
> -rw------- 1 amiry users 3746 May 6 21:35 pigbag602318.tmp
> -rw------- 1 amiry users 5977 May 6 21:35 pigbag602323.tmp