Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2010/08/17 02:46:18 UTC

[jira] Updated: (PIG-1447) Tune memory usage of InternalCachedBag

     [ https://issues.apache.org/jira/browse/PIG-1447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1447:
-------------------------------

    Attachment: L15_modified.pig

The search for a better default value for pig.cachedbag.memusage was prompted by the changes in PIG-1443 and PIG-1492. Before the changes made as part of those jiras, pig was underestimating the memory footprint of data.
For data of 'typical' sizes (chararray/bytearray with fewer than 20 chars), the new memory size estimates can be up to 2 times those of the old version without these changes (0.6.0).

I ran pig queries with the max heap size for tasks set to 1GB, and compared 0.1f and 0.2f as values for pig.cachedbag.memusage. I ran the pigmix v1 queries (L1-L12), a modified pigmix v1 that specifies types, and a modified L15 query that has several distincts in a nested foreach statement.
Only queries L5, L7 and L15 had proactive spills. The number of spills goes down with 0.2f as the value, but the total runtime is practically the same.

(See PIG-1524 for more on the spills currently reported.)

|| query || spills with 0.1f || spills with 0.2f || 
| L5 (original pigmix) | 496k | 0 |
| L7 (original pigmix) | 82k | 0 |
| L5 (with types) | 609k | 82k |
| L7 (with types) | 128k | 0 |
| L15_modified (attached to jira) |  501k | 326k |
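
To make the comparison in the table easier to read, the relative drop in reported spills can be computed directly from those numbers. This is a minimal Python sketch; the dictionary and function names are illustrative only and are not part of pig or pigmix.

```python
# Spill counts copied from the table above, in thousands (k),
# stored as (spills with 0.1f, spills with 0.2f).
spills_k = {
    "L5 (original pigmix)": (496, 0),
    "L7 (original pigmix)": (82, 0),
    "L5 (with types)": (609, 82),
    "L7 (with types)": (128, 0),
    "L15_modified": (501, 326),
}


def reduction_pct(before, after):
    """Percentage drop in reported spills going from `before` to `after`."""
    return 100.0 * (before - after) / before


# Hypothetical helper structure: query name -> percentage reduction.
reductions = {q: reduction_pct(b, a) for q, (b, a) in spills_k.items()}

for query, pct in reductions.items():
    print(f"{query}: {pct:.1f}% fewer spills with 0.2f")
```

Spills disappear entirely for three of the five runs, while L15_modified keeps roughly two thirds of its spills even at 0.2f.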


Some other factors to consider while determining a new value for this property:
- as a result of the issue described in PIG-1544, the proactive-spill bags do not share a common memory limit.
- the default value should be low enough that queries work fine in most cases. Expert users can tweak it to improve performance.
- the value of 0.1f has been used for a long time (with the old memory estimate formula), and seems to work for most cases.
- during the above tests, no other queries were running, so the disks were relatively free.

I propose that we increase the default value to 0.15f to accommodate the changes in memory size estimation, so that the spill behavior is closer to what it has been with 0.6 and 0.7.
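
As a rough illustration of the reasoning behind 0.15f: the sketch below assumes the memory budget for cached bags is simply the max heap size multiplied by pig.cachedbag.memusage (a simplification, not taken from the pig source), and that a larger size estimate shrinks the amount of data a bag holds before spilling by the same factor. The function names and the 1.5x growth factor are hypothetical, chosen only for this example.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def cached_bag_budget_bytes(max_heap_bytes, memusage):
    """Heap bytes available to cached bags.

    ASSUMPTION: budget = max heap * memusage fraction.
    """
    return int(max_heap_bytes * memusage)


def equivalent_fraction(old_fraction, estimate_growth):
    """Fraction needed to hold as much data as `old_fraction` used to,
    if size estimates grew by the factor `estimate_growth`."""
    return old_fraction * estimate_growth


heap = 1 * GIB  # the 1GB task heap used in the tests above

# Budget in MiB for the old default, the proposed value, and the tested value.
budgets_mib = {
    f: cached_bag_budget_bytes(heap, f) / MIB for f in (0.1, 0.15, 0.2)
}
for fraction, mib in budgets_mib.items():
    print(f"memusage={fraction}: about {mib:.1f} MiB")

# Estimates "up to 2 times" the old ones: the worst case needs 0.2.
worst_case = equivalent_fraction(0.1, 2.0)
# Hypothetical middle case: estimates grew by 1.5 times.
middle_case = equivalent_fraction(0.1, 1.5)
print(worst_case, round(middle_case, 2))
```

Under these assumptions, 0.2f would restore the old behavior only for data whose estimate doubled; 0.15f corresponds to estimates that grew by about 1.5 times, which sits between the old default and the worst case.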


> Tune memory usage of InternalCachedBag
> --------------------------------------
>
>                 Key: PIG-1447
>                 URL: https://issues.apache.org/jira/browse/PIG-1447
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.7.0
>            Reporter: Daniel Dai
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>
>         Attachments: L15_modified.pig
>
>
> We need to find a better value for "pig.cachedbag.memusage".
