Posted to dev@pig.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2010/07/10 03:39:49 UTC
[jira] Created: (PIG-1492) DefaultTuple and DefaultMemory underestimate their memory footprint
DefaultTuple and DefaultMemory underestimate their memory footprint
-------------------------------------------------------------------
Key: PIG-1492
URL: https://issues.apache.org/jira/browse/PIG-1492
Project: Pig
Issue Type: Bug
Reporter: Thejas M Nair
There are several places where we significantly underestimate the memory footprint. For example, for map datatypes, we don't account for the per-entry cost of the map container data structures. For a tuple containing a map with 100 integer key-value entries, the current code estimates 3260 bytes, while the observed size is around 6775 bytes. To verify the memory footprint, I checked free memory before and after creating multiple instances of the object, using code along the lines of http://www.javaspecialists.eu/archive/Issue029.html.
A similar change was made in PIG-1443 to fix this for CHARARRAY.
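The free-memory measurement described in the issue can be sketched in Java roughly as follows. This is a minimal, hypothetical harness in the spirit of the javaspecialists article (the class and method names are illustrative, not Pig code). It assumes that a few System.gc() calls settle the heap well enough between the two readings, which the JVM does not guarantee, so the result is an approximation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper sketched after the "free memory before/after" idea;
// not part of Pig.
final class FootprintProbe {

    // Used heap = total - free. System.gc() is only a hint, so it is called
    // a few times and the reading is treated as approximate.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        for (int i = 0; i < 4; i++) {
            System.gc();
        }
        return rt.totalMemory() - rt.freeMemory();
    }

    // Creates `count` instances, keeps them reachable, and divides the growth
    // in used heap by `count` to get an average size per instance.
    static long averageBytesPerInstance(Supplier<?> factory, int count) {
        Object[] holder = new Object[count]; // allocated before the first reading
        long before = usedHeapBytes();
        for (int i = 0; i < count; i++) {
            holder[i] = factory.get();
        }
        long after = usedHeapBytes();
        // Reading the holder after the second measurement keeps the instances
        // strongly reachable while that measurement is taken.
        if (holder[count - 1] == null) {
            throw new IllegalStateException("factory returned null");
        }
        return (after - before) / count;
    }

    // Map with 100 Integer key/value pairs. Values start at 1000 so they fall
    // outside the default Integer cache (-128..127), so every entry allocates
    // its own boxed key and boxed value.
    static Map<Integer, Integer> hundredEntryMap() {
        Map<Integer, Integer> m = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            m.put(1000 + i, 1000 + i);
        }
        return m;
    }

    public static void main(String[] args) {
        long perMap = averageBytesPerInstance(FootprintProbe::hundredEntryMap, 1_000);
        System.out.println("approx. bytes per 100-entry HashMap: " + perMap);
    }
}
```

The figure printed depends on the VM (32-bit vs. 64-bit, compressed references, garbage collector), so it will not necessarily match the numbers in this issue, which were taken on a 32-bit HotSpot VM.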
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1492) DefaultTuple and DefaultMemory underestimate their memory footprint
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888952#action_12888952 ]
Hadoop QA commented on PIG-1492:
--------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12449531/PIG-1492.1.patch
against trunk revision 964182.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
-1 contrib tests. The patch failed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/370/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/370/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/370/console
This message is automatically generated.
> DefaultTuple and DefaultMemory underestimate their memory footprint
> -------------------------------------------------------------------
>
> Key: PIG-1492
> URL: https://issues.apache.org/jira/browse/PIG-1492
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.0
> Reporter: Thejas M Nair
> Assignee: Thejas M Nair
> Fix For: 0.8.0
>
> Attachments: PIG-1492.1.patch
[jira] Updated: (PIG-1492) DefaultTuple and DefaultMemory underestimate their memory footprint
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair updated PIG-1492:
-------------------------------
Status: Resolved (was: Patch Available)
Resolution: Fixed
[jira] Updated: (PIG-1492) DefaultTuple and DefaultMemory underestimate their memory footprint
Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-1492:
--------------------------------
Assignee: Thejas M Nair
Fix Version/s: 0.8.0
[jira] Updated: (PIG-1492) DefaultTuple and DefaultMemory underestimate their memory footprint
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair updated PIG-1492:
-------------------------------
Status: Patch Available (was: Open)
Affects Version/s: 0.8.0
[jira] Commented: (PIG-1492) DefaultTuple and DefaultMemory underestimate their memory footprint
Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12888905#action_12888905 ]
Daniel Dai commented on PIG-1492:
---------------------------------
Talked with Thejas; he has more observations than those listed above, so I believe the formula is good. +1 for commit once Hudson passes.
[jira] Commented: (PIG-1492) DefaultTuple and DefaultMemory underestimate their memory footprint
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890792#action_12890792 ]
Thejas M Nair commented on PIG-1492:
------------------------------------
Committed to trunk.
The contrib tests pass on my machine. The failures in the Hudson run seem to be caused by environment-specific issues.
[jira] Updated: (PIG-1492) DefaultTuple and DefaultMemory underestimate their memory footprint
Posted by "Thejas M Nair (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thejas M Nair updated PIG-1492:
-------------------------------
Attachment: PIG-1492.1.patch
This patch updates the memory size calculations. These changes were made so that the estimated sizes are closer to what is seen in the 32-bit Java HotSpot(TM) Server VM (build 10.0-b19, mixed mode).
It is based on some of the observations in http://www.javamex.com/tutorials/memory/string_memory_usage.shtm. Object headers are taken to be 8 bytes, and each object's size is rounded up to a multiple of 8 bytes. Some further adjustments, for the minimum size of the backing array in an ArrayList, were made based on observed size values.
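The two sizing rules above (8-byte object header, size rounded up to a multiple of 8) can be sketched as a tiny calculator. This is a minimal sketch, not the patch's code: the class name SizeRules32 is hypothetical, and the 4-byte reference and 4-byte array-length sizes are assumptions for a 32-bit VM that the description does not state.

```java
// Hypothetical calculator for the sizing rules described in the patch;
// not Pig's API.
final class SizeRules32 {
    static final int OBJECT_HEADER = 8; // from the patch description
    static final int REFERENCE = 4;     // assumption: 32-bit VM
    static final int ARRAY_LENGTH = 4;  // assumption: int length field after the array header

    // Round up to the next multiple of 8 bytes.
    static long align8(long bytes) {
        return (bytes + 7) & ~7L;
    }

    // Plain object whose instance fields take `fieldBytes` in total.
    static long objectSize(long fieldBytes) {
        return align8(OBJECT_HEADER + fieldBytes);
    }

    // Array of `length` elements, each `elementBytes` wide.
    static long arraySize(int elementBytes, int length) {
        return align8(OBJECT_HEADER + ARRAY_LENGTH + (long) elementBytes * length);
    }

    public static void main(String[] args) {
        // Boxed Long/Double: 8 header + 8 value = 16
        System.out.println("Long or Double: " + objectSize(8));
        // byte[5]: 8 + 4 + 5 = 17, rounded up to 24
        System.out.println("byte[5]: " + arraySize(1, 5));
        // Object with one reference (16) plus its byte[5] (24) = 40
        System.out.println("wrapper + byte[5]: " + (objectSize(REFERENCE) + arraySize(1, 5)));
    }
}
```

Under these assumptions a boxed Long or Double is 16 bytes, and an object holding one reference to a 5-byte array is 40 bytes. Adding 4 bytes for the tuple's reference to each field gives 20 and 44 bytes per column, which agrees with how much the patched LONG/DOUBLE and BYTEARRAY estimates grow per column in the table that follows.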
The following tables show the estimated tuple sizes before and after the patch, and the sizes actually observed, for the types whose calculation logic changed:
|| type || number of columns of this type in the tuple || before (bytes) || patched (bytes) || observed (bytes) ||
| BYTEARRAY with 5 bytes | 10 | 254 | 504 | 495 |
| BYTEARRAY with 5 bytes | 1000 | 21044 | 44064 | 44127 |
| DOUBLE | 10 | 364 | 264 | 255 |
| DOUBLE | 1000 | 32044 | 20064 | 20127 |
| LONG | 10 | 284 | 264 | 255 |
| LONG | 1000 | 24044 | 20064 | 20127 |
|| tuple containing a single field of type || before (bytes) || patched (bytes) || observed (bytes) ||
| BAG with 10 empty tuples | 524 | 1092 | 1159 |
| BAG with 1000 empty tuples | 48044 | 100092 | 100211 |
| map with 10 integer key-value pairs | 380 | 824 | 775 |
| map with 1000 integer key-value pairs | 32060 | 64184 | 64346 |
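As a worked check on the map rows of the table, one can fit a straight line size(n) = fixed + perItem * n through the 10-entry and 1000-entry values and interpolate. This is an illustration only, not Pig's formula: it assumes the estimate is exactly linear in the number of entries, which two data points alone cannot confirm, and the class name TableFit is hypothetical.

```java
// Hypothetical helper: a straight line through two (entries, bytes) points.
final class TableFit {
    final long fixed;
    final long perItem;

    TableFit(long fixed, long perItem) {
        this.fixed = fixed;
        this.perItem = perItem;
    }

    // Line through (n1, size1) and (n2, size2). Integer division is exact
    // for the values in the table.
    static TableFit through(long n1, long size1, long n2, long size2) {
        long perItem = (size2 - size1) / (n2 - n1);
        return new TableFit(size1 - perItem * n1, perItem);
    }

    long at(long n) {
        return fixed + perItem * n;
    }

    public static void main(String[] args) {
        TableFit mapBefore = through(10, 380, 1000, 32060);  // 60 + 32n
        TableFit mapPatched = through(10, 824, 1000, 64184); // 184 + 64n
        System.out.println("map, 100 entries, before:  " + mapBefore.at(100));
        System.out.println("map, 100 entries, patched: " + mapPatched.at(100));
    }
}
```

The "before" line gives 3260 bytes at 100 entries, the same estimate quoted in the issue description, and the patched line gives 6584 bytes, against the roughly 6775 bytes observed there. The per-entry cost doubles from 32 to 64 bytes, which reflects the per-entry container cost the issue says was previously missing.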