You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tez.apache.org by "Muhammad Samir Khan (JIRA)" <ji...@apache.org> on 2017/07/26 18:17:00 UTC
[jira] [Commented] (TEZ-3752) Reduce Object size of
InMemoryMapOutput for large jobs
[ https://issues.apache.org/jira/browse/TEZ-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16102046#comment-16102046 ]
Muhammad Samir Khan commented on TEZ-3752:
------------------------------------------
JOL dump:
Before:
-internals:
{code}
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Instantiated the sample instance via org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput(org.apache.tez.runtime.library.common.InputAttemptIdentifier,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped,long,boolean,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$1)
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) 78 12 01 f8 (01111000 00010010 00000001 11111000) (-134147464)
12 4 int MapOutput.id 1
16 1 boolean MapOutput.primaryMapOutput false
17 3 (alignment/padding gap)
20 4 org.apache.tez.runtime.library.common.InputAttemptIdentifier MapOutput.attemptIdentifier null
24 4 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped MapOutput.callback null
28 4 org.apache.hadoop.io.BoundedByteArrayOutputStream InMemoryMapOutput.byteStream (object)
Instance size: 32 bytes
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total
{code}
-footprint:
{code}
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Instantiated the sample instance via org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput(org.apache.tez.runtime.library.common.InputAttemptIdentifier,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped,long,boolean,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$1)
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput@10bdf5e5d footprint:
COUNT AVG SUM DESCRIPTION
1 16 16 [B
1 32 32 org.apache.hadoop.io.BoundedByteArrayOutputStream
1 32 32 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput
3 80 (total)
{code}
After:
-internals:
{code}
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Instantiated the sample instance via org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput(org.apache.tez.runtime.library.common.InputAttemptIdentifier,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped,long,boolean,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$1)
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput object internals:
OFFSET SIZE TYPE DESCRIPTION VALUE
0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1)
4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0)
8 4 (object header) 77 12 01 f8 (01110111 00010010 00000001 11111000) (-134147465)
12 4 int MapOutput.id 1
16 1 boolean MapOutput.primaryMapOutput false
17 3 (alignment/padding gap)
20 4 org.apache.tez.runtime.library.common.InputAttemptIdentifier MapOutput.attemptIdentifier null
24 4 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped MapOutput.callback null
28 4 byte[] InMemoryMapOutput.byteArray []
Instance size: 32 bytes
Space losses: 3 bytes internal + 0 bytes external = 3 bytes total
{code}
-footprint
{code}
# Running 64-bit HotSpot VM.
# Using compressed oop with 3-bit shift.
# Using compressed klass with 3-bit shift.
# Objects are 8 bytes aligned.
# Field sizes by type: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
# Array element sizes: 4, 1, 1, 2, 2, 4, 4, 8, 8 [bytes]
Instantiated the sample instance via org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput(org.apache.tez.runtime.library.common.InputAttemptIdentifier,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.FetchedInputAllocatorOrderedGrouped,long,boolean,org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$1)
org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput@148080bbd footprint:
COUNT AVG SUM DESCRIPTION
1 16 16 [B
1 32 32 org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MapOutput$InMemoryMapOutput
2 48 (total)
{code}
> Reduce Object size of InMemoryMapOutput for large jobs
> ------------------------------------------------------
>
> Key: TEZ-3752
> URL: https://issues.apache.org/jira/browse/TEZ-3752
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Attachments: TEZ-3752.001.patch
>
>
> Follow-on jira from TEZ-3732. The InMemoryMapOutput has a BoundedByteArrayOutputStream that is only used in the Merged MapOutput case.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)