Posted to issues@tez.apache.org by "Jonathan Eagles (JIRA)" <ji...@apache.org> on 2016/01/27 01:06:39 UTC
[jira] [Comment Edited] (TEZ-3076) Reduce merge memory overhead to support large number of in-memory mapoutputs

    [ https://issues.apache.org/jira/browse/TEZ-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118323#comment-15118323 ] 

Jonathan Eagles edited comment on TEZ-3076 at 1/27/16 12:06 AM:
----------------------------------------------------------------

MapOutput data size reduction
Fields marked with * are removed by the patch.
{noformat}
12   // class overhead assuming < 32GB heap with CompressedOops
04   private final int id;
04*  private final Type type;
04   private InputAttemptIdentifier attemptIdentifier;
08*  private final long size;
01   private final boolean primaryMapOutput;
04   private final FetchedInputAllocatorOrderedGrouped callback;
04*  private final byte[] memory;
04   private BoundedByteArrayOutputStream byteStream;
04*  private final FileSystem localFS;
04   private final Path tmpOutputPath;
04   private final FileChunk outputPath;
04*  private OutputStream disk;
{noformat}

Shallow memory size reduction
(sizes rounded up to 8-byte alignment)
Original 61 -> 64 bytes
Patched  37 -> 40 bytes
===================
24 bytes saved
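
The rounding above can be checked with a minimal sketch. The class name ShallowSize and its helpers are illustrative, not part of the patch. The field sizes are the ones listed in the table, assuming CompressedOops and 8-byte object alignment.

```java
// Illustrative only: ShallowSize, alignUp and sum are hypothetical names, not Tez code.
final class ShallowSize {
    // Round a byte count up to the next multiple of `alignment`.
    static int alignUp(int bytes, int alignment) {
        return (bytes + alignment - 1) / alignment * alignment;
    }

    // Add up a list of field sizes in bytes.
    static int sum(int... sizes) {
        int total = 0;
        for (int s : sizes) {
            total += s;
        }
        return total;
    }

    public static void main(String[] args) {
        int header = 12;                      // per-object overhead with CompressedOops
        // id, attemptIdentifier, primaryMapOutput, callback, byteStream, tmpOutputPath, outputPath
        int kept = sum(4, 4, 1, 4, 4, 4, 4);
        // type, size, memory, localFS, disk (the fields marked with * in the table)
        int removed = sum(4, 8, 4, 4, 4);

        int original = alignUp(header + kept + removed, 8);
        int patched = alignUp(header + kept, 8);
        // prints "64 -> 40, saved 24"
        System.out.println(original + " -> " + patched + ", saved " + (original - patched));
    }
}
```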

// InputIdentifier to int conversion
Use a 4-byte int instead of a 4-byte reference pointing to a 16-byte InputIdentifier object
==========================
16 bytes saved

JDK HashSet -> fastutil int hash set
 (32 * SIZE + 4 * CAPACITY) -> (8 * CAPACITY)
// assuming 0.75 load factor, i.e. SIZE = 0.75 * CAPACITY
28 * CAPACITY -> 8 * CAPACITY
==============
20 bytes saved per slot of capacity (about 27 bytes per entry)
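
A small sketch of the hash set arithmetic, using only the cost formulas quoted above (they are estimates from this comment, not measurements). The class and method names are hypothetical, and the capacity of 1000 is an arbitrary example value.

```java
// Illustrative only: formulas are taken from the comment above, names are hypothetical.
final class HashSetOverhead {
    // Estimated bytes for the JDK set: 32 per entry plus 4 per table slot.
    static long jdkBytes(long size, long capacity) {
        return 32 * size + 4 * capacity;
    }

    // Estimated bytes for the fastutil set: 8 per table slot.
    static long fastutilBytes(long capacity) {
        return 8 * capacity;
    }

    public static void main(String[] args) {
        long capacity = 1000;   // arbitrary example capacity
        long size = 750;        // 0.75 load factor
        long saved = jdkBytes(size, capacity) - fastutilBytes(capacity);
        System.out.println("saved per slot:  " + (double) saved / capacity);
        System.out.println("saved per entry: " + (double) saved / size);
    }
}
```

Per slot the saving is 20 bytes. Per entry it is higher because only three quarters of the slots are occupied.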

Total savings
24 + 16 + ~20 (more actually)
===================
60 bytes per entry

For the original problem in the description, with 500000 map outputs
===================
*28MB* 

// This is significant in proportion to the 87MB of memory allocated to this merge operation.
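
The total can be reproduced with a short sketch. The per-entry figure of 60 bytes and the count of 500000 come from the numbers above. The class and method names are hypothetical.

```java
// Illustrative only: MergeSavings is a hypothetical name, not Tez code.
final class MergeSavings {
    // Total bytes saved for a given number of map outputs.
    static long totalBytes(long bytesPerEntry, long entries) {
        return bytesPerEntry * entries;
    }

    // Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long perEntry = 24 + 16 + 20;   // shallow size + identifier + hash set savings
        long total = totalBytes(perEntry, 500_000);
        System.out.printf("%d bytes = %.1f MiB%n", total, toMiB(total));
    }
}
```

This gives 30,000,000 bytes, a little over 28 MiB, which is roughly a third of the 87MB merge budget.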





> Reduce merge memory overhead to support large number of in-memory mapoutputs
> ----------------------------------------------------------------------------
>
>                 Key: TEZ-3076
>                 URL: https://issues.apache.org/jira/browse/TEZ-3076
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>         Attachments: TEZ-3076.1.patch
>
>



