You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@tez.apache.org by "Jonathan Eagles (JIRA)" <ji...@apache.org> on 2017/06/02 15:37:04 UTC

[jira] [Updated] (TEZ-3732) Reduce Object size of InputAttemptIdentifier and MapOutput for large jobs

     [ https://issues.apache.org/jira/browse/TEZ-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated TEZ-3732:
---------------------------------
    Attachment: TEZ-3732.3.patch

Removing the more extreme BoundedByteArrayOutputStream removal in the round since I am not fully able to confirm that the size of the array will always be the number of bytes written to the merged array. In fact I'm almost positive it won't be. Will put that in a follow on jira later when I have time.

> Reduce Object size of InputAttemptIdentifier and MapOutput for large jobs
> -------------------------------------------------------------------------
>
>                 Key: TEZ-3732
>                 URL: https://issues.apache.org/jira/browse/TEZ-3732
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: TEZ-3732.1.patch, TEZ-3732.2.patch, TEZ-3732.3.patch
>
>
> Objects in 64bit java are 12bytes + member size aligned to 8 bytes
> InputAttemptIdentifier -> 33Bytes gets aligned up to 40 bytes
> This class is just one byte over the 32 byte alignment. Reducing object size by one byte can save 8 bytes per object.
> This is ~8MB savings for 1,000,000 inputs and ~80 MB savings for tasks with 10,000,000 inputs to fetch (Yes this is a real job)
> MapOutput -> 45 bytes gets aligned to 48 bytes
> This class can be sub-classed to avoid all sub-classes paying the object size cost for the other sub-classes
> Wait InMemory and DiskDirect -> 32 bytes
> Disk -> 40 bytes
> Total savings is harder to account for but more than the above case.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)