Posted to issues@tez.apache.org by "Jonathan Eagles (JIRA)" <ji...@apache.org> on 2016/04/06 23:06:25 UTC

[jira] [Commented] (TEZ-3202) Reduce the memory need for jobs with high number of segments

    [ https://issues.apache.org/jira/browse/TEZ-3202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229111#comment-15229111 ] 

Jonathan Eagles commented on TEZ-3202:
--------------------------------------

The approach taken in the above patch is to create a bridge between the upper layers and lower layers to reduce the amount of storage needed per segment. I have seen several cases where the number of segments approaches 1,000,000 and the segments account for almost 300 MB of the task attempt heap.
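
To illustrate the kind of saving at stake, here is a minimal sketch of a compact holder for the three values the issue description names (key buffer, position, length). It is not the code from the attached patch: KeyRef, SegmentOverheadEstimate, and all of their methods are hypothetical names invented for this example. The arithmetic simply multiplies the 384-byte figure from the description by a segment count and makes no claim about how that figure was measured.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical compact key holder. This is NOT the class from the TEZ-3202
// patch; it only tracks the three values the issue description mentions.
final class KeyRef {
    private byte[] buffer;   // shared backing array, not copied
    private int position;    // offset of the key within the buffer
    private int length;      // number of key bytes

    // Point this reference at a key inside an existing buffer.
    void reset(byte[] buffer, int position, int length) {
        if (position < 0 || length < 0 || position > buffer.length - length) {
            throw new IndexOutOfBoundsException(
                "position=" + position + ", length=" + length);
        }
        this.buffer = buffer;
        this.position = position;
        this.length = length;
    }

    byte[] getData() { return buffer; }
    int getPosition() { return position; }
    int getLength() { return length; }

    // Convenience for the demo only: decode the referenced bytes as UTF-8.
    String asString() {
        return new String(buffer, position, length, StandardCharsets.UTF_8);
    }
}

final class SegmentOverheadEstimate {
    // Total bytes spent on per-segment overhead for a given segment count.
    static long totalBytes(long segments, long bytesPerSegment) {
        return Math.multiplyExact(segments, bytesPerSegment);
    }

    public static void main(String[] args) {
        byte[] data = "abcdef".getBytes(StandardCharsets.UTF_8);
        KeyRef key = new KeyRef();
        key.reset(data, 2, 3);
        System.out.println("key = " + key.asString());

        // 384 bytes per segment is the figure quoted in the issue description.
        long total = totalBytes(1_000_000L, 384L);
        System.out.println("overhead for 1,000,000 segments = " + total + " bytes");
    }
}
```

A holder like this carries one reference and two ints per segment, instead of the scratch arrays that come with a stream object. Whether the actual patch takes this shape is an assumption.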

> Reduce the memory need for jobs with high number of segments
> ------------------------------------------------------------
>
>                 Key: TEZ-3202
>                 URL: https://issues.apache.org/jira/browse/TEZ-3202
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>         Attachments: HADOOPPF-10832.2.patch
>
>
> Segment has a 'key' member that holds accounting information for the reader's current key buffer, position, and length. There is a 384-byte overhead per segment because the accounting is done with the DataInputBuffer class, which derives from DataInputStream, which in turn holds underlying byte[80] and char[80] arrays among its significant fields. This jira aims to reduce the overhead per segment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)