Posted to issues@tez.apache.org by "TezQA (JIRA)" <ji...@apache.org> on 2017/08/09 19:12:00 UTC

[jira] [Commented] (TEZ-3159) Reduce memory utilization while serializing keys and values

    [ https://issues.apache.org/jira/browse/TEZ-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120484#comment-16120484 ] 

TezQA commented on TEZ-3159:
----------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12881049/TEZ-3159.001.patch
  against master revision 8dcf8a1.

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test file.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

    {color:red}-1 findbugs{color}.  The patch appears to introduce 4 new Findbugs (version 3.0.1) warnings.

    {color:red}-1 release audit{color}.  The applied patch generated 1 release audit warning.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2606//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/2606//artifact/patchprocess/patchReleaseAuditProblems.txt
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/2606//artifact/patchprocess/newPatchFindbugsWarningstez-runtime-library.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/2606//artifact/patchprocess/newPatchFindbugsWarningstez-common.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2606//console

This message is automatically generated.

> Reduce memory utilization while serializing keys and values
> -----------------------------------------------------------
>
>                 Key: TEZ-3159
>                 URL: https://issues.apache.org/jira/browse/TEZ-3159
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Muhammad Samir Khan
>         Attachments: TEZ-3159.001.patch
>
>
>   Currently DataOutputBuffer is used for serializing. Its underlying buffer doubles in size whenever it reaches capacity. In some Pig scripts that serialize big bags, Tez fails with OOM because there is not enough free heap to allocate the doubled array. The same scripts run fine in MapReduce mode with a 1GB heap. The scenarios are:
>     - When the combiner runs in the reducer and some of the fields are still big bags after combining (e.g. DISTINCT). With MapReduce, the combiner currently does not run in the reducer (MAPREDUCE-5221). Since the input sort buffers hold a good amount of memory at that point, the task can easily go OOM.
>     - When serializing output that contains bags while there are multiple inputs and outputs whose sort buffers already take up space.
> This is especially painful once the buffer reaches 128MB. Doubling at 128MB requires 128MB (existing array) + 256MB (new array) = 384MB live at the same time, and any later doubling requires even more. Yet most of the time the data will probably not fill that 256MB, so much of it is wasted.
>  
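To make the doubling arithmetic in the description concrete, here is a minimal Java sketch. It is a hypothetical model, not Tez or Hadoop code: the class DoublingBufferModel, the 32-byte initial capacity and the 64KB write size are assumptions chosen for illustration, and the growth rule max(2 * capacity, required) is assumed to approximate the doubling behavior the description attributes to DataOutputBuffer. It reports the peak memory held while the old array is copied into the new one, and the unused tail left afterwards.

```java
/**
 * Hypothetical model (not Tez or Hadoop code) of a byte buffer that grows by
 * doubling. Assumption: growth rule is max(2 * capacity, required).
 */
public final class DoublingBufferModel {
    static final long MB = 1024L * 1024L;

    /** Result of simulating writes into a doubling buffer. */
    public static final class Growth {
        public final long finalCapacity;
        public final long peakDuringResize;
        public final long unusedBytes;

        Growth(long finalCapacity, long peakDuringResize, long unusedBytes) {
            this.finalCapacity = finalCapacity;
            this.peakDuringResize = peakDuringResize;
            this.unusedBytes = unusedBytes;
        }
    }

    /**
     * Simulates appending totalBytes in chunks of chunkBytes to a buffer that
     * starts at initialCapacity and, when full, grows to max(2 * capacity, required).
     */
    public static Growth simulate(long initialCapacity, long totalBytes, long chunkBytes) {
        long capacity = initialCapacity;
        long count = 0;
        long peak = capacity; // memory held when no resize is in progress
        while (count < totalBytes) {
            long required = Math.min(count + chunkBytes, totalBytes);
            if (required > capacity) {
                long newCapacity = Math.max(capacity * 2, required);
                // While copying, both the old and the new array are live.
                peak = Math.max(peak, capacity + newCapacity);
                capacity = newCapacity;
            }
            count = required;
        }
        return new Growth(capacity, peak, capacity - count);
    }

    public static void main(String[] args) {
        // Serialize 129MB in 64KB writes, starting from a 32-byte buffer.
        Growth g = simulate(32, 129 * MB, 64 * 1024);
        System.out.println("final capacity MB = " + g.finalCapacity / MB);
        System.out.println("peak during resize MB = " + g.peakDuringResize / MB);
        System.out.println("unused MB = " + g.unusedBytes / MB);
    }
}
```

Under these assumptions, writing just 1MB past the 128MB mark forces a resize to 256MB, the last copy holds 128MB + 256MB = 384MB at once, and 127MB of the new array is left unused, which matches the figures in the description.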



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)