Posted to issues@tez.apache.org by "Jonathan Eagles (JIRA)" <ji...@apache.org> on 2018/05/15 21:12:00 UTC

[jira] [Updated] (TEZ-3936) Reduce TezEvent messaging overhead

     [ https://issues.apache.org/jira/browse/TEZ-3936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Eagles updated TEZ-3936:
---------------------------------
    Attachment: TEZ-3936.001.patch

> Reduce TezEvent messaging overhead
> ----------------------------------
>
>                 Key: TEZ-3936
>                 URL: https://issues.apache.org/jira/browse/TEZ-3936
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jonathan Eagles
>            Assignee: Jonathan Eagles
>            Priority: Major
>         Attachments: TEZ-3936.001.patch
>
>
> Revisiting TEZ-3145, I found that in addition to improving the way empty partitions are sent from Maps to the AM and from the AM to Reducers, message serialization can be improved to reduce network traffic.
> For example, consider a job with 42000 Maps and 7500 Reducers where 95% of the partition data produced is empty. The volume of Tez DME events sent from the AM to the Reducers is num(Maps) * num(Reducers) * size(Wrapped DME). With 95% empty partitions, the message size is 450 bytes, of which 260 bytes are needed for sending empty partitions and 190 bytes for messaging. Total messaging is 132 GB:
> 76 GB for empty partition data and 56 GB for non-empty partition messaging. This jira aims to reduce the non-empty partition messaging.
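
The totals quoted above can be reproduced with a short back-of-the-envelope calculation. The sketch below is only an illustration of the arithmetic: the constant and function names are made up for this example and are not Tez identifiers, and it assumes that "GB" in the description means 2^30 bytes, which is the reading under which the quoted figures come out as stated.

```python
# Illustrative arithmetic only; names are hypothetical, not Tez APIs.
NUM_MAPS = 42_000
NUM_REDUCERS = 7_500

# Per-event sizes quoted in the description (bytes).
BYTES_PER_EVENT = 450          # whole wrapped DME
BYTES_EMPTY_PARTITIONS = 260   # portion describing empty partitions
BYTES_MESSAGING = 190          # remaining messaging portion

# Assumption: "GB" in the description is 2**30 bytes.
BYTES_PER_GB = 2 ** 30


def total_gb(bytes_per_event: int) -> float:
    """Traffic from the AM to all Reducers: num(Maps) * num(Reducers) * size."""
    return NUM_MAPS * NUM_REDUCERS * bytes_per_event / BYTES_PER_GB


if __name__ == "__main__":
    print(f"events sent:       {NUM_MAPS * NUM_REDUCERS:,}")
    print(f"total:             {total_gb(BYTES_PER_EVENT):.1f} GB")
    print(f"empty partitions:  {total_gb(BYTES_EMPTY_PARTITIONS):.1f} GB")
    print(f"messaging:         {total_gb(BYTES_MESSAGING):.1f} GB")
```

Because the event count is the product of Maps and Reducers (315 million here), every byte removed from the per-event messaging portion saves roughly 0.3 GB of AM-to-Reducer traffic in this job.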



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)