Posted to issues@tez.apache.org by "Jonathan Eagles (JIRA)" <ji...@apache.org> on 2018/05/15 20:24:00 UTC

[jira] [Created] (TEZ-3936) Reduce TezEvent messaging overhead

Jonathan Eagles created TEZ-3936:
------------------------------------

             Summary: Reduce TezEvent messaging overhead
                 Key: TEZ-3936
                 URL: https://issues.apache.org/jira/browse/TEZ-3936
             Project: Apache Tez
          Issue Type: Bug
            Reporter: Jonathan Eagles
            Assignee: Jonathan Eagles


While revisiting TEZ-3145, I found that, in addition to improving the way empty partitions are sent from Maps to the AM and from the AM to Reducers, message serialization can be improved to reduce network traffic.

For example, consider a job with 42000 Maps and 7500 Reducers where 95% of the partition data produced is empty. The volume of Tez DME events sent from the AM to the Reducers is num(Maps) * num(Reducers) * size(Wrapped DME). With 95% empty partitions, the message size is 450 bytes, of which 260 bytes are needed for sending empty partitions and 190 bytes for messaging. Total messaging is 132 GB: 76 GB for empty partition data and 56 GB for non-empty partition messaging. This jira aims to reduce the non-empty partition messaging.
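
The figures above can be checked with a short back-of-the-envelope calculation. This is only a sketch of the arithmetic, not Tez code: the constant and function names are made up for illustration, and it assumes that "GB" in the description means 2^30 bytes, since that interpretation reproduces the quoted totals.

```python
# Rough check of the messaging volume quoted in the description.
# Assumption: 1 GB is taken as 2**30 bytes (this reproduces 132 / 76 / 56).
NUM_MAPS = 42_000
NUM_REDUCERS = 7_500
EMPTY_PARTITION_BYTES = 260   # per-message bytes for empty-partition data
MESSAGING_BYTES = 190         # per-message bytes for the rest of the message
GIB = 1024 ** 3


def total_gib(bytes_per_event, num_maps=NUM_MAPS, num_reducers=NUM_REDUCERS):
    """Volume of num(Maps) * num(Reducers) events of the given size, in GiB."""
    return num_maps * num_reducers * bytes_per_event / GIB


events = NUM_MAPS * NUM_REDUCERS
empty_gib = total_gib(EMPTY_PARTITION_BYTES)
messaging_gib = total_gib(MESSAGING_BYTES)
overall_gib = total_gib(EMPTY_PARTITION_BYTES + MESSAGING_BYTES)

print(f"events sent AM -> Reducers: {events}")
print(f"empty partition data:       {empty_gib:.1f} GiB")
print(f"other messaging:            {messaging_gib:.1f} GiB")
print(f"total:                      {overall_gib:.1f} GiB")
```

Because the event count is the product of Maps and Reducers (315 million here), every byte removed from the per-message size saves roughly 0.3 GiB of traffic for a job of this shape.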
