Posted to issues@tez.apache.org by "Ming Ma (JIRA)" <ji...@apache.org> on 2016/04/19 21:23:25 UTC

[jira] [Updated] (TEZ-3206) Have unordered partitioned KV output send partition stats via VertexManagerEvent

     [ https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ming Ma updated TEZ-3206:
-------------------------
    Attachment: TEZ-3206.patch

Thanks [~sseth]. Yes, it is because Tez uses fewer bits to encode the size.

Here is the draft patch to have {{UnorderedPartitionedKVWriter}} send partition stats, in terms of compressed output size, via VertexManagerEvent. Because the compressed size is only available after a spill, and the stats update can be invoked on the spill-finish callback threads, the patch makes the global stats thread safe.
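
To illustrate the thread-safety point, here is a minimal sketch of one way to accumulate per-partition compressed sizes from several spill-finish callback threads. It is not the code in TEZ-3206.patch: the class {{PartitionStatsSketch}} and its methods are hypothetical names used only for illustration, and it assumes each spill reports one compressed size per partition.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch, not taken from TEZ-3206.patch.
// Keeps a running compressed-size total per partition that can be
// updated safely from multiple spill-finish callback threads.
final class PartitionStatsSketch {
    private final AtomicLongArray compressedBytesPerPartition;

    PartitionStatsSketch(int numPartitions) {
        this.compressedBytesPerPartition = new AtomicLongArray(numPartitions);
    }

    // Assumed to be called once per finished spill, possibly concurrently.
    // spillCompressedSizes[p] is the compressed size of partition p in that spill.
    void onSpillFinished(long[] spillCompressedSizes) {
        if (spillCompressedSizes.length != compressedBytesPerPartition.length()) {
            throw new IllegalArgumentException("partition count mismatch");
        }
        for (int p = 0; p < spillCompressedSizes.length; p++) {
            // addAndGet is atomic per element, so no external lock is needed.
            compressedBytesPerPartition.addAndGet(p, spillCompressedSizes[p]);
        }
    }

    // Copies the current totals, e.g. to fill in the partition stats
    // of the event payload once all spills have finished.
    long[] snapshot() {
        long[] totals = new long[compressedBytesPerPartition.length()];
        for (int p = 0; p < totals.length; p++) {
            totals[p] = compressedBytesPerPartition.get(p);
        }
        return totals;
    }
}
```

Each element is updated atomically, but {{snapshot()}} is only consistent across partitions if it is taken after all spill callbacks have completed.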

Note that in the current protocol for the sorted partitioned case, {{VertexManagerEventPayloadProto.Builder}}'s {{setOutputSize}} takes the uncompressed size, but {{setPartitionStats}} takes the compressed size. Based on how {{ShuffleVertexManager}} consumes partition stats, it doesn't matter whether the sizes are compressed or not.

Maybe we should use the uncompressed size for partition stats? If so, the patch will be simpler, and I can file a separate JIRA to have the sorted partitioned output switch to sending the uncompressed size.

> Have unordered partitioned KV output send partition stats via VertexManagerEvent 
> ---------------------------------------------------------------------------------
>
>                 Key: TEZ-3206
>                 URL: https://issues.apache.org/jira/browse/TEZ-3206
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Ming Ma
>         Attachments: TEZ-3206.patch
>
>
> As part of the auto-parallelism feature, ordered partitioned KV output's partition stats are sent to ShuffleVertexManager via VertexManagerEvent. But this isn't available for unordered partitioned output. Having {{UnorderedPartitionedKVWriter}} send partition stats will enable the auto-parallelism support for unordered KV or other custom data routing mechanisms that depend on partition size.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)