Posted to issues@tez.apache.org by "Zhiyuan Yang (JIRA)" <ji...@apache.org> on 2017/06/27 23:20:00 UTC

[jira] [Comment Edited] (TEZ-3769) Unordered: Fix wrong stats being sent out in the last event, when final merge is disabled

    [ https://issues.apache.org/jira/browse/TEZ-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065653#comment-16065653 ] 

Zhiyuan Yang edited comment on TEZ-3769 at 6/27/17 11:19 PM:
-------------------------------------------------------------

General discussion beyond this patch: 
1. About the counter ADDITIONAL_SPILLS_BYTES_WRITTEN: there is a discrepancy between its usage (final spill stats) and its documentation (bytes written due to unnecessary spills). If the final spill size is not useful, we can merge it into the normal counter. Alternatively, we can just fix the documentation/comments.
2. I think we should refactor this unordered writer at some point. Right now it is stuffed with too many things, and too many code paths are multiplexed. It will become harder and harder to modify or review.


was (Author: aplusplus):
General discussion beyond this patch: 
1. About the counter ADDITIONAL_SPILLS_BYTES_WRITTEN: there is a discrepancy between its usage (final spill stats) and its documentation (bytes written due to unnecessary spills).
2. I think we should refactor this unordered writer at some point. Right now it is stuffed with too many things, and too many code paths are multiplexed. It will become harder and harder to modify or review.

> Unordered: Fix wrong stats being sent out in the last event, when final merge is disabled
> -----------------------------------------------------------------------------------------
>
>                 Key: TEZ-3769
>                 URL: https://issues.apache.org/jira/browse/TEZ-3769
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>         Attachments: TEZ-3769.1.patch, TEZ-3769.2.patch
>
>
> When final merge is disabled (without pipelining), wrong stats were sent out in the last event.
> They were based on {{numRecordsPerPartition}}, which contains the overall partition data. Ideally, they should be based on the spill result and its buffers.
> Also, {{finalSpill}} was unnecessarily sending events when no data was present (i.e., when currentBuffer didn't have any data). This can be optimized to reduce the number of events being sent across.
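
The behavior described in the issue can be sketched roughly as follows. This is only an illustration and not the code from the attached patches: the class and method names (FinalSpillStatsSketch, hasData, finalEventStats) and the plain int array used for per-partition record counts are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical sketch: none of these names are taken from the Tez code base.
final class FinalSpillStatsSketch {

    // True when the buffer about to be spilled holds at least one record.
    static boolean hasData(int[] recordsInBuffer) {
        return Arrays.stream(recordsInBuffer).anyMatch(n -> n > 0);
    }

    // Per-partition record counts to report in the last event.
    // Only the final buffer is consulted, not the cumulative per-partition
    // totals, and an empty Optional means no event needs to be sent.
    static Optional<int[]> finalEventStats(int[] recordsInFinalBuffer) {
        if (!hasData(recordsInFinalBuffer)) {
            return Optional.empty();
        }
        return Optional.of(recordsInFinalBuffer.clone());
    }
}
```

With three partitions and a final buffer holding counts {2, 0, 1}, the sketch reports exactly those counts regardless of what earlier spills wrote. With an all-zero buffer it returns an empty result, so the caller would skip the event.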



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)