Posted to issues@tez.apache.org by "Kuhu Shukla (JIRA)" <ji...@apache.org> on 2017/06/26 19:20:00 UTC

[jira] [Comment Edited] (TEZ-3605) Detect and prune empty partitions for the Ordered case

    [ https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063606#comment-16063606 ] 

Kuhu Shukla edited comment on TEZ-3605 at 6/26/17 7:19 PM:
-----------------------------------------------------------

bq. and invokes a merger on an empty list (not sure how this is handled)
An empty list is handled fine:
{quote}
if (segments.size() == 0) {
  LOG.info("Nothing to merge. Returning an empty iterator");
  return new EmptyIterator();
}
{quote}
The trouble arises when an individual segment has size zero: the merger is then handed a stream with no bytes to read.
The latest patch fixes this issue by creating segments only for non-empty partitions.
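
To illustrate the idea of building segments only for non-empty partitions, here is a minimal sketch. It is not the code from the patch: the class {{EmptyPartitionPruning}}, the {{PartitionSegment}} type and the {{nonEmptySegments}} helper are hypothetical names made up for this example, and per-partition sizes are assumed to be available as a plain {{long[]}}.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyPartitionPruning {

    /** Hypothetical stand-in for a per-partition segment; not the Tez Segment class. */
    static final class PartitionSegment {
        final int partition;
        final long lengthInBytes;

        PartitionSegment(int partition, long lengthInBytes) {
            this.partition = partition;
            this.lengthInBytes = lengthInBytes;
        }
    }

    /**
     * Builds segments only for partitions that hold data, so a merger
     * never receives a segment backed by a zero-byte stream.
     */
    static List<PartitionSegment> nonEmptySegments(long[] partitionLengths) {
        List<PartitionSegment> segments = new ArrayList<>();
        for (int p = 0; p < partitionLengths.length; p++) {
            if (partitionLengths[p] > 0) {
                segments.add(new PartitionSegment(p, partitionLengths[p]));
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        // Partitions 0 and 2 are empty and are skipped.
        long[] lengths = {0L, 128L, 0L, 64L};
        List<PartitionSegment> segments = nonEmptySegments(lengths);
        System.out.println(segments.size() + " segments to merge"); // prints "2 segments to merge"
    }
}
```

If every partition is empty, the resulting list is empty as well, which is the case the {{segments.size() == 0}} check quoted above already handles.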


was (Author: kshukla):
bq. and invokes a merger on an empty list (not sure how this is handled)
An empty list is handled fine:
{quote}
if (segments.size() == 0) {
  LOG.info("Nothing to merge. Returning an empty iterator");
  return new EmptyIterator();
}
{quote}
The trouble arises when an individual segment has size zero: the merger is then handed a stream with no bytes to read.

> Detect and prune empty partitions for the Ordered case
> ------------------------------------------------------
>
>                 Key: TEZ-3605
>                 URL: https://issues.apache.org/jira/browse/TEZ-3605
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-3605.001.patch, TEZ-3605.002.patch, TEZ-3605.003.patch, TEZ-3605.004.patch, TEZ-3605.005.patch, TEZ-3605.006.patch, TEZ-3605.007.patch, TEZ-3605.008.patch, TEZ-3605.009.patch, TEZ-3605.010.patch, TEZ-3605.011.patch
>
>
> Analogous to the Unordered case, we should not have empty partition entries/segments in the Ordered/DefaultSorter case. This will avoid writing unnecessary data.
> Additionally, with the tez_shuffle feature (TEZ-3334), in a heavily auto-reduced job, this change would avoid fetching empty partitions only to throw them away.


