Posted to issues@tez.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2015/09/08 01:45:45 UTC

[jira] [Commented] (TEZ-2643) Minimize number of empty spills in Pipelined Sorter

    [ https://issues.apache.org/jira/browse/TEZ-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734071#comment-14734071 ] 

Rajesh Balamohan commented on TEZ-2643:
---------------------------------------


Sorry about the delay [~saikatr]. The patch optimizes the case where the SpanHeap size is 0 and avoids creating empty files. Minor comments:
- Rename ignoreSpillIfNeeded to ignoreEmptySpills?
- Should sendPipelinedShuffleEvents be moved from sort to spill? If so, spill does not need to return any flag.
{noformat}
      if (pipelinedShuffle) {
        sendPipelinedShuffleEvents();
      }
{noformat}
- In spill(), should the creation of spillRec and filename, and the addition to spillFilePaths, be moved after the ignoreSpillIfNeeded check?
{noformat}
      // create spill file
      final long size = capacity
          + (partitions * APPROX_HEADER_LENGTH);
      final TezSpillRecord spillRec = new TezSpillRecord(partitions);
      final Path filename =
          mapOutputFile.getSpillFileForWrite(numSpills, size);
      spillFilePaths.put(numSpills, filename);
{noformat}
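
The two reordering suggestions above could look roughly like the following. This is a minimal, self-contained sketch and not the actual PipelinedSorter code: the class SpillOrderingSketch, the spanHeapSize parameter, the string file names, and the eventsSent list are all hypothetical stand-ins. It only illustrates the ordering (empty check first, then spill file bookkeeping, then the pipelined shuffle event sent from spill() itself, so no flag has to be returned). Whether a skipped empty spill still needs an event is left open here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the sorter; names mirror the review comment only loosely.
class SpillOrderingSketch {
    private final boolean pipelinedShuffle;
    private final Map<Integer, String> spillFilePaths = new LinkedHashMap<>();
    private final List<Integer> eventsSent = new ArrayList<>();
    private int numSpills = 0;

    SpillOrderingSketch(boolean pipelinedShuffle) {
        this.pipelinedShuffle = pipelinedShuffle;
    }

    // spanHeapSize is an assumed stand-in for the size of the merged span heap.
    void spill(int spanHeapSize) {
        // 1. Check for an empty spill before touching any file state.
        if (spanHeapSize == 0) {
            return;
        }

        // 2. Only now create the spill file entry (placeholder for
        //    mapOutputFile.getSpillFileForWrite(numSpills, size)).
        String filename = "spill_" + numSpills + ".out";
        spillFilePaths.put(numSpills, filename);

        // ... writing the sorted records would happen here ...

        // 3. Send the event from spill() so that it returns nothing to sort().
        if (pipelinedShuffle) {
            sendPipelinedShuffleEvents();
        }
        numSpills++;
    }

    private void sendPipelinedShuffleEvents() {
        // Placeholder: just record which spill the event was sent for.
        eventsSent.add(numSpills);
    }

    int spillFileCount() {
        return spillFilePaths.size();
    }

    int eventCount() {
        return eventsSent.size();
    }

    public static void main(String[] args) {
        SpillOrderingSketch sorter = new SpillOrderingSketch(true);
        sorter.spill(0); // empty: no file entry, no event
        sorter.spill(3); // non-empty: one file entry, one event
        System.out.println("files=" + sorter.spillFileCount()
            + " events=" + sorter.eventCount());
    }
}
```

With this ordering, sort() would simply call spill() and would not need to inspect a return value before deciding whether to send events.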

> Minimize number of empty spills in Pipelined Sorter
> ---------------------------------------------------
>
>                 Key: TEZ-2643
>                 URL: https://issues.apache.org/jira/browse/TEZ-2643
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Saikat
>            Assignee: Saikat
>         Attachments: TEZ-2643.patch
>
>



