Posted to issues@tez.apache.org by "Rajesh Balamohan (JIRA)" <ji...@apache.org> on 2015/09/08 01:45:45 UTC
[jira] [Commented] (TEZ-2643) Minimize number of empty spills in Pipelined Sorter
[ https://issues.apache.org/jira/browse/TEZ-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734071#comment-14734071 ]
Rajesh Balamohan commented on TEZ-2643:
---------------------------------------
Sorry about the delay [~saikatr]. The patch optimizes for the case where the SpanHeap size is 0 and avoids creating empty spill files. Minor comments:
- Rename ignoreSpillIfNeeded to ignoreEmptySpills?
- Should sendPipelinedShuffleEvents be moved from sort to spill? If so, spill does not need to return any flag.
{noformat}
if (pipelinedShuffle) {
  sendPipelinedShuffleEvents();
}
{noformat}
- In spill(), should the creation of spillRec and filename, and the addition to spillFilePaths, be moved after the ignoreSpillIfNeeded check?
{noformat}
// create spill file
final long size = capacity
    + (partitions * APPROX_HEADER_LENGTH);
final TezSpillRecord spillRec = new TezSpillRecord(partitions);
final Path filename =
    mapOutputFile.getSpillFileForWrite(numSpills, size);
spillFilePaths.put(numSpills, filename);
{noformat}
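To make the last two suggestions concrete, here is a rough sketch of the ordering they describe. It is not the PipelinedSorter code and not the patch: the class EmptySpillSketch, the spanCount parameter, the placeholder file names, and the accessor methods are all hypothetical stand-ins. Only the names spill, numSpills, spillFilePaths, pipelinedShuffle and sendPipelinedShuffleEvents come from the snippets above. The sketch assumes an empty spill needs neither a file nor an event; whether the real sorter must still send an event in that case is not settled here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified stand-in for the sorter's spill path (not Tez code). */
class EmptySpillSketch {
    private final boolean pipelinedShuffle;
    // Spill index -> placeholder file name; stands in for spillFilePaths.
    private final Map<Integer, String> spillFilePaths = new LinkedHashMap<>();
    // Spill indices for which an event was "sent"; stands in for real shuffle events.
    private final List<Integer> eventsSent = new ArrayList<>();
    private int numSpills = 0;

    EmptySpillSketch(boolean pipelinedShuffle) {
        this.pipelinedShuffle = pipelinedShuffle;
    }

    /** Spills spanCount sorted spans; returns nothing because the event is sent here. */
    void spill(int spanCount) {
        // Empty check first: no file name is created or registered for an empty spill.
        if (spanCount == 0) {
            return;
        }
        // Only a non-empty spill reaches file creation and registration.
        String filename = "spill_" + numSpills + ".out"; // placeholder name
        spillFilePaths.put(numSpills, filename);
        // (Writing the sorted spans to the file would happen here.)

        // Event is sent from spill() itself, so the caller needs no flag back.
        if (pipelinedShuffle) {
            sendPipelinedShuffleEvents(numSpills);
        }
        numSpills++;
    }

    private void sendPipelinedShuffleEvents(int spillIndex) {
        eventsSent.add(spillIndex);
    }

    int registeredFiles() {
        return spillFilePaths.size();
    }

    int eventCount() {
        return eventsSent.size();
    }

    public static void main(String[] args) {
        EmptySpillSketch sorter = new EmptySpillSketch(true);
        sorter.spill(0); // empty: skipped entirely
        sorter.spill(3);
        sorter.spill(0); // empty: skipped entirely
        sorter.spill(2);
        // prints "files=2 events=2"
        System.out.println("files=" + sorter.registeredFiles()
            + " events=" + sorter.eventCount());
    }
}
```

With this ordering, sort() would only call spill() and would not need to inspect a return value or touch spillFilePaths for spills that produced no data.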
> Minimize number of empty spills in Pipelined Sorter
> ---------------------------------------------------
>
> Key: TEZ-2643
> URL: https://issues.apache.org/jira/browse/TEZ-2643
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Saikat
> Assignee: Saikat
> Attachments: TEZ-2643.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)