Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/04/22 01:21:59 UTC

[jira] [Commented] (SPARK-7041) Avoid writing empty files in ExternalSorter

    [ https://issues.apache.org/jira/browse/SPARK-7041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14506000#comment-14506000 ] 

Apache Spark commented on SPARK-7041:
-------------------------------------

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/5622

> Avoid writing empty files in ExternalSorter
> -------------------------------------------
>
>                 Key: SPARK-7041
>                 URL: https://issues.apache.org/jira/browse/SPARK-7041
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>
> In ExternalSorter, we may end up opening disk writers for empty partitions. This occurs because we manually call {{open()}} after creating the writer, which creates the serialization and compression output streams; those streams may write headers to the underlying file, so partitions that contain no records still produce non-zero-length files. This eager {{open()}} is unnecessary, since the disk object writer automatically opens itself when the first write is performed. Removing the eager {{open()}} call and updating the consumers to cope with files that may not exist yields a large performance benefit for certain sparse workloads when using sort-based shuffle.
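> A minimal sketch of the lazy-open pattern described above (hypothetical names, not the actual disk object writer API; plain JDK object serialization and GZIP stand in for Spark's serializer and compression codec):
> {code:scala}
> import java.io.{BufferedOutputStream, File, FileOutputStream, ObjectOutputStream}
> import java.util.zip.GZIPOutputStream
>
> class LazyPartitionWriter(file: File) {
>   private var out: ObjectOutputStream = _
>
>   // Opening eagerly writes the GZIP and ObjectOutputStream headers right away,
>   // so even a partition with zero records would produce a non-empty file.
>   private def open(): Unit = {
>     out = new ObjectOutputStream(
>       new GZIPOutputStream(new BufferedOutputStream(new FileOutputStream(file))))
>   }
>
>   // Open lazily on the first write instead: empty partitions never touch disk.
>   def write(record: AnyRef): Unit = {
>     if (out == null) open()
>     out.writeObject(record)
>   }
>
>   def close(): Unit = if (out != null) out.close()
>
>   // Consumers must now tolerate the file not existing for empty partitions.
>   def producedOutput: Boolean = file.exists()
> }
> {code}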



