Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2015/04/22 01:07:58 UTC

[jira] [Created] (SPARK-7041) Avoid writing empty files in ExternalSorter

Josh Rosen created SPARK-7041:
---------------------------------

             Summary: Avoid writing empty files in ExternalSorter
                 Key: SPARK-7041
                 URL: https://issues.apache.org/jira/browse/SPARK-7041
             Project: Spark
          Issue Type: Improvement
          Components: Shuffle
            Reporter: Josh Rosen
            Assignee: Josh Rosen


In ExternalSorter, we may end up opening disk writer files for empty partitions. This occurs because we manually call {{open()}} after creating the writer, which causes the serialization and compression output streams to be created. These streams may write headers to the underlying output stream, so non-zero-length files are created for partitions that contain no records. This is unnecessary, though, since the disk object writer automatically opens itself when the first write is performed. Removing this eager {{open()}} call and rewriting the consumers to cope with the non-existence of empty files results in a large performance benefit for certain sparse workloads when using sort-based shuffle.
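
To illustrate the idea outside of Spark, here is a minimal, language-neutral sketch in Python. It is not the ExternalSorter or disk object writer code: LazyGzipWriter, write_partition, read_partition, and demo are hypothetical names invented for this example, and gzip stands in for whatever compression codec is configured. It shows that eagerly opening a compressing writer leaves a non-empty file behind even when no records are written, while a writer that opens itself on the first write leaves no file at all, so the reader has to tolerate a missing file.

```python
import gzip
import os
import tempfile


class LazyGzipWriter:
    """Hypothetical writer that only touches the disk once it is opened."""

    def __init__(self, path):
        self.path = path
        self._stream = None  # no file exists until open() or write() runs

    def open(self):
        # Creating the compression stream creates the file; the gzip
        # header and trailer make it non-empty even with zero records.
        if self._stream is None:
            self._stream = gzip.open(self.path, "wb")
        return self

    def write(self, record):
        self.open()  # lazy open: happens automatically on the first write
        self._stream.write(record)

    def close(self):
        if self._stream is not None:
            self._stream.close()
            self._stream = None


def write_partition(path, records, eager_open):
    """Write one partition; eager_open mimics the manual open() call."""
    writer = LazyGzipWriter(path)
    if eager_open:
        writer.open()
    for record in records:
        writer.write(record)
    writer.close()


def read_partition(path):
    """Consumer that treats a missing file as an empty partition."""
    if not os.path.exists(path):
        return b""
    with gzip.open(path, "rb") as stream:
        return stream.read()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        eager = os.path.join(tmp, "eager.gz")
        lazy = os.path.join(tmp, "lazy.gz")
        full = os.path.join(tmp, "full.gz")
        write_partition(eager, [], eager_open=True)
        write_partition(lazy, [], eager_open=False)
        write_partition(full, [b"a", b"b"], eager_open=False)
        return {
            "eager_size": os.path.getsize(eager),
            "lazy_exists": os.path.exists(lazy),
            "lazy_contents": read_partition(lazy),
            "full_contents": read_partition(full),
        }
```

Calling demo() returns a dictionary in which the eagerly opened empty partition has a non-zero file size, the lazily opened empty partition has no file on disk, and both the missing file and the populated file read back correctly. The trade-off is the one described above: the writer side gets simpler and cheaper, but every consumer must handle the absent-file case.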



