Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2015/04/22 01:07:58 UTC
[jira] [Created] (SPARK-7041) Avoid writing empty files in ExternalSorter
Josh Rosen created SPARK-7041:
---------------------------------
Summary: Avoid writing empty files in ExternalSorter
Key: SPARK-7041
URL: https://issues.apache.org/jira/browse/SPARK-7041
Project: Spark
Issue Type: Improvement
Components: Shuffle
Reporter: Josh Rosen
Assignee: Josh Rosen
In ExternalSorter, we may end up opening disk writers, and therefore creating files, for empty partitions. This occurs because we manually call {{open()}} after creating each writer, which causes the serialization and compression output streams to be created. These streams may write headers to the underlying output stream, resulting in non-zero-length files for partitions that contain no records. This is unnecessary, since the disk object writer will automatically open itself when the first write is performed. Removing this eager {{open()}} call and rewriting the consumers to cope with the non-existence of empty files results in a large performance benefit for certain sparse workloads when using sort-based shuffle.
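
To illustrate the difference between an eager and a lazy open, here is a minimal Python sketch. It is an analogy only, not Spark's code: {{LazyGzipWriter}}, {{write_partitions}}, and {{read_partition}} are hypothetical names, and gzip stands in for whichever serialization and compression streams are configured. With an eager open, every partition gets a non-empty file even when it holds no records. With a lazy open, empty partitions leave no file behind, so the consumer has to treat a missing file as an empty partition.

```python
import gzip
import os
import tempfile


class LazyGzipWriter:
    """Hypothetical writer that opens its file only when first needed."""

    def __init__(self, path):
        self.path = path
        self._stream = None  # nothing exists on disk yet

    def open(self):
        # Opening creates the file. The gzip stream emits framing bytes
        # (header and trailer) even if no record is ever written.
        if self._stream is None:
            self._stream = gzip.open(self.path, "wb")
        return self

    def write(self, record):
        self.open()  # lazy open: a no-op if already open
        self._stream.write(record)

    def close(self):
        if self._stream is not None:
            self._stream.close()
            self._stream = None


def write_partitions(directory, partitions, eager):
    """Write one file per partition; return the paths that exist afterwards."""
    written = []
    for index, records in enumerate(partitions):
        path = os.path.join(directory, f"part-{index}.gz")
        writer = LazyGzipWriter(path)
        if eager:
            writer.open()  # the eager call the issue proposes removing
        for record in records:
            writer.write(record)
        writer.close()
        if os.path.exists(path):
            written.append(path)
    return written


def read_partition(path):
    """Consumer side: a missing file is treated as an empty partition."""
    if not os.path.exists(path):
        return b""
    with gzip.open(path, "rb") as stream:
        return stream.read()


if __name__ == "__main__":
    partitions = [[b"a", b"b"], [], [b"c"], []]  # two of four are empty
    for eager in (True, False):
        with tempfile.TemporaryDirectory() as directory:
            files = write_partitions(directory, partitions, eager)
            print(f"eager={eager}: {len(files)} files on disk")
```

In this sketch the saving grows with the fraction of empty partitions, which matches the sparse workloads described above: each skipped partition avoids creating a file, writing framing bytes to it, and closing it.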