You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Joseph Fourny (JIRA)" <ji...@apache.org> on 2016/06/16 03:01:05 UTC
[jira] [Comment Edited] (SPARK-15690) Fast single-node (single-process) in-memory shuffle

    [ https://issues.apache.org/jira/browse/SPARK-15690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332960#comment-15332960 ] 

Joseph Fourny edited comment on SPARK-15690 at 6/16/16 3:00 AM:
----------------------------------------------------------------

I am trying to develop single-node clusters on large servers (30+ CPU cores) with 2-3 TB or RAM. Our use cases involve small to medium size datasets, but with a huge amount of concurrent jobs (shared, multi-tenant environments). Efficiency and sub-second response times are the primary requirements. This shuffle between stages is the current bottleneck. Writing anything to disk is just a waste of time if all computations are done in the same JVM (or a small set of JVMs on the same machine). We tried using RAMFS to avoid disk I/O, but still a lot of CPU time is spent in compression and serialization. I would rather not hack my way out of this one. Is it wishful thinking to have this officially supported?


was (Author: josephfourny):
+1 on this. I am trying to develop single-node clusters on large servers (30+ CPU cores) with 2-3 TB or RAM. Our use cases involve small to medium size datasets, but with a huge amount of concurrent jobs (shared, multi-tenant environments). Efficiency and sub-second response times are the primary requirements. This shuffle between stages is the current bottleneck. Writing anything to disk is just a waste of time if all computations are done in the same JVM (or a small set of JVMs on the same machine). We tried using RAMFS to avoid disk I/O, but still a lot of CPU time is spent in compression and serialization. I would rather not hack my way out of this one. Is it wishful thinking to have this officially supported?

> Fast single-node (single-process) in-memory shuffle
> ---------------------------------------------------
>
>                 Key: SPARK-15690
>                 URL: https://issues.apache.org/jira/browse/SPARK-15690
>             Project: Spark
>          Issue Type: New Feature
>          Components: Shuffle, SQL
>            Reporter: Reynold Xin
>
> Spark's current shuffle implementation sorts all intermediate data by their partition id, and then write the data to disk. This is not a big bottleneck because the network throughput on commodity clusters tend to be low. However, an increasing number of Spark users are using the system to process data on a single-node. When in a single node operating against intermediate data that fits in memory, the existing shuffle code path can become a big bottleneck.
> The goal of this ticket is to change Spark so it can use in-memory radix sort to do data shuffling on a single node, and still gracefully fallback to disk if the data size does not fit in memory. Given the number of partitions is usually small (say less than 256), it'd require only a single pass do to the radix sort with pretty decent CPU efficiency.
> Note that there have been many in-memory shuffle attempts in the past. This ticket has a smaller scope (single-process), and aims to actually productionize this code.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org