You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/01/18 14:53:39 UTC

[jira] [Resolved] (SPARK-1529) Support DFS based shuffle in addition to Netty shuffle

     [ https://issues.apache.org/jira/browse/SPARK-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-1529.
------------------------------
    Resolution: Won't Fix

> Support DFS based shuffle in addition to Netty shuffle
> ------------------------------------------------------
>
>                 Key: SPARK-1529
>                 URL: https://issues.apache.org/jira/browse/SPARK-1529
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Patrick Wendell
>            Assignee: Kannan Rajah
>         Attachments: Spark Shuffle using HDFS.pdf
>
>
> In some environments, like with MapR, local volumes are accessed through the Hadoop filesystem interface. Shuffle is implemented by writing intermediate data to local disk and serving it to remote node using Netty as a transport mechanism. We want to provide an HDFS based shuffle such that data can be written to HDFS (instead of local disk) and served using HDFS API on the remote nodes. This could involve exposing a file system abstraction to Spark shuffle and have 2 modes of running it. In default mode, it will write to local disk and in the DFS mode, it will write to HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org