Posted to issues@spark.apache.org by "Zane Hu (JIRA)" <ji...@apache.org> on 2014/12/19 05:10:13 UTC

[jira] [Created] (SPARK-4895) Support a shared RDD store among different Spark contexts

Zane Hu created SPARK-4895:
------------------------------

             Summary: Support a shared RDD store among different Spark contexts
                 Key: SPARK-4895
                 URL: https://issues.apache.org/jira/browse/SPARK-4895
             Project: Spark
          Issue Type: New Feature
            Reporter: Zane Hu


It seems a valid requirement to allow jobs from different Spark contexts to share RDDs. Allowing RDDs to be shared only within a single SparkContext, as in Ooyala (SPARK-818), would be limiting. A more generic way for jobs from different Spark contexts to collaborate is to support a shared RDD store, managed by an RDD store master and workers running in separate processes from the SparkContext and executor JVMs. This shared RDD store does not perform any RDD transformations; it accepts requests from jobs of different Spark contexts to read and write shared RDDs in memory or on disk across distributed machines, and it manages the life cycle of these RDDs.
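
To make the proposal concrete, here is a minimal sketch of the store's read/write/lifecycle interface, modeled in-process with plain Python dictionaries. All class and method names here are hypothetical illustrations, not an actual API; the proposed store would run its master and workers as separate processes and hold serialized RDD partitions in memory or on disk.

```python
class SharedRDDStore:
    """Hypothetical sketch: holds named RDDs as lists of partition payloads
    and manages their life cycle with simple reference counting."""

    def __init__(self):
        self._rdds = {}       # name -> list of partition payloads
        self._refcounts = {}  # name -> number of contexts holding the RDD

    def write(self, name, partitions):
        # A job from any Spark context publishes an RDD under a name.
        self._rdds[name] = list(partitions)
        self._refcounts[name] = self._refcounts.get(name, 0) + 1

    def read(self, name):
        # A job from a different Spark context reads the shared RDD;
        # no transformation happens here, only retrieval.
        return self._rdds[name]

    def release(self, name):
        # Life-cycle management: drop the RDD once no context needs it.
        self._refcounts[name] -= 1
        if self._refcounts[name] <= 0:
            del self._rdds[name]
            del self._refcounts[name]


# Two hypothetical contexts cooperating through the store:
store = SharedRDDStore()
store.write("clicks", [[1, 2], [3, 4]])  # context A publishes an RDD
parts = store.read("clicks")             # context B consumes it
store.release("clicks")                  # context A is done; RDD is dropped
```

The point of the sketch is only the separation of concerns: the store knows nothing about transformations or lineage, it just brokers named RDD data between otherwise isolated Spark contexts.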

Tachyon could be used for sharing data, but I think Tachyon is designed more as an in-memory distributed file system for arbitrary applications, not specifically for RDDs and Spark.

If people agree, I can draft a design document for further discussion.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org