Posted to user@spark.apache.org by Yijie Shen <he...@gmail.com> on 2015/03/08 11:29:36 UTC

A way to share RDD directly using Tachyon?

Hi,

I would like to share an RDD across several Spark applications,
i.e., create it in application A, publish its ID somewhere, and get the RDD back directly by that ID in application B.

I know I can use Tachyon simply as a filesystem, e.g. s.saveAsTextFile("tachyon://localhost:19998/Y").

But getting an RDD directly from Tachyon, instead of from a file, could avoid parsing the same file repeatedly in different apps, I think.
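For contrast, a minimal sketch of the text-file route described above, assuming Spark's Scala `SparkContext`/`RDD` API with spark-core on the classpath. `Reading` and `TextRoute` are hypothetical names, and sensor names are assumed to contain no commas. Every application that consumes the output has to split and convert each line again, which is the repeated parsing cost in question.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical record type used only for this illustration.
case class Reading(sensor: String, value: Double)

object TextRoute {
  // Application A: records are flattened into "sensor,value" text lines.
  def publish(rdd: RDD[Reading], path: String): Unit =
    rdd.map(r => s"${r.sensor},${r.value}").saveAsTextFile(path)

  // Application B (and every other consumer): each line must be split and
  // converted back into a Reading, so the parsing work is repeated per app.
  // Assumes sensor names never contain a comma.
  def load(sc: SparkContext, path: String): RDD[Reading] =
    sc.textFile(path).map { line =>
      val Array(sensor, value) = line.split(",", 2)
      Reading(sensor, value.toDouble)
    }
}
```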

What should I do to share RDDs and get better performance?


--
Best Regards!
Yijie Shen

Re: A way to share RDD directly using Tachyon?

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Did you try something like:

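// Application A: write the RDD's elements as serialized objects
myRDD.saveAsObjectFile("tachyon://localhost:19998/Y")
// Application B: read them back (MyObject must be Serializable)
val newRDD = sc.objectFile[MyObject]("tachyon://localhost:19998/Y")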
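A slightly fuller sketch of the same two-application pattern, assuming the Spark 1.x Scala API, spark-core on the classpath, and the Tachyon client jar available so that the `tachyon://` scheme resolves. `Point` and `SharedRdd` are hypothetical names. Note that this still goes through storage: application B gets a new RDD built by deserializing the saved objects, not the original RDD looked up by ID, but it skips text parsing.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical record type; case classes are Serializable, which
// saveAsObjectFile needs because it writes Java-serialized objects.
case class Point(id: Int, x: Double, y: Double)

object SharedRdd {
  // Location taken from the thread; any Hadoop-compatible URI should work.
  val DefaultPath = "tachyon://localhost:19998/Y"

  // Application A: persist already-parsed objects.
  def publish(rdd: RDD[Point], path: String = DefaultPath): Unit =
    rdd.saveAsObjectFile(path)

  // Application B: rebuild an RDD from the saved objects; no text parsing.
  def load(sc: SparkContext, path: String = DefaultPath): RDD[Point] =
    sc.objectFile[Point](path)

  // Submitted as two separate applications: first "publish", later "load".
  // The master URL is expected to come from spark-submit.
  def main(args: Array[String]): Unit = {
    val mode = args.headOption.getOrElse("load")
    val path = if (args.length > 1) args(1) else DefaultPath
    val sc = new SparkContext(new SparkConf().setAppName(s"shared-rdd-$mode"))
    try {
      if (mode == "publish")
        publish(sc.parallelize(Seq(Point(1, 0.5, 1.5), Point(2, 2.0, 3.0))), path)
      else
        println(load(sc, path).count())
    } finally sc.stop()
  }
}
```

Like other Hadoop-based output, `saveAsObjectFile` refuses to write to a path that already exists (with Spark's default output-spec validation), so each published dataset needs a fresh path.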


Thanks
Best Regards
