Posted to user@spark.apache.org by Boris Litvak <bo...@skf.com> on 2020/06/04 07:11:11 UTC

[Spark RDD] Persisting Spark RDDs across spark contexts/applications - options

I would like to cache Apache Spark RDDs and share them between Spark applications.

Alluxio (Tachyon), Redis & Ignite all offer such capabilities.

For instance, see Ignite's proposal:
[image: image003.png, not included in the plain-text archive]

Are there any comparison studies on performance/maintenance burden/installation experience of the above frameworks?

If you have had any experience using Spark with any of these technologies, please share.

Thanks, Boris


Re: [Spark RDD] Persisting Spark RDDs across spark contexts/applications - options

Posted by Bin Fan <fa...@gmail.com>.
Hi Boris,

This is actually why Alluxio (then called Tachyon) was originally created at
AMPLab.
Check out the documentation at
https://docs.alluxio.io/os/user/stable/en/compute/Spark.html on persisting
RDDs/DataFrames to Alluxio.
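
As a rough illustration of that pattern, here is a minimal PySpark sketch in which one application writes a DataFrame to an alluxio:// path and a second application reads it back. The hostname, the path, and the helper names (alluxio_uri, write_shared, read_shared, main) are hypothetical and not taken from this thread; 19998 is Alluxio's default master RPC port. It assumes pyspark is installed and the Alluxio client jar is on the Spark classpath, as the documentation above describes.

```python
# Sketch only: the host and path are placeholders, not values from this thread.
ALLUXIO_MASTER = "alluxio-master"  # hypothetical hostname
ALLUXIO_PORT = 19998               # Alluxio's default master RPC port


def alluxio_uri(path, host=ALLUXIO_MASTER, port=ALLUXIO_PORT):
    """Build an alluxio:// URI for a path inside the Alluxio namespace."""
    return f"alluxio://{host}:{port}/{path.lstrip('/')}"


def write_shared(spark, uri):
    """Application A: compute a DataFrame and persist it to Alluxio as Parquet."""
    df = spark.range(1000).withColumnRenamed("id", "value")
    df.write.mode("overwrite").parquet(uri)


def read_shared(spark, uri):
    """Application B: a separate SparkSession reads the same data back."""
    return spark.read.parquet(uri)


def main():
    # Imported here so the helpers above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("alluxio-share-sketch").getOrCreate()
    uri = alluxio_uri("shared/numbers.parquet")
    write_shared(spark, uri)                # would normally run in application A
    print(read_shared(spark, uri).count())  # would normally run in application B
    spark.stop()
```

In practice write_shared and read_shared would live in two different Spark applications; the data outlives either SparkContext because it is stored in Alluxio rather than in Spark's own cache.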

Some examples:
https://www.alluxio.io/resources/case-studies/making-the-impossible-possible-with-alluxio-accelerate-spark-jobs-from-hours-to-seconds/
https://www.alluxio.io/blog/tencent-case-study-delivering-customized-news-to-over-100-million-users-per-month-with-alluxio/
Happy to provide more info.

- Bin

On Thu, Jun 4, 2020 at 12:26 AM Boris Litvak <bo...@skf.com> wrote:

> I would like to cache Apache Spark RDDs and share them between Spark
> applications.
>
> Alluxio (Tachyon), Redis & Ignite all offer such capabilities.
>
> For instance, see Ignite's proposal:
>
> Are there any comparison studies on performance/maintenance
> burden/installation experience of the above frameworks?
>
> If you have had any experience using Spark with any of these
> technologies, please share.
>
> Thanks, Boris