
Posted to user@spark.apache.org by swetha <sw...@gmail.com> on 2015/07/22 19:56:46 UTC

How to keep RDDs in memory between two different batch jobs?

Hi,

We have a requirement to keep RDDs in memory between Spark batch jobs that
run every hour. The idea is to keep RDDs holding active user sessions in
memory between two jobs, so that when one job finishes and the next one runs
an hour later, the RDDs with active sessions are still available for joining
with the data in the current job. What do we need in order to keep the data
in memory between two batch jobs? Can we use Tachyon?

Thanks,
Swetha
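The carry-over described above can be sketched without Spark: each hourly run takes the previous run's active sessions, joins them by user ID with the current hour's events, and emits the sessions still active at the end of the hour for the next run. In Spark this would be a key-based join on pair RDDs (for example `cogroup` or `fullOuterJoin`); here plain Python dicts and lists stand in for the RDDs, and the 30-minute inactivity timeout, the record shapes, and the `carry_over_sessions` function are all hypothetical, not taken from the post.

```python
from datetime import datetime, timedelta

# Hypothetical inactivity timeout; the original post does not specify one.
SESSION_TIMEOUT = timedelta(minutes=30)


def carry_over_sessions(prev_sessions, events, batch_end):
    """Join the previous run's active sessions with this hour's events.

    prev_sessions: dict of user_id -> (session_start, last_seen) from the last run
    events: list of (user_id, timestamp) pairs for the current hour
    batch_end: end of the current hourly window
    Returns the sessions still active at batch_end, to hand to the next run.
    """
    sessions = dict(prev_sessions)  # copy so the caller's state is untouched
    for user_id, ts in sorted(events, key=lambda e: e[1]):
        current = sessions.get(user_id)
        if current is not None and ts - current[1] <= SESSION_TIMEOUT:
            sessions[user_id] = (current[0], ts)  # extend the existing session
        else:
            sessions[user_id] = (ts, ts)  # start a new (or replace an expired) session
    # Drop sessions that have been idle longer than the timeout at batch end.
    return {u: s for u, s in sessions.items() if batch_end - s[1] <= SESSION_TIMEOUT}
```

The returned dict is the state that has to survive the hour between runs; a session that started in the previous hour keeps its original start time.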



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-keep-RDDs-in-memory-between-two-different-batch-jobs-tp23957.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org

Re: How to keep RDDs in memory between two different batch jobs?

Posted by harirajaram <ha...@gmail.com>.

I was about to say what the previous post said, so +1 to it. From my
understanding (gut feeling) of your requirement, it is very easy to do this
with spark-job-server.
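A minimal sketch of why a job server helps, assuming its model of a persistent SparkContext shared by successive jobs: whatever one job caches in that context is still there when the next job is submitted, whereas a separate application per hour starts with an empty cache. The `LongLivedContext` class and `hourly_job` function are hypothetical stand-ins (a dict plays the cached RDD and a per-user event count plays the session state); they are not the spark-jobserver API.

```python
class LongLivedContext:
    """Hypothetical stand-in for a SparkContext that stays up between jobs.

    Not the spark-jobserver API: a plain dict plays the role of named,
    cached RDDs that survive from one submitted job to the next.
    """

    def __init__(self):
        self._cache = {}  # name -> cached dataset
        self.builds = 0   # how many times a dataset had to be built from scratch

    def get_or_create(self, name, build):
        """Return the cached dataset, building it only on first use."""
        if name not in self._cache:
            self.builds += 1
            self._cache[name] = build()
        return self._cache[name]

    def update(self, name, value):
        """Replace the cached dataset so the next job sees the new version."""
        self._cache[name] = value


def hourly_job(ctx, new_events):
    """One batch job: reuse cached session state, merge new events, re-cache."""
    # Copy rather than mutate, mirroring how RDDs are immutable.
    sessions = dict(ctx.get_or_create("active-sessions", dict))
    for user_id, count in new_events:
        sessions[user_id] = sessions.get(user_id, 0) + count
    ctx.update("active-sessions", sessions)
    return sessions
```

Running `hourly_job` twice against the same `LongLivedContext` accumulates state across both runs; running it against a fresh context each time loses the previous hour's data, which is the situation the original question describes.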




Re: How to keep RDDs in memory between two different batch jobs?

Posted by ericacm <er...@gmail.com>.

Actually, I should clarify: Tachyon is a way to keep your data in RAM, but
it's not exactly the same as keeping it cached in Spark, since each new job
still has to read the data back from Tachyon into an RDD. Spark Job Server
is a way to keep it cached in Spark.
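The distinction above can be sketched as follows, assuming each hourly job is an independent application: state crosses the gap by being written to a store that outlives the job and read back at the start of the next one, rather than staying cached inside Spark. Here a local temporary directory stands in for Tachyon, and `STORE_DIR`, `save_state`, `load_state`, and `run_batch` are hypothetical. In PySpark the corresponding steps would presumably be `rdd.saveAsPickleFile(...)` at the end of one job and `sc.pickleFile(...)` at the start of the next, pointed at a `tachyon://` path, though that wiring is an untested assumption here.

```python
import json
import os
import tempfile

# Hypothetical shared location standing in for Tachyon: it outlives any one job.
STORE_DIR = tempfile.mkdtemp(prefix="session-store-")


def state_path(batch_id):
    """One file per batch, so a job never overwrites the state it is reading."""
    return os.path.join(STORE_DIR, f"sessions-{batch_id}.json")


def save_state(batch_id, sessions):
    """End of job N: write active sessions to the shared store."""
    with open(state_path(batch_id), "w") as f:
        json.dump(sessions, f)


def load_state(batch_id):
    """Start of job N+1: read job N's sessions back (empty if there are none)."""
    path = state_path(batch_id)
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def run_batch(batch_id, new_events):
    """One independent job: load previous state, merge new events, save."""
    sessions = load_state(batch_id - 1)
    for user_id, count in new_events:
        sessions[user_id] = sessions.get(user_id, 0) + count
    save_state(batch_id, sessions)
    return sessions
```

Writing each batch under its own key keeps the previous hour's state intact until the next job has read it, at the cost of cleaning up old batches separately.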




Re: How to keep RDDs in memory between two different batch jobs?

Posted by ericacm <er...@gmail.com>.

Tachyon is one way. Also check out the Spark Job Server
<https://github.com/spark-jobserver/spark-jobserver>.




Re: How to keep RDDs in memory between two different batch jobs?

Posted by Haoyuan Li <ha...@gmail.com>.

Yes. Tachyon can handle this well: http://tachyon-project.org/

Best,

Haoyuan

-- 
Haoyuan Li
CEO, Tachyon Nexus <http://www.tachyonnexus.com/>