Posted to user@spark.apache.org by "neelesh.sa" <sa...@gmail.com> on 2017/05/17 07:01:38 UTC

Re: checkpointing without streaming?

Is it possible to checkpoint an RDD in one run of my application and use the
saved RDD in the next run of my application?

For example, with the following code:
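val x = List(1, 2, 3, 4)
val y = sc.parallelize(x, 2).map(c => c * 2)
y.checkpoint() // marks y for checkpointing; sc.setCheckpointDir(...) must have been called first
y.count()      // the action that actually triggers the checkpoint write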

Is it possible to read the checkpointed RDD in another application?
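
For reference, here is a fuller sketch of the write side of the snippet above.
It is only an illustration: the object name CheckpointWrite, the method name
writeCheckpoint, and the checkpointDir parameter are made up for this example,
and the SparkContext is assumed to be created elsewhere.

```scala
import org.apache.spark.SparkContext

object CheckpointWrite {
  // Checkpoints the doubled values and returns the directory Spark wrote to.
  // checkpointDir is a hypothetical location chosen by the caller.
  def writeCheckpoint(sc: SparkContext, checkpointDir: String): Option[String] = {
    sc.setCheckpointDir(checkpointDir) // must be set before calling checkpoint()
    val y = sc.parallelize(List(1, 2, 3, 4), 2).map(c => c * 2)
    y.checkpoint()      // only marks the RDD; nothing is written yet
    y.count()           // first action materializes the checkpoint
    y.getCheckpointFile // path of this RDD's checkpoint, if one was written
  }
}
```

The value returned by getCheckpointFile is the path a later run would need in
order to find this particular RDD's checkpoint data.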





--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/checkpointing-without-streaming-tp4541p28691.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: checkpointing without streaming?

Posted by Tathagata Das <ta...@gmail.com>.
You can use *SparkContext.checkpointFile(<path to the dir containing RDD
checkpoint>)*. Note, however, that the checkpoint file contains Java
serialized data. If your data types change between writing and reading the
checkpoint file for any reason (a Spark version change, recompiled code,
etc.), you may not be able to read the checkpoint. So use it carefully :)
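
A rough sketch of how that call might be used from a second application. One
caveat that is not stated above: in the Spark sources I am aware of,
checkpointFile is declared protected[spark], so it is not callable from
ordinary user code. The workaround assumed here is a small helper placed in
the org.apache.spark package. The names CheckpointReader and read are
hypothetical, and this relies on a non-public API that may change between
Spark versions.

```scala
// Placed in org.apache.spark only to reach the protected[spark] method.
package org.apache.spark

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD

object CheckpointReader {
  // path is the directory of one checkpointed RDD, for example the value
  // that rdd.getCheckpointFile returned in the run that wrote it.
  def read[T: ClassTag](sc: SparkContext, path: String): RDD[T] =
    sc.checkpointFile[T](path)
}
```

In the reading application this would be called along the lines of
CheckpointReader.read[Int](sc, savedPath), where savedPath is whatever path
the first run recorded. The element type must match what was written, which
is where the Java serialization caveat applies.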




On Thu, May 18, 2017 at 12:18 AM, Neelesh Sambhajiche <
sambhajicheneelesh@gmail.com> wrote:

> That is exactly what we are currently doing: storing it in a CSV file.
> However, because checkpointing also writes the data to disk permanently,
> combining checkpointing with saving the RDD to a text file stores the data
> on disk twice. That is why I was looking for a way to read the checkpointed
> data from a different program.

Re: checkpointing without streaming?

Posted by Neelesh Sambhajiche <sa...@gmail.com>.
That is exactly what we are currently doing: storing it in a CSV file.
However, because checkpointing also writes the data to disk permanently,
combining checkpointing with saving the RDD to a text file stores the data
on disk twice. That is why I was looking for a way to read the checkpointed
data from a different program.

On Wed, May 17, 2017 at 12:59 PM, Tathagata Das <tathagata.das1565@gmail.com> wrote:

> Why not just save the RDD to a proper file? A text file, a sequence file:
> there are many options. Then it's standard to read it back in a different
> program.


-- 
Regards,
Neelesh Sambhajiche
Mobile: 8058437181

Birla Institute of Technology & Science, Pilani
Pilani Campus, Rajasthan 333 031, INDIA

Re: checkpointing without streaming?

Posted by Tathagata Das <ta...@gmail.com>.
Why not just save the RDD to a proper file? A text file, a sequence file:
there are many options. Then it's standard to read it back in a different
program.
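
As a minimal sketch of that approach, assuming plain text output with one
value per line: the object name SaveAndReload, the method names, and the
outputDir parameter are illustrative only, and the SparkContext is assumed to
be created by each application.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object SaveAndReload {
  // First application: write the doubled values as text, one per line.
  // outputDir is a hypothetical location; by default Spark refuses to
  // write into a directory that already exists.
  def save(sc: SparkContext, outputDir: String): Unit = {
    val y = sc.parallelize(List(1, 2, 3, 4), 2).map(c => c * 2)
    y.saveAsTextFile(outputDir)
  }

  // Second application: rebuild an RDD[Int] by parsing each line.
  def load(sc: SparkContext, outputDir: String): RDD[Int] =
    sc.textFile(outputDir).map(_.trim.toInt)
}
```

Text output does not depend on Java serialization, so it can still be read
after a Spark upgrade or a recompile. saveAsObjectFile together with
sc.objectFile is a more compact alternative, but it stores Java-serialized
objects and so is sensitive to changes in the data types.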
