Posted to user@spark.apache.org by Roch Denis <rd...@exostatic.com> on 2014/07/18 17:39:44 UTC

Python: saving/reloading RDD

Hello,

Just to make sure I've read the docs and the forums correctly: it's my
understanding that currently, in Python with Spark 1.0.1, there is no way to
save an RDD to disk in a form that I can simply reload. The Hadoop RDDs are
not yet available in Python.

Is that correct? I just want to make sure that's the case before I write a
workaround.

Thanks!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Python-saving-reloading-RDD-tp10172.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Python: saving/reloading RDD

Posted by Shannon Quinn <sq...@gatech.edu>.

+1, had to learn this the hard way when some of my objects were written 
as pointers, rather than translated correctly to strings :)

On 7/18/14, 11:52 AM, Xiangrui Meng wrote:
> You can save RDDs to text files using RDD.saveAsTextFile and load them back using sc.textFile. If the record type is not primitive, make sure the record-to-string conversion is implemented correctly and that you have a parser to load the records back. -Xiangrui
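
To illustrate the pitfall described above, here is a minimal sketch in plain Python (no Spark required). The Point class and the point_to_line / line_to_point helpers are hypothetical names invented for this example; the assumption is that records end up on disk in their string form, so a plain object without its own conversion is written as a class name plus a memory address.

```python
import ast


class Point(object):
    """Hypothetical record type, used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


def point_to_line(p):
    # Explicit conversion: write the fields as a tuple literal.
    return repr((p.x, p.y))


def line_to_point(line):
    # Inverse conversion: parse the tuple literal and rebuild the object.
    x, y = ast.literal_eval(line)
    return Point(x, y)


p = Point(1, 2.5)

# The default string form of a plain object embeds its class name and a
# memory address, so nothing useful can be parsed back from it.
default_line = str(p)

# The explicit conversion produces a line that can be parsed back.
line = point_to_line(p)
restored = line_to_point(line)
```

With an RDD of such objects, the idea would be to map point_to_line over the records before saving and line_to_point after loading.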

Re: Python: saving/reloading RDD

Posted by Roch Denis <rd...@exostatic.com>.

Yeah, but I would still have to do a map pass with ast.literal_eval() on
each line, correct?




Re: Python: saving/reloading RDD

Posted by Xiangrui Meng <me...@gmail.com>.

You can save RDDs to text files using RDD.saveAsTextFile and load them back using sc.textFile. If the record type is not primitive, make sure the record-to-string conversion is implemented correctly and that you have a parser to load the records back. -Xiangrui
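
A minimal sketch of that round trip, assuming the records are built only from Python literals (numbers, strings, tuples, lists, dicts). The helper names to_line, from_line, save_rdd and load_rdd are made up for this example, and sc is assumed to be an existing SparkContext. Only RDD.map, RDD.saveAsTextFile and sc.textFile are actual Spark calls here. With this approach a parsing map pass on load is still needed, as raised earlier in the thread.

```python
import ast


def to_line(record):
    # repr() of built-in literal types yields text that
    # ast.literal_eval() can parse back; newlines inside strings are
    # escaped, so each record stays on one line.
    return repr(record)


def from_line(line):
    # literal_eval only accepts Python literals; it does not execute code.
    return ast.literal_eval(line)


def save_rdd(rdd, path):
    # One text line per record.
    rdd.map(to_line).saveAsTextFile(path)


def load_rdd(sc, path):
    # sc is assumed to be an existing SparkContext.
    return sc.textFile(path).map(from_line)


# Local check of the conversion itself, without Spark.
records = [("a", 1), ("b", [2, 3]), ("c", {"k": 4.5})]
lines = [to_line(r) for r in records]
restored = [from_line(l) for l in lines]
```

Usage would look like save_rdd(rdd, path) followed later by load_rdd(sc, path), where path is whatever output directory you choose. Records that are custom objects would need their own conversion functions in place of to_line and from_line.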
