Posted to user@spark.apache.org by "Jahagirdar, Madhu" <ma...@philips.com> on 2014/07/08 04:15:41 UTC

Spark RDD Disk Persistence

Should I use disk-based persistence for RDDs? If the machine goes down during program execution, will the data still be intact (not lost) the next time I rerun the program?
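For context, disk-based persistence is requested per-RDD with `StorageLevel.DISK_ONLY`. A minimal sketch follows; the input path and app name are made up for illustration, and note that persisted blocks only live as long as the application's SparkContext:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object DiskPersistSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("disk-persist-sketch").setMaster("local[*]"))

    val rdd = sc.textFile("hdfs:///data/input") // hypothetical input path
      .map(_.toUpperCase)

    // DISK_ONLY spills partitions to local disk instead of keeping them in
    // memory, but the blocks are tied to this SparkContext: they are cleaned
    // up when the application exits, so they do not survive a rerun.
    rdd.persist(StorageLevel.DISK_ONLY)

    println(rdd.count()) // the first action materializes the RDD to disk
    sc.stop()
  }
}
```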

Regards,
Madhu Jahagirdar

________________________________
The information contained in this message may be confidential and legally protected under applicable law. The message is intended solely for the addressee(s). If you are not the intended recipient, you are hereby notified that any use, forwarding, dissemination, or reproduction of this message is strictly prohibited and may be unlawful. If you are not the intended recipient, please contact the sender by return e-mail and destroy all copies of the original message.

Re: Spark RDD Disk Persistence

Posted by "Lizhengbing (bing, BIPA)" <zh...@huawei.com>.
You might store your data in Tachyon.
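As a sketch of that suggestion: Tachyon holds RDD data in a memory-centric store that is independent of any single Spark application. The hostname and paths below are illustrative (19998 was Tachyon's default master port), and this assumes the Spark 1.x Tachyon integration:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

val conf = new SparkConf()
  .setAppName("tachyon-sketch")
  // Tells Spark where the Tachyon-backed off-heap store lives.
  .set("spark.tachyonStore.url", "tachyon://tachyon-master:19998")
val sc = new SparkContext(conf)

val rdd = sc.textFile("hdfs:///data/input") // hypothetical input path

// Option 1: OFF_HEAP persistence — blocks are stored in Tachyon rather than
// executor memory, so they survive executor crashes (though the cached
// blocks are still scoped to this application).
rdd.persist(StorageLevel.OFF_HEAP)

// Option 2: write the data out as files in Tachyon; a later program run can
// read them back with sc.textFile on the same URL.
rdd.saveAsTextFile("tachyon://tachyon-master:19998/results/run1")
```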

From: Jahagirdar, Madhu [mailto:madhu.jahagirdar@philips.com]
Sent: July 8, 2014 10:16
To: user@spark.apache.org
Subject: Spark RDD Disk Persistence

Should I use disk-based persistence for RDDs? If the machine goes down during program execution, will the data still be intact (not lost) the next time I rerun the program?

Regards,
Madhu Jahagirdar


RE: Spark RDD Disk Persistence

Posted by "Shao, Saisai" <sa...@intel.com>.
Hi Madhu,

I don't think you can reuse a persisted RDD the next time you run the program: the folder used for RDD materialization will have changed, and Spark will have lost the information needed to retrieve the previously persisted RDD.

AFAIK Spark does have a fault-tolerance mechanism: a node failure leads to recomputation of the affected partitions within the same run.
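If the goal is data that survives across program runs, one common approach (a sketch, not the only option; the HDFS paths are illustrative) is to write results to a reliable filesystem and read them back on the next run, instead of relying on persist():

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.fs.{FileSystem, Path}

val sc = new SparkContext(new SparkConf().setAppName("durable-rdd-sketch"))
val out = "hdfs:///app/cached/expensive-result" // hypothetical output path

val fs = FileSystem.get(sc.hadoopConfiguration)
val result =
  if (fs.exists(new Path(out))) {
    // A previous run already materialized the data; reuse it directly.
    sc.objectFile[(String, Int)](out)
  } else {
    val computed = sc.textFile("hdfs:///data/input")
      .map(line => (line, line.length))
    // Durable across application runs, unlike persist(StorageLevel.DISK_ONLY).
    computed.saveAsObjectFile(out)
    computed
  }
```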

Thanks
Jerry

From: Jahagirdar, Madhu [mailto:madhu.jahagirdar@philips.com]
Sent: Tuesday, July 08, 2014 10:16 AM
To: user@spark.apache.org
Subject: Spark RDD Disk Persistence

Should I use disk-based persistence for RDDs? If the machine goes down during program execution, will the data still be intact (not lost) the next time I rerun the program?

Regards,
Madhu Jahagirdar
