Posted to issues@spark.apache.org by "Mridul Muralidharan (JIRA)" <ji...@apache.org> on 2014/05/19 06:22:37 UTC
[jira] [Commented] (SPARK-1855) Provide memory-and-local-disk RDD checkpointing
[ https://issues.apache.org/jira/browse/SPARK-1855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14001377#comment-14001377 ]
Mridul Muralidharan commented on SPARK-1855:
--------------------------------------------
Did not realize that mail replies to JIRA mails do not get mirrored to JIRA! Replicating my mail here:
– cut and paste –
We don't have 3x replication in spark :-)
And if we use a replicated StorageLevel, it decreases the odds of failure but does not eliminate them (since we are not doing a great job with replication from a fault-tolerance point of view anyway).
Replicated levels also take a nontrivial performance hit.
Regards,
Mridul
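To put the replication point in numbers, here is a toy model (not Spark code, and not anything stated in the thread): assume each node holding a replica of a block fails independently with probability p before the block is needed again. The block is lost only if every one of its r replicas is lost, so the per-block loss probability is p**r. It shrinks quickly with r but never reaches zero for p > 0, and across many blocks the chance of losing at least one adds up. Independence is an assumption; correlated replica placement would only make the numbers worse. The function names are hypothetical.

```python
def block_loss_probability(p_node_failure: float, replicas: int) -> float:
    """Probability that all replicas of one block are lost, assuming each
    replica sits on a node that fails independently with the same probability."""
    if not 0.0 <= p_node_failure <= 1.0:
        raise ValueError("p_node_failure must be in [0, 1]")
    if replicas < 1:
        raise ValueError("replicas must be >= 1")
    return p_node_failure ** replicas


def job_loss_probability(p_node_failure: float, replicas: int, blocks: int) -> float:
    """Probability that at least one of `blocks` independent blocks is lost."""
    per_block = block_loss_probability(p_node_failure, replicas)
    return 1.0 - (1.0 - per_block) ** blocks


if __name__ == "__main__":
    # Compare 1, 2 and 3 replicas for a hypothetical 1% node-failure rate
    # and a checkpoint made of 1000 blocks.
    for r in (1, 2, 3):
        print(r, block_loss_probability(0.01, r), job_loss_probability(0.01, r, 1000))
```

With these made-up inputs, two replicas still leave roughly a 9.5% chance that some block of a 1000-block checkpoint is lost, which is the sense in which replication reduces but does not remove the risk.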
> Provide memory-and-local-disk RDD checkpointing
> -----------------------------------------------
>
> Key: SPARK-1855
> URL: https://issues.apache.org/jira/browse/SPARK-1855
> Project: Spark
> Issue Type: New Feature
> Components: MLlib, Spark Core
> Affects Versions: 1.0.0
> Reporter: Xiangrui Meng
>
> Checkpointing is used to cut long lineage while maintaining fault tolerance. The current implementation is HDFS-based. Using the BlockRDD, we can create in-memory-and-local-disk (with replication) checkpoints that are not as reliable as the HDFS-based solution but are faster.
> It can help applications that require many iterations.
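A toy model of why cutting lineage matters for iterative jobs, sketched under simplifying assumptions that are not from the issue: iteration i depends only on iteration i - 1, a checkpoint is written after every k-th iteration, and the checkpoint itself survives. Rebuilding a lost result then means replaying only the iterations since the last checkpoint instead of the whole history. The function name and parameters are hypothetical.

```python
from typing import Optional


def steps_to_recover(iteration: int, checkpoint_every: Optional[int]) -> int:
    """Iterations to replay to rebuild the result of `iteration` (1-based),
    assuming iteration i depends only on i - 1 and a checkpoint is written
    after every `checkpoint_every`-th iteration (None means never)."""
    if iteration < 1:
        raise ValueError("iteration must be >= 1")
    if checkpoint_every is None:
        # No checkpoint: the lineage reaches all the way back to the input.
        return iteration
    if checkpoint_every < 1:
        raise ValueError("checkpoint_every must be >= 1")
    last_checkpoint = (iteration // checkpoint_every) * checkpoint_every
    # Zero when `iteration` was itself checkpointed (and the checkpoint survived).
    return iteration - last_checkpoint


if __name__ == "__main__":
    for it in (9, 100, 105):
        print(it, steps_to_recover(it, None), steps_to_recover(it, 10))
```

The model leans entirely on the checkpoint surviving. That assumption is strong for HDFS-backed checkpoints and weaker for memory-and-local-disk ones, which is the trade-off discussed in the comment above.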
--
This message was sent by Atlassian JIRA
(v6.2#6252)