Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2014/10/27 07:55:33 UTC

[jira] [Commented] (SPARK-4094) checkpoint should still be available after rdd actions

    [ https://issues.apache.org/jira/browse/SPARK-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14184897#comment-14184897 ] 

Apache Spark commented on SPARK-4094:
-------------------------------------

User 'liyezhang556520' has created a pull request for this issue:
https://github.com/apache/spark/pull/2956

> checkpoint should still be available after rdd actions
> ------------------------------------------------------
>
>                 Key: SPARK-4094
>                 URL: https://issues.apache.org/jira/browse/SPARK-4094
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.1.0
>            Reporter: Zhang, Liye
>
> rdd.checkpoint() must be called before any action runs on the RDD. If an action has already run on it, the checkpoint never succeeds. Take the following code as an example:
> *val rdd = sc.makeRDD(...)*
> *rdd.collect()*
> *rdd.checkpoint()*
> *rdd.count()*
> This RDD is never checkpointed. RDD caching does not have this restriction: cache() takes effect on the next action, regardless of whether other actions ran on the RDD before cache() was called.
> rdd.checkpoint() should therefore behave the same way as rdd.cache().
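The two orderings described in the issue can be compared side by side. The following is a minimal sketch, not code taken from the issue or the pull request: the object name CheckpointOrdering, the local[2] master, the sample data and the temporary checkpoint directory are all assumptions made for illustration. It relies only on the public SparkContext.setCheckpointDir, RDD.checkpoint and RDD.isCheckpointed calls. According to the report, on 1.1.0 the second flag would be expected to stay false; other versions may behave differently.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical helper for illustration; not part of Spark or of the pull request.
object CheckpointOrdering {

  // Returns (isCheckpointed when checkpoint() precedes the first action,
  //          isCheckpointed when an action precedes checkpoint()).
  def run(sc: SparkContext): (Boolean, Boolean) = {
    // Ordering 1: mark the RDD for checkpointing before any action runs.
    val checkpointFirst = sc.makeRDD(1 to 100)
    checkpointFirst.checkpoint()
    checkpointFirst.count()

    // Ordering 2: the sequence from the issue description.
    val actionFirst = sc.makeRDD(1 to 100)
    actionFirst.collect()
    actionFirst.checkpoint()
    actionFirst.count()

    (checkpointFirst.isCheckpointed, actionFirst.isCheckpointed)
  }

  def main(args: Array[String]): Unit = {
    // Assumed local setup; adjust the master and directory for a real cluster.
    val conf = new SparkConf().setAppName("checkpoint-ordering").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      sc.setCheckpointDir(Files.createTempDirectory("spark-checkpoint").toString)
      val (first, second) = run(sc)
      println(s"checkpoint() before first action: isCheckpointed = $first")
      println(s"action before checkpoint():       isCheckpointed = $second")
    } finally {
      sc.stop()
    }
  }
}
```

The values are printed rather than asserted because the second one depends on the Spark version in use.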



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
