Posted to issues@spark.apache.org by "Andrew Or (JIRA)" <ji...@apache.org> on 2016/11/21 21:11:59 UTC

[jira] [Resolved] (SPARK-18361) Expose RDD localCheckpoint in PySpark

     [ https://issues.apache.org/jira/browse/SPARK-18361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Or resolved SPARK-18361.
-------------------------------
          Resolution: Fixed
       Fix Version/s: 2.1.0
    Target Version/s: 2.1.0

> Expose RDD localCheckpoint in PySpark
> -------------------------------------
>
>                 Key: SPARK-18361
>                 URL: https://issues.apache.org/jira/browse/SPARK-18361
>             Project: Spark
>          Issue Type: New Feature
>          Components: PySpark
>            Reporter: Gabriel Huang
>            Assignee: Gabriel Huang
>             Fix For: 2.1.0
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> As of today, I cannot access rdd.localCheckpoint() from PySpark.
> This is an important issue for machine learning users, as we often have to run iterative algorithms and perform operations such as joins in each iteration.
> If the lineage is not truncated, memory usage, lineage length, and computation time all explode. rdd.localCheckpoint() seems like the most straightforward way to truncate the lineage, but the Python API does not expose it.
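
The iterative pattern described in the report can be sketched roughly as follows, assuming a PySpark build in which RDD.localCheckpoint() is available. The helper name iterate_with_local_checkpoint, the `every` parameter, and the join-based step in demo_with_spark are hypothetical and chosen only for illustration; they do not come from the issue or from the eventual patch.

```python
def iterate_with_local_checkpoint(rdd, step, num_iters, every=5):
    """Apply `step` to `rdd` num_iters times, truncating lineage periodically.

    `step` is any function mapping an RDD to a new RDD (hypothetical here).
    """
    for i in range(1, num_iters + 1):
        rdd = step(rdd)
        if i % every == 0:
            # Mark the RDD for local checkpointing; this must happen
            # before any job has been run on this particular RDD.
            rdd.localCheckpoint()
            # Run an action so the checkpoint is actually materialized
            # and the lineage before this point can be dropped.
            rdd.count()
    return rdd


def demo_with_spark():
    """Illustrative usage; needs a working PySpark installation."""
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    base = sc.parallelize(range(1000)).map(lambda x: (x % 10, x))
    lookup = sc.parallelize([(k, 1) for k in range(10)])

    # Hypothetical per-iteration step: join against a small lookup RDD,
    # then fold the joined value back into a single number.
    def step(r):
        return r.join(lookup).mapValues(lambda v: v[0] + v[1])

    result = iterate_with_local_checkpoint(base, step, num_iters=20, every=5)
    return result.count()
```

The checkpoint interval is a trade-off: a local checkpoint keeps data in executor storage rather than reliable storage, so it truncates lineage cheaply but gives up the ability to recompute lost partitions from the original lineage.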


