Posted to issues@spark.apache.org by "Gabriel Huang (JIRA)" <ji...@apache.org> on 2016/11/08 18:03:58 UTC

[jira] [Created] (SPARK-18361) Expose RDD localCheckpoint in PySpark

Gabriel Huang created SPARK-18361:
-------------------------------------

             Summary: Expose RDD localCheckpoint in PySpark
                 Key: SPARK-18361
                 URL: https://issues.apache.org/jira/browse/SPARK-18361
             Project: Spark
          Issue Type: New Feature
          Components: PySpark
            Reporter: Gabriel Huang


As of today, I cannot call rdd.localCheckpoint() from PySpark.

This is an important issue for machine learning users, as we often run iterative algorithms and perform operations like joins in each iteration.

If the lineage is not truncated, it keeps growing with each iteration, and memory usage and computation time explode. rdd.localCheckpoint() seems like the most straightforward way of truncating the lineage, but the Python API does not expose it.
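
To illustrate the pattern, here is a minimal sketch of an iterative loop that truncates the lineage every few iterations. Since localCheckpoint() is not available from Python, it uses the reliable rdd.checkpoint() that PySpark already exposes as a stand-in, which assumes sc.setCheckpointDir(...) has been called first. The function name, its parameters, and the checkpoint path in the usage comment are hypothetical and only serve the example.

```python
def iterate_with_truncation(state, step, num_iters, truncate_every=10):
    """Apply `step` repeatedly, truncating the RDD lineage periodically.

    state:  an RDD holding the current iteration's data
    step:   function RDD -> RDD (for example a join followed by a map)
    Assumes sc.setCheckpointDir(...) was called beforehand, because
    rdd.checkpoint() writes the data to that directory.
    """
    for i in range(1, num_iters + 1):
        state = step(state)
        if i % truncate_every == 0:
            state.cache()       # keep the data so the checkpoint job does not recompute it
            state.checkpoint()  # mark for checkpointing; localCheckpoint() is what this issue asks for
            state.count()       # run an action so the checkpoint is actually materialized
    return state


# Usage sketch (needs a live SparkContext `sc`; path and names are hypothetical):
# sc.setCheckpointDir("/tmp/spark-checkpoints")
# ranks = iterate_with_truncation(
#     ranks, lambda r: links.join(r).mapValues(update), num_iters=100)
```

With localCheckpoint() exposed, the same loop would not need a checkpoint directory on reliable storage, at the cost of weaker fault tolerance.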



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
