Posted to issues@spark.apache.org by "Gabriel Huang (JIRA)" <ji...@apache.org> on 2016/11/08 18:03:58 UTC
[jira] [Created] (SPARK-18361) Expose RDD localCheckpoint in PySpark
Gabriel Huang created SPARK-18361:
-------------------------------------
Summary: Expose RDD localCheckpoint in PySpark
Key: SPARK-18361
URL: https://issues.apache.org/jira/browse/SPARK-18361
Project: Spark
Issue Type: New Feature
Components: PySpark
Reporter: Gabriel Huang
As of today, rdd.localCheckpoint() is not accessible from PySpark.
This is an important issue for machine learning practitioners, who often run iterative algorithms that perform operations such as joins in every iteration.
If the lineage is not truncated, it grows with each iteration, and memory usage and computation time explode. rdd.localCheckpoint() seems to be the most straightforward way to truncate the lineage, but the Python API does not expose it.
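To make the lineage argument concrete, here is a toy, pure-Python model. It is not Spark code: `ToyRDD`, `join`, `local_checkpoint` and `run` are hypothetical names used only for illustration. Each iteration joins the evolving state with a fixed input, so without truncation the dependency chain grows by one step per iteration. Truncating every `k` iterations caps the chain at `k + 1` steps.

```python
from typing import Iterable, Optional


class ToyRDD:
    """Hypothetical stand-in for an RDD that records only its lineage."""

    def __init__(self, parents: Iterable["ToyRDD"] = ()) -> None:
        self.parents = tuple(parents)
        # Length of the longest dependency chain back to a source dataset.
        self.depth = 1 + max((p.depth for p in self.parents), default=0)

    def join(self, other: "ToyRDD") -> "ToyRDD":
        # A join yields a new dataset that depends on both inputs.
        return ToyRDD((self, other))

    def local_checkpoint(self) -> "ToyRDD":
        # Toy model of truncation: the materialized result becomes a new
        # source with no parents, so the old chain is no longer referenced.
        return ToyRDD()


def run(iterations: int, checkpoint_every: Optional[int]) -> int:
    """Return the deepest lineage seen while iterating a join-based loop."""
    edges = ToyRDD()  # static input joined in every iteration
    state = ToyRDD()  # evolving state, e.g. model parameters
    max_depth = state.depth
    for i in range(1, iterations + 1):
        state = state.join(edges)
        max_depth = max(max_depth, state.depth)
        if checkpoint_every and i % checkpoint_every == 0:
            state = state.local_checkpoint()
    return max_depth


print(run(50, None))  # lineage grows with every iteration
print(run(50, 10))    # lineage is capped by periodic truncation
```

The toy returns a fresh parentless node for simplicity. In Spark itself, localCheckpoint() marks the RDD in place, and the lineage is only truncated once an action has materialized it; in PySpark releases that expose the method, the corresponding call would be `rdd.localCheckpoint()` followed by an action such as `rdd.count()`.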