
Posted to issues@spark.apache.org by "Aaron Staple (JIRA)" <ji...@apache.org> on 2014/09/25 23:38:33 UTC

[jira] [Resolved] (SPARK-3488) cache deserialized python RDDs before iterative learning

     [ https://issues.apache.org/jira/browse/SPARK-3488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aaron Staple resolved SPARK-3488.
---------------------------------
    Resolution: Won't Fix

> cache deserialized python RDDs before iterative learning
> --------------------------------------------------------
>
>                 Key: SPARK-3488
>                 URL: https://issues.apache.org/jira/browse/SPARK-3488
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, PySpark
>            Reporter: Aaron Staple
>
> When running an iterative learning algorithm, the input RDD should be cached, since each iteration makes another pass over it. When learning is applied to a Python RDD, the Python RDD is currently always cached; in Scala that cached RDD is then mapped to an uncached deserialized RDD, and the uncached RDD is passed to the learning algorithm. Instead, the deserialized RDD should be cached.
> This was originally discussed here:
> https://github.com/apache/spark/pull/2347#issuecomment-55181535
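
To illustrate the cost described above, here is a minimal pure-Python sketch. It is a toy model, not the Spark or PySpark API: ToyRDD, deserialize, iterative_learn and count_deserializations are hypothetical names made up for this example, and pickle stands in for whatever serialization the real code path uses. It only counts how often records are deserialized when the mapped dataset is left uncached versus cached.

```python
import pickle


class ToyRDD:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, NOT Spark)."""

    def __init__(self, compute):
        self._compute = compute    # zero-argument function returning a list
        self._use_cache = False
        self._cached = None

    def map(self, fn):
        # Lazy: fn runs only when the resulting dataset is collected.
        parent = self
        return ToyRDD(lambda: [fn(x) for x in parent.collect()])

    def cache(self):
        self._use_cache = True
        return self

    def collect(self):
        if not self._use_cache:
            return self._compute()          # recomputed on every pass
        if self._cached is None:
            self._cached = self._compute()  # computed once, then reused
        return self._cached


def iterative_learn(rdd, iterations):
    """Stand-in for a learning algorithm: one full pass per iteration."""
    total = 0.0
    for _ in range(iterations):
        total += sum(rdd.collect())
    return total


def count_deserializations(cache_deserialized, iterations=5, records=4):
    """Return how many records get deserialized during the whole run."""
    counter = {"calls": 0}

    def deserialize(blob):
        counter["calls"] += 1
        return pickle.loads(blob)

    # The serialized dataset is always cached, as in the issue description.
    serialized = ToyRDD(
        lambda: [pickle.dumps(float(i)) for i in range(records)]
    ).cache()
    deserialized = serialized.map(deserialize)
    if cache_deserialized:
        deserialized.cache()  # the change proposed in this issue
    iterative_learn(deserialized, iterations)
    return counter["calls"]


if __name__ == "__main__":
    print("uncached deserialized dataset:", count_deserializations(False))
    print("cached deserialized dataset:  ", count_deserializations(True))
```

In this toy model, leaving the mapped dataset uncached repeats the deserialization on every iteration, while caching it does the work once. Real Spark caching also involves memory use and storage levels, which the sketch ignores.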



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
