Posted to issues@spark.apache.org by "holdenk (JIRA)" <ji...@apache.org> on 2016/01/23 08:39:40 UTC

[jira] [Closed] (SPARK-12151) Improve PySpark MLLib prediction performance when using pickled vectors

     [ https://issues.apache.org/jira/browse/SPARK-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

holdenk closed SPARK-12151.
---------------------------
    Resolution: Not A Problem

Checked the models: all of the ones not doing this were doing their prediction in Python rather than Java, so this is not relevant.

> Improve PySpark MLLib prediction performance when using pickled vectors
> -----------------------------------------------------------------------
>
>                 Key: SPARK-12151
>                 URL: https://issues.apache.org/jira/browse/SPARK-12151
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, PySpark
>            Reporter: holdenk
>            Priority: Minor
>
> In a number of places inside PySpark MLLib, when calling predict on an RDD we map the Python prediction function over the RDD. Instead, we could convert the RDD to an RDD of pickled Vectors and then use the Java prediction function. This would be useful for models that have optimized prediction on batches of objects (e.g. by broadcasting the relevant parts of the model or similar).
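
The two strategies in the description above can be sketched roughly as follows. This is a pure-Python illustration, not PySpark code: ToyLinearModel, predict_batch, predict_per_element and predict_via_pickled_batches are hypothetical names, and the "batch" path only stands in for handing pickled vectors to a JVM-side prediction function. It assumes a simple linear model for the sake of the example.

```python
import pickle


class ToyLinearModel:
    """Hypothetical stand-in for a model; not a PySpark MLlib class."""

    def __init__(self, weights, intercept):
        self.weights = list(weights)
        self.intercept = intercept

    def predict(self, vector):
        # Per-element path: one Python-level call per input vector.
        return sum(w * x for w, x in zip(self.weights, vector)) + self.intercept

    def predict_batch(self, pickled_batch):
        # Batched path: assumed stand-in for a Java-side predictor that
        # receives a pickled batch of vectors and scores them together.
        vectors = pickle.loads(pickled_batch)
        return [self.predict(v) for v in vectors]


def predict_per_element(model, vectors):
    # Analogous to mapping the Python prediction function over the RDD.
    return [model.predict(v) for v in vectors]


def predict_via_pickled_batches(model, vectors, batch_size=2):
    # Analogous to converting the data into pickled batches first and
    # then calling a single batch-oriented prediction function.
    results = []
    for start in range(0, len(vectors), batch_size):
        batch = pickle.dumps(vectors[start:start + batch_size])
        results.extend(model.predict_batch(batch))
    return results


model = ToyLinearModel([0.5, -1.0], 2.0)
data = [[1.0, 2.0], [3.0, 0.0], [0.0, 0.0]]
per_element = predict_per_element(model, data)
batched = predict_via_pickled_batches(model, data)
```

Both paths return the same predictions; the only difference is where the per-vector work happens, which is why the batched route helps only when the receiving side has an optimized batch implementation.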


