Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2014/11/22 00:04:33 UTC

[jira] [Closed] (SPARK-4531) Cache serialized java objects instead of serialized python objects in MLlib

     [ https://issues.apache.org/jira/browse/SPARK-4531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangrui Meng closed SPARK-4531.
--------------------------------
       Resolution: Fixed
    Fix Version/s: 1.2.0
         Assignee: Davies Liu

> Cache serialized java objects instead of serialized python objects in MLlib
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-4531
>                 URL: https://issues.apache.org/jira/browse/SPARK-4531
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib, PySpark
>    Affects Versions: 1.2.0
>            Reporter: Davies Liu
>            Assignee: Davies Liu
>            Priority: Blocker
>             Fix For: 1.2.0
>
>
> Pyrolite is pretty slow (compared to the ad hoc serializer in 1.1), and it causes a significant performance regression in 1.2, because we cache the serialized Python objects in the JVM and deserialize them into Java objects in each step.
> We should cache the deserialized JavaRDD instead of the PythonRDD, so that the Pyrolite deserialization is not repeated in each step.
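
The cost difference described above can be illustrated with a small, Spark-free sketch. It is not the actual patch: `pickle` stands in for Pyrolite, and `CountingDeserializer`, `run_cached_serialized`, and `run_cached_deserialized` are hypothetical names invented for this example. It only counts how many records get decoded when an iterative job reads from a cache of serialized bytes versus a cache of already-deserialized objects.

```python
import pickle


class CountingDeserializer:
    """Hypothetical stand-in for Pyrolite: counts decoded records."""

    def __init__(self):
        self.calls = 0

    def loads(self, blob):
        self.calls += 1
        return pickle.loads(blob)


def run_cached_serialized(blobs, iterations, deser):
    # The cache holds serialized bytes, so every iteration
    # has to decode every record again.
    total = 0.0
    for _ in range(iterations):
        total += sum(sum(deser.loads(b)) for b in blobs)
    return total


def run_cached_deserialized(blobs, iterations, deser):
    # Decode each record once, cache the resulting objects,
    # and let every iteration read the cached objects directly.
    cached = [deser.loads(b) for b in blobs]
    total = 0.0
    for _ in range(iterations):
        total += sum(sum(v) for v in cached)
    return total


# 100 small feature vectors, serialized once up front.
records = [[float(i), float(i + 1)] for i in range(100)]
blobs = [pickle.dumps(r) for r in records]

serialized_cache = CountingDeserializer()
deserialized_cache = CountingDeserializer()

result_a = run_cached_serialized(blobs, 10, serialized_cache)
result_b = run_cached_deserialized(blobs, 10, deserialized_cache)

print(serialized_cache.calls, deserialized_cache.calls)
```

Both functions compute the same result; the second one pays the deserialization cost once per record instead of once per record per iteration, which matters most for iterative MLlib algorithms that make many passes over the same data.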



