Posted to issues@spark.apache.org by "Xiangrui Meng (JIRA)" <ji...@apache.org> on 2015/05/01 06:12:06 UTC

[jira] [Commented] (SPARK-6288) Pyrolite calls hashCode to cache previously serialized objects

    [ https://issues.apache.org/jira/browse/SPARK-6288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522743#comment-14522743 ] 

Xiangrui Meng commented on SPARK-6288:
--------------------------------------

[~joshrosen] Are we going to upgrade Pyrolite in 1.4? Disabling `useMemo` seems to bring a significant performance improvement.

> Pyrolite calls hashCode to cache previously serialized objects
> --------------------------------------------------------------
>
>                 Key: SPARK-6288
>                 URL: https://issues.apache.org/jira/browse/SPARK-6288
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 1.0.2, 1.1.1, 1.2.1, 1.3.0
>            Reporter: Xiangrui Meng
>            Assignee: Josh Rosen
>         Attachments: Screen Shot 2015-03-13 at 10.45.35 AM.png
>
>
> https://github.com/irmen/Pyrolite/blob/v2.0/java/src/net/razorvine/pickle/Pickler.java#L140
> This operation can be quite expensive compared to serializing the object directly, because hashCode usually has to read all of the data stored in the object. Maybe we should disable this feature by default.
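
To make the cost concrete, here is a minimal Java sketch of a pickler-style memo. It is not Pyrolite's code: `MemoSketch`, `DenseVector`, and `pickleAll` are hypothetical names, and the loop only counts back-references instead of writing bytes. It assumes a payload whose value-based hashCode() reads every element, so a HashMap-backed memo pays an O(n) scan on lookups and inserts, while passing `null` (the analogue of `useMemo = false`) skips that work entirely. An IdentityHashMap-backed memo is shown as one possible alternative: it keys on reference identity via System.identityHashCode and never calls hashCode(), which is also how CPython's pickle module keys its memo (on id(obj)).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class MemoSketch {

    // Hypothetical payload standing in for a large object (e.g. a vector).
    // Its value-based hashCode() reads every element, so each call is O(n).
    static final class DenseVector {
        final double[] values;
        int hashCalls = 0; // how many times hashCode() has run

        DenseVector(double[] values) {
            this.values = values;
        }

        @Override
        public int hashCode() {
            hashCalls++;
            return Arrays.hashCode(values); // touches every value
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof DenseVector
                    && Arrays.equals(values, ((DenseVector) other).values);
        }
    }

    // Hypothetical pickling loop. When memo is non-null, an object that was
    // already written is emitted as a back-reference instead of being written
    // again. Returns the number of back-references produced.
    static int pickleAll(List<?> objects, Map<Object, Integer> memo) {
        int backRefs = 0;
        for (Object obj : objects) {
            if (memo != null) {
                // A HashMap memo may call obj.hashCode() here...
                Integer index = memo.get(obj);
                if (index != null) {
                    backRefs++;
                    continue;
                }
                // ...and again here; an IdentityHashMap memo never does.
                memo.put(obj, memo.size());
            }
            // A real pickler would serialize obj here (omitted in this sketch).
        }
        return backRefs;
    }

    public static void main(String[] args) {
        DenseVector v = new DenseVector(new double[1_000_000]);
        List<DenseVector> batch = Arrays.asList(v, v);

        pickleAll(batch, new HashMap<>());
        System.out.println("HashMap memo: hashCode calls = " + v.hashCalls);

        v.hashCalls = 0;
        pickleAll(batch, new IdentityHashMap<>());
        System.out.println("IdentityHashMap memo: hashCode calls = " + v.hashCalls);

        v.hashCalls = 0;
        pickleAll(batch, null); // memo disabled, like useMemo = false
        System.out.println("No memo: hashCode calls = " + v.hashCalls);
    }
}
```

Running main prints the hashCode() call count for each strategy; the exact HashMap count can differ between JDK versions, but the identity-based and disabled memos report zero. A value-equality memo also has a semantic side effect: two distinct but equal objects would be collapsed into one back-reference, changing aliasing after unpickling.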



