You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "William Benton (JIRA)" <ji...@apache.org> on 2014/05/06 17:15:28 UTC

[jira] [Commented] (SPARK-729) Closures not always serialized at capture time

    [ https://issues.apache.org/jira/browse/SPARK-729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990729#comment-13990729 ] 

William Benton commented on SPARK-729:
--------------------------------------

So the straightforward approach (immediately serializing and deserializing a closure in ClosureCleaner.clean) causes a couple of problems in Spark 1.0 that weren't obvious from the 0.9.1 test suite (that is, they existed but weren't exposed by the suite).  

Most notably, if we serialize closures immediately, we might replace the only reference to a broadcast variable object with a serialized copy of that object, the original could be cleaned up by ContextCleaner before the closure has a chance to execute.  

I've been looking at ways to solve this but thought I'd provide a status update here in the meantime.

> Closures not always serialized at capture time
> ----------------------------------------------
>
>                 Key: SPARK-729
>                 URL: https://issues.apache.org/jira/browse/SPARK-729
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 0.7.0, 0.7.1
>            Reporter: Matei Zaharia
>            Assignee: William Benton
>
> As seen in https://groups.google.com/forum/?fromgroups=#!topic/spark-users/8pTchwuP2Kk and its corresponding fix on https://github.com/mesos/spark/commit/adba773fab6294b5764d101d248815a7d3cb3558, it is possible for a closure referencing a var to see the latest version of that var, instead of the version that was there when the closure was passed to Spark. This is not good when failures or recomputations happen. We need to serialize the closures on capture if possible, perhaps as part of ClosureCleaner.clean.



--
This message was sent by Atlassian JIRA
(v6.2#6252)