You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Kan Zhang (JIRA)" <ji...@apache.org> on 2014/05/30 05:52:02 UTC

[jira] [Commented] (SPARK-1866) Closure cleaner does not null shadowed fields when outer scope is referenced

    [ https://issues.apache.org/jira/browse/SPARK-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14013264#comment-14013264 ] 

Kan Zhang commented on SPARK-1866:
----------------------------------

Unfortunately this type of error will pop up whenever a closure references user objects (any objects other than nested closure objects) that are not serializable. Our current approach is we don't clone (or nulling) user objects since we can't be sure it is safe to do so (see commit f346e64637fa4f9bd95fcc966caa496bea5feca0). 

Spark shell (REPL) synthesizes a class for each line. In this case, the class for the closure line imports {{instances}} as a field (since the parser thinks it is referenced by this line) and the corresponding line object is referenced by the closure via {{x}}.                     

My take on this is to advise users to avoid name collisions as a workaround.

> Closure cleaner does not null shadowed fields when outer scope is referenced
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-1866
>                 URL: https://issues.apache.org/jira/browse/SPARK-1866
>             Project: Spark
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Aaron Davidson
>            Assignee: Kan Zhang
>            Priority: Critical
>             Fix For: 1.1.0, 1.0.1
>
>
> Take the following example:
> {code}
> val x = 5
> val instances = new org.apache.hadoop.fs.Path("/") /* non-serializable */
> sc.parallelize(0 until 10).map { _ =>
>   val instances = 3
>   (instances, x)
> }.collect
> {code}
> This produces a "java.io.NotSerializableException: org.apache.hadoop.fs.Path", despite the fact that the outer instances is not actually used within the closure. If you change the name of the outer variable instances to something else, the code executes correctly, indicating that it is the fact that the two variables share a name that causes the issue.
> Additionally, if the outer scope is not used (i.e., we do not reference "x" in the above example), the issue does not appear.



--
This message was sent by Atlassian JIRA
(v6.2#6252)