You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Wenchen Fan (JIRA)" <ji...@apache.org> on 2017/10/26 20:46:00 UTC

[jira] [Resolved] (SPARK-22328) ClosureCleaner misses referenced superclass fields, gives them null values

     [ https://issues.apache.org/jira/browse/SPARK-22328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-22328.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.3.0
                   2.2.1

Issue resolved by pull request 19556
[https://github.com/apache/spark/pull/19556]

> ClosureCleaner misses referenced superclass fields, gives them null values
> --------------------------------------------------------------------------
>
>                 Key: SPARK-22328
>                 URL: https://issues.apache.org/jira/browse/SPARK-22328
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Ryan Williams
>             Fix For: 2.2.1, 2.3.0
>
>
> [Runnable repro here|https://github.com/ryan-williams/spark-bugs/tree/closure]:
> Superclass with some fields:
> {code}
> abstract class App extends Serializable {
>   // SparkContext stub
>   @transient lazy val sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local[4]").set("spark.ui.showConsoleProgress", "false"))
>   // These fields get missed by the ClosureCleaner in some situations
>   val n1 = 111
>   val s1 = "aaa"
>   // Simple scaffolding to exercise passing a closure to RDD.foreach in subclasses
>   def rdd = sc.parallelize(1 to 1)
>   def run(name: String): Unit = {
>     print(s"$name:\t")
>     body()
>     sc.stop()
>   }
>   def body(): Unit
> }
> {code}
> Running a simple Spark job with various instantiations of this class:
> {code}
> object Main {
>   /** [[App]]s generated this way will not correctly detect references to [[App.n1]] in Spark closures */
>   val fn = () ⇒ new App {
>     val n2 = 222
>     val s2 = "bbb"
>     def body(): Unit = rdd.foreach { _ ⇒ println(s"$n1, $n2, $s1, $s2") }
>   }
>   /** Doesn't serialize closures correctly */
>   val app1 = fn()
>   /** Works fine */
>   val app2 =
>     new App {
>       val n2 = 222
>       val s2 = "bbb"
>       def body(): Unit = rdd.foreach { _ ⇒ println(s"$n1, $n2, $s1, $s2") }
>     }
>   /** [[App]]s created this way also work fine */
>   def makeApp(): App =
>     new App {
>       val n2 = 222
>       val s2 = "bbb"
>       def body(): Unit = rdd.foreach { _ ⇒ println(s"$n1, $n2, $s1, $s2") }
>     }
>   val app3 = makeApp()  // ok
>   val fn2 = () ⇒ makeApp()  // ok
>   def main(args: Array[String]): Unit = {
>     fn().run("fn")    // bad: n1 → 0, s1 → null
>     app1.run("app1")  // bad: n1 → 0, s1 → null
>     app2.run("app2")  // ok
>     app3.run("app3")  // ok
>     fn2().run("fn2")  // ok
>   }
> }
> {code}
> Build + Run:
> {code}
> $ sbt run
> …
> fn:	0, 222, null, bbb
> app1:	0, 222, null, bbb
> app2:	111, 222, aaa, bbb
> app3:	111, 222, aaa, bbb
> fn2:	111, 222, aaa, bbb
> {code}
> The first two versions have {{0}} and {{null}}, resp., for the {{A.n1}} and {{A.s1}} fields.
> Something about this syntax causes the problem:
> {code}
> () => new App { … }
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org