Posted to issues@spark.apache.org by "sam (JIRA)" <ji...@apache.org> on 2019/01/04 15:38:00 UTC

[jira] [Created] (SPARK-26534) Closure Cleaner Bug

sam created SPARK-26534:
---------------------------

             Summary: Closure Cleaner Bug
                 Key: SPARK-26534
                 URL: https://issues.apache.org/jira/browse/SPARK-26534
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.3.1
            Reporter: sam


I've found a strange combination of closures where the closure cleaner doesn't seem to be smart enough to figure out how to remove a reference that is not actually used. That is, we get an `org.apache.spark.SparkException: Task not serializable` for a task that is perfectly serializable.

 

In the example below, the only `val` that is actually needed by the closure of the `map` is `foo`, yet Spark tries to serialise `thingy`. What is odd is that changing this code in a number of subtle ways eliminates the error, which I've tried to highlight with the comments inline.

 
{code:java}
import org.apache.spark.sql._

object Test {
  val sparkSession: SparkSession =
    SparkSession.builder.master("local").appName("app").getOrCreate()

  def apply(): Unit = {
    import sparkSession.implicits._

    val landedData: Dataset[String] = sparkSession.sparkContext.makeRDD(Seq("foo", "bar")).toDS()

    // thingy has to be in this outer scope to reproduce; if it is declared inside someFunc, the error cannot be reproduced
    val thingy: Thingy = new Thingy

    // If not wrapped in someFunc, the error cannot be reproduced
    val someFunc = () => {
      // If foo is not referenced inside the closure (e.g. the identity function is used instead), the error cannot be reproduced
      val foo: String = "foo"

      thingy.run(block = () => {
        landedData.map(r => {
          r + foo
        })
        .count()
      })
    }

    someFunc()

  }
}

class Thingy {
  def run[R](block: () => R): R = {
    block()
  }
}
{code}
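
For comparison, below is a minimal sketch of the variant described in the inline comments, where `thingy` is declared inside `someFunc` instead of the outer scope; per those comments, the error cannot be reproduced with this placement. The object name `Test2` is mine, purely for illustration, and it reuses the `Thingy` class from the reproduction above.

{code:java}
import org.apache.spark.sql._

object Test2 {
  val sparkSession: SparkSession =
    SparkSession.builder.master("local").appName("app").getOrCreate()

  def apply(): Unit = {
    import sparkSession.implicits._

    val landedData: Dataset[String] = sparkSession.sparkContext.makeRDD(Seq("foo", "bar")).toDS()

    val someFunc = () => {
      // thingy now lives inside someFunc; with this placement the map closure
      // serialises cleanly and the count runs as expected
      val thingy: Thingy = new Thingy
      val foo: String = "foo"

      thingy.run(block = () => {
        landedData.map(r => {
          r + foo
        })
        .count()
      })
    }

    someFunc()
  }
}
{code}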



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
