You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Liang-Chi Hsieh (JIRA)" <ji...@apache.org> on 2019/01/06 08:56:00 UTC
[jira] [Commented] (SPARK-26534) Closure Cleaner Bug
[ https://issues.apache.org/jira/browse/SPARK-26534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16735130#comment-16735130 ]
Liang-Chi Hsieh commented on SPARK-26534:
-----------------------------------------
I do a test as below:
{code:java}
object Test {
val sc = new SparkContext("local", "test")
def apply(): Unit = {
val rdd = sc.parallelize(1 to 2)
val thingy: Thingy = new Thingy
val someFunc = () => {
val foo: String = "foo"
thingy.run(block = () => {
rdd.map(r => {
r + foo
}).count()
})
}
someFunc()
}
}
Test.apply()
class Thingy {
def run[R](block: () => R): R = {
block()
}
}
{code}
But it can't reproduce described error.
> Closure Cleaner Bug
> -------------------
>
> Key: SPARK-26534
> URL: https://issues.apache.org/jira/browse/SPARK-26534
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.1
> Reporter: sam
> Priority: Major
>
> I've found a strange combination of closures where the closure cleaner doesn't seem to be smart enough to figure out how to remove a reference that is not used. I.e. we get a `org.apache.spark.SparkException: Task not serializable` for a Task that is perfectly serializable.
>
> In the example below, the only `val` that is actually needed for the closure of the `map` is `foo`, but it tries to serialise `thingy`. What is odd is changing this code in a number of subtle ways eliminates the error, which I've tried to highlight using comments inline.
>
> {code:java}
> import org.apache.spark.sql._
> object Test {
> val sparkSession: SparkSession =
> SparkSession.builder.master("local").appName("app").getOrCreate()
> def apply(): Unit = {
> import sparkSession.implicits._
> val landedData: Dataset[String] = sparkSession.sparkContext.makeRDD(Seq("foo", "bar")).toDS()
> // thingy has to be in this outer scope to reproduce, if in someFunc, cannot reproduce
> val thingy: Thingy = new Thingy
> // If not wrapped in someFunc cannot reproduce
> val someFunc = () => {
> // If don't reference this foo inside the closer (e.g. just use identity function) cannot reproduce
> val foo: String = "foo"
> thingy.run(block = () => {
> landedData.map(r => {
> r + foo
> })
> .count()
> })
> }
> someFunc()
> }
> }
> class Thingy {
> def run[R](block: () => R): R = {
> block()
> }
> }
> {code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org