Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2017/10/19 07:27:00 UTC

[jira] [Resolved] (SPARK-22288) Tricky interaction between closure-serialization and inheritance results in confusing failure

     [ https://issues.apache.org/jira/browse/SPARK-22288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-22288.
-------------------------------
    Resolution: Not A Problem

OK, fine to note here, but I think this is just a Java issue. You're right re: Kryo.
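
A minimal sketch, not part of the original ticket, of what "just a Java issue" means here: the same failure reproduces with plain {{ObjectOutputStream}}/{{ObjectInputStream}} and no Spark at all, because Java serialization requires the closest non-serializable superclass to have an accessible no-arg constructor, and {{Super1(n: Int)}} does not provide one ({{Super2}} is itself {{Serializable}} and {{Super3}} has a no-arg constructor, which is why those two cases work). Class names below are made up for the sketch and redeclared to keep it self-contained.

{code}
import java.io._

class Super1(n: Int)                           // not Serializable, no no-arg constructor
class Sub extends Super1(4) with Serializable  // Serializable subclass, like App in the repro below

object PlainJavaRepro {
  def main(args: Array[String]): Unit = {
    val buf = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buf)
    out.writeObject(new Sub)                   // writing succeeds
    out.close()

    val in = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
    in.readObject()                            // fails: java.io.InvalidClassException: Sub; no valid constructor
  }
}
{code}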

> Tricky interaction between closure-serialization and inheritance results in confusing failure
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-22288
>                 URL: https://issues.apache.org/jira/browse/SPARK-22288
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.2.0
>            Reporter: Ryan Williams
>            Priority: Minor
>
> Documenting this since I've run into it a few times; [full repro / discussion here|https://github.com/ryan-williams/spark-bugs/tree/serde].
> Given 3 possible super-classes:
> {code}
> class Super1(n: Int)
> class Super2(n: Int) extends Serializable
> class Super3
> {code}
> A subclass that passes a closure to an RDD operation (e.g. {{map}} or {{filter}}), where the closure references one of the subclass's fields, will throw a {{java.io.InvalidClassException: …; no valid constructor}} exception when the subclass extends {{Super1}}, but not when it extends {{Super2}} or {{Super3}}. Referencing method-local variables (instead of fields) is fine in all cases:
> {code}
> import org.apache.spark.{SparkConf, SparkContext}
>
> class App extends Super1(4) with Serializable {
>   val s = "abc"
>   def run(): Unit = {
>     val sc = new SparkContext(new SparkConf().set("spark.master", "local[4]").set("spark.app.name", "serde-test"))
>     try {
>       sc
>         .parallelize(1 to 10)
>         .filter(App.fn(_, s))   // danger! closure references field `s`, so it captures `this`; crash ensues
>         .collect()              // driver stack-trace points here
>     } finally {
>       sc.stop()
>     }
>   }
> }
> object App {
>   def main(args: Array[String]): Unit = { new App().run() }
>   def fn(i: Int, s: String): Boolean = i % 2 == 0
> }
> {code}
> The task-failure stack trace looks like:
> {code}
> java.io.InvalidClassException: com.MyClass; no valid constructor
> 	at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:150)
> 	at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:790)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1782)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
> {code}
> and the driver-side stack trace will point to the first line that initiates a Spark job exercising the closure/RDD operation in question (the {{collect()}} call in the example above).
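> As the note above says, referencing a method-local variable instead of the field avoids the failure. A sketch of that workaround (reusing the names from the repro above; not a separately tested program): copy the field into a local {{val}} before building the closure, so the closure captures only a {{String}} and never needs to deserialize the {{Super1}} part of the instance on the executors:
> {code}
> class App extends Super1(4) with Serializable {
>   val s = "abc"
>   def run(): Unit = {
>     val sc = new SparkContext(new SparkConf().set("spark.master", "local[4]").set("spark.app.name", "serde-test"))
>     val localS = s                    // local copy: the closure below captures only this String, not `this`
>     try {
>       sc
>         .parallelize(1 to 10)
>         .filter(App.fn(_, localS))    // no reference to the App instance, so Super1 is never deserialized
>         .collect()
>     } finally {
>       sc.stop()
>     }
>   }
> }
> {code}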
> Not sure how much this should be considered a problem with Spark's closure-serialization logic vs. plain Java serialization, but if the former gets looked at or improved (e.g. as part of Scala 2.12 support), this kind of interaction could be handled more gracefully.


