Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2017/10/19 07:27:00 UTC
[jira] [Resolved] (SPARK-22288) Tricky interaction between closure-serialization and inheritance results in confusing failure
[ https://issues.apache.org/jira/browse/SPARK-22288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-22288.
-------------------------------
Resolution: Not A Problem
OK, fine to note here, but I think this is just a Java issue. You're right re: Kryo.
> Tricky interaction between closure-serialization and inheritance results in confusing failure
> ---------------------------------------------------------------------------------------------
>
> Key: SPARK-22288
> URL: https://issues.apache.org/jira/browse/SPARK-22288
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.2.0
> Reporter: Ryan Williams
> Priority: Minor
>
> Documenting this since I've run into it a few times; [full repro / discussion here|https://github.com/ryan-williams/spark-bugs/tree/serde].
> Given 3 possible super-classes:
> {code}
> class Super1(n: Int)
> class Super2(n: Int) extends Serializable
> class Super3
> {code}
> A subclass that passes a closure to an RDD operation (e.g. {{map}} or {{filter}}), where the closure references one of the subclass's fields, will throw a {{java.io.InvalidClassException: …; no valid constructor}} when the subclass extends {{Super1}}, but not when it extends {{Super2}} or {{Super3}}. Referencing method-local variables (instead of fields) is fine in all cases. The following example shows the failing case (a local-variable workaround is sketched after it):
> {code}
> import org.apache.spark.{SparkConf, SparkContext}
>
> class App extends Super1(4) with Serializable {
>   val s = "abc"
>   def run(): Unit = {
>     val sc = new SparkContext(
>       new SparkConf()
>         .set("spark.master", "local[4]")
>         .set("spark.app.name", "serde-test")
>     )
>     try {
>       sc
>         .parallelize(1 to 10)
>         .filter(App.fn(_, s)) // danger! closure references field `s`, crash ensues
>         .collect()            // driver stack-trace points here
>     } finally {
>       sc.stop()
>     }
>   }
> }
>
> object App {
>   def main(args: Array[String]): Unit = { new App().run() }
>   def fn(i: Int, s: String): Boolean = i % 2 == 0
> }
> {code}
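> For comparison, a minimal sketch of the local-variable workaround (not in the linked repro; the {{runFixed}} name and explicit {{sc}} parameter are just for illustration). Copying the field into a method-local {{val}} means the closure captures only that value and never the enclosing {{App}} instance, so nothing in the {{Super1}} hierarchy is serialized:
> {code}
> class App extends Super1(4) with Serializable {
>   val s = "abc"
>   def runFixed(sc: SparkContext): Unit = {
>     val localS = s // copy the field into a method-local val
>     sc.parallelize(1 to 10)
>       .filter(App.fn(_, localS)) // closure captures only `localS`, not `this`
>       .collect()
>   }
> }
> {code}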
> The task-failure stack trace looks like:
> {code}
> java.io.InvalidClassException: com.MyClass; no valid constructor
> at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:150)
> at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:790)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1782)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
> {code}
> and a driver stack-trace will point to the first line that initiates a Spark job that exercises the closure/RDD-operation in question.
> I'm not sure how much of this should be considered a problem with Spark's closure-serialization logic vs. plain Java serialization, but if the former gets revisited or improved (e.g. with 2.12 support), perhaps this kind of interaction can be handled more gracefully. A Spark-free sketch of the underlying Java rule is below.
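> The Java-level rule at work (a minimal sketch using the same class shapes, not taken from the linked repro; the {{Sub}} and {{JavaSerdeDemo}} names are illustrative): Java deserialization requires the nearest non-serializable superclass to have an accessible no-arg constructor. {{Super1(n: Int)}} is neither {{Serializable}} nor has a no-arg constructor, while {{Super2}} is {{Serializable}} and {{Super3}} gets a default no-arg constructor, which is why only the {{Super1}} case fails, and why it fails at deserialization time (on the executor) rather than at serialization time (on the driver):
> {code}
> import java.io._
>
> class Super1(n: Int) // no no-arg constructor, not Serializable
> class Sub extends Super1(4) with Serializable
>
> object JavaSerdeDemo {
>   def main(args: Array[String]): Unit = {
>     val bytes = new ByteArrayOutputStream()
>     val out = new ObjectOutputStream(bytes)
>     out.writeObject(new Sub) // writing succeeds
>     out.close()
>     val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
>     in.readObject() // throws java.io.InvalidClassException: ...; no valid constructor
>   }
> }
> {code}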
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org