Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:35:08 UTC

[jira] [Resolved] (SPARK-10569) Kryo serialization fails on sortByKey operation on registered RDDs

     [ https://issues.apache.org/jira/browse/SPARK-10569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-10569.
----------------------------------
    Resolution: Incomplete

> Kryo serialization fails on sortByKey operation on registered RDDs
> ------------------------------------------------------------------
>
>                 Key: SPARK-10569
>                 URL: https://issues.apache.org/jira/browse/SPARK-10569
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Glenn Strycker
>            Priority: Major
>              Labels: bulk-closed
>
> I have code that creates RDDs, persists, checkpoints, and materializes (using count()), and these RDDs are serialized with Kryo, using the standard code.
> I have "kryo.setRegistrationRequired(true)", which is useful for debugging my code to find out which classes I haven't registered.  Unfortunately, having this setting turned on does not seem to be compatible with Spark internals.
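> For reference, the serializer wiring I am using looks roughly like this (class and registration names here are illustrative, not my exact code):
> {code}
> import com.esotericsoftware.kryo.Kryo
> import org.apache.spark.SparkConf
> import org.apache.spark.serializer.KryoRegistrator
>
> // Custom registrator; spark.kryo.registrator points Spark at this class
> class MyRegistrator extends KryoRegistrator {
>   override def registerClasses(kryo: Kryo): Unit = {
>     // Fail fast on any class I forgot to register:
>     kryo.setRegistrationRequired(true)
>     kryo.register(classOf[(Any, Any, Any)])  // scala.Tuple3
>   }
> }
>
> val conf = new SparkConf()
>   .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
>   .set("spark.kryo.registrator", classOf[MyRegistrator].getName)
> {code}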
> When my code encounters a sortByKey, it fails with the following error:
> {noformat}
> User class threw exception: Job aborted due to stage failure: Task 1 in stage 25.0 failed 40 times, most recent failure: Lost task 1.39 in stage 25.0 (TID 232, <server name>): java.lang.IllegalArgumentException: Class is not registered: scala.Tuple3[]
> Note: To register this class use: kryo.register(scala.Tuple3[].class);
> at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:442)
> at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:79)
> at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:472)
> at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:565)
> at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:162)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
> {noformat}
> Why is scala.Tuple3[] not registered?  I attempted to register it using various forms of "kryo.register(scala.Tuple3[].class)" (the Java syntax suggested by the error note), but this didn't seem to work.
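> (A note on the syntax: "scala.Tuple3[]" in the error is Kryo's rendering of the JVM array class "[Lscala.Tuple3;", and "Tuple3[].class" is Java syntax that does not compile in Scala.  In Scala the array class would presumably be obtained like this:)
> {code}
> // Scala has no Tuple3[].class syntax; the JVM array class comes from classOf[Array[...]]
> val arrayOfTuple3 = classOf[Array[(Any, Any, Any)]]
> println(arrayOfTuple3.getName)  // prints "[Lscala.Tuple3;" -- what Kryo shows as scala.Tuple3[]
> // so the registration would be: kryo.register(classOf[Array[(Any, Any, Any)]])
> {code}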
> I made sure that both the keys and the values of my RDD are registered, in addition to the entire tuple type.  I have lines like this:
> {code}
>     kryo.register(classOf[(((Any,Any),(Any,Any)),((Any,Any),Any))])
>     kryo.register(classOf[((Any,Any),(Any,Any))])
>     kryo.register(classOf[((Any, Any),Any)])
> {code}
> Again, my program only dies on the sortByKey call.  If I remove it, the code proceeds just fine, but I need it for certain operations (assigning indices based on sort order).
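> (For context, the index assignment I need is roughly the following -- a sketch of the pattern, not my exact code, with myRDD standing in for any key-value RDD:)
> {code}
> // Sort by key, then attach a 0-based index reflecting the sort order
> val indexed = myRDD.sortByKey(ascending = true).zipWithIndex()
> {code}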
> FYI, it is failing on RDDs of all types... I verified this in several places in my program.
> {code}
> myRDD.sortByKey(ascending=true).collect().foreach(println)
> {code}
> doesn't work (gives the error above), but
> {code}
> myRDD.collect().foreach(println)
> {code}
> works just fine.  My code also works if I turn off "kryo.setRegistrationRequired(true)".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org