Posted to user@spark.apache.org by hi...@thomsonreuters.com on 2013/11/15 17:38:28 UTC
objectFile throws ClassNotFoundException for custom class objects
I have an RDD that contains instances of a class I wrote. I was able to write it to HDFS using saveAsObjectFile, but when I try to read the same file using sc.objectFile, it throws a ClassNotFoundException saying it can't find the class. I wrote a simple program to reproduce the problem:
val list = mutable.MutableList[MyClass]()
for (x <- 1 to 10) {
  val obj = new MyClass()
  list += obj
}
val rdd = sc.parallelize(list)
rdd.saveAsObjectFile(outFile)
val objs: RDD[MyClass] = sc.objectFile(outFile)  // <== This throws ClassNotFoundException
MyClass is just a simple class that extends Serializable (with no member fields).
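For completeness, a minimal stand-in for such a class would look like this (the package name com.mycom is taken from the stack trace below; the real class may of course differ):

```scala
package com.mycom

// Minimal stand-in for the class described above:
// extends Serializable and has no member fields.
class MyClass extends Serializable
```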
Here is the stack trace I get:
java.lang.ClassNotFoundException: com.mycom.MyClass
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:623)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1610)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)
at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1661)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1342)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at org.apache.spark.util.Utils$.deserialize(Utils.scala:60)
at org.apache.spark.SparkContext$$anonfun$objectFile$1.apply(SparkContext.scala:474)
at org.apache.spark.SparkContext$$anonfun$objectFile$1.apply(SparkContext.scala:474)
at scala.collection.Iterator$$anon$21.hasNext(Iterator.scala:440)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:679)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:677)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:758)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:758)
at org.apache.spark.scheduler.ResultTask.run(ResultTask.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:158)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Is there any workaround for this?
Thanks,
Hiroko
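One workaround that has been suggested for this class of problem (a sketch only, written against the 0.8-era API and not verified): saveAsObjectFile writes a SequenceFile of NullWritable/BytesWritable pairs, where each value is a Java-serialized Array of records. The stack trace shows the failure inside Utils.deserialize, which uses a default classloader that cannot see the application jar, so the file can instead be read back manually with an ObjectInputStream that resolves classes through the thread's context classloader. The helper name objectFileWithLoader is hypothetical, not part of Spark:

```scala
import java.io.{ByteArrayInputStream, ObjectInputStream, ObjectStreamClass}
import java.util.Arrays

import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper: like sc.objectFile, but resolves classes through the
// thread's context classloader, which can see jars shipped with the job.
def objectFileWithLoader[T: ClassManifest](sc: SparkContext, path: String): RDD[T] = {
  sc.sequenceFile(path, classOf[NullWritable], classOf[BytesWritable])
    .flatMap { case (_, bytes) =>
      // BytesWritable buffers may be padded, so copy only the valid length.
      val data = Arrays.copyOf(bytes.getBytes, bytes.getLength)
      val loader = Thread.currentThread.getContextClassLoader
      val in = new ObjectInputStream(new ByteArrayInputStream(data)) {
        // Resolve against the loader that can see the application classes.
        override def resolveClass(desc: ObjectStreamClass): Class[_] =
          Class.forName(desc.getName, false, loader)
      }
      try in.readObject().asInstanceOf[Array[T]] finally in.close()
    }
}
```

Usage would mirror the failing line above: `val objs: RDD[MyClass] = objectFileWithLoader[MyClass](sc, outFile)`. Making sure the jar containing MyClass is passed to the SparkContext (via its `jars` argument or `SparkContext.addJar`) is a prerequisite either way.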