Posted to common-user@hadoop.apache.org by Christian Kumpe <ch...@kumpe.de> on 2011/03/03 10:55:54 UTC

Bug or Problem with JavaSerialization

Hi,

When reading a value of a Serializable type that is defined inside the
job's jar file, I get an exception.

I can reproduce the exception with freshly unpacked hadoop-0.20.2.tar.gz
and hadoop-0.21.0.tar.gz distributions, tested under Sun's JDK 1.6.0_24.

I've attached a short example to reproduce the bug. I've compiled it and
packed it into a jar.

When running it from the command line, I get the following output for
hadoop-0.21.0:

$ /tmp/hadoop-0.21.0/bin/hadoop jar demonstration.jar DemonstrationForPossibleBugInJavaSerialization
My ClassLoaderjava.net.URLClassLoader@1cbfe9d
JavaSerialization's ClassLoader: sun.misc.Launcher$AppClassLoader@cac268
11/03/03 10:40:42 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000
Exception in thread "main" java.io.IOException: java.lang.ClassNotFoundException: DemonstrationForPossibleBugInJavaSerialization$MyValueClass
	at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:62)
	at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:43)
	at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:1886)
	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1859)
	at DemonstrationForPossibleBugInJavaSerialization.main(DemonstrationForPossibleBugInJavaSerialization.java:52)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:192)

And the same for hadoop-0.20.2:

$ /tmp/hadoop-0.20.2/bin/hadoop jar demonstration.jar DemonstrationForPossibleBugInJavaSerialization
My ClassLoaderjava.net.URLClassLoader@e45076
JavaSerialization's ClassLoader: sun.misc.Launcher$AppClassLoader@1a16869
Exception in thread "main" java.io.IOException: java.lang.ClassNotFoundException: DemonstrationForPossibleBugInJavaSerialization$MyValueClass
	at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:55)
	at org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer.deserialize(JavaSerialization.java:36)
	at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:1817)
	at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:1790)
	at DemonstrationForPossibleBugInJavaSerialization.main(DemonstrationForPossibleBugInJavaSerialization.java:52)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


Possible reason:

I think it's a ClassLoader issue.

DemonstrationForPossibleBugInJavaSerialization is loaded by the
URLClassLoader created in org.apache.hadoop.util.RunJar. That loader has
the job's jar file on its search path, so it has no problem loading
DemonstrationForPossibleBugInJavaSerialization$MyValueClass.

JavaSerialization is loaded by the URLClassLoader's parent, which in
this case is sun.misc.Launcher$AppClassLoader.

Inside org.apache.hadoop.io.serializer.JavaSerialization$JavaSerializationDeserializer
an ObjectInputStream is used for deserialization. If you look into
ObjectInputStream.resolveClass(...), it uses the
"latestUserDefinedLoader" on the stack to resolve the class. In this
case that is JavaSerialization's ClassLoader and not the URLClassLoader,
which is the only one with access to (and knowledge about) the custom
MyValueClass. A parent ClassLoader never delegates to its children, so
the lookup fails with the ClassNotFoundException shown above.
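
To make the visibility rule above concrete, here is a minimal sketch
that walks a ClassLoader's parent chain and checks from which loaders a
class name can be resolved. The class LoaderVisibility and its methods
are hypothetical names made up for this illustration; they are not part
of Hadoop or of the attached example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper, not part of Hadoop or the attachment.
class LoaderVisibility {

    // Returns the given loader followed by all of its ancestors, child first.
    static List<ClassLoader> chain(ClassLoader loader) {
        List<ClassLoader> result = new ArrayList<ClassLoader>();
        for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
            result.add(cl);
        }
        return result;
    }

    // True if className can be resolved starting from the given loader.
    // Lookups delegate upwards to parents only, never down to children.
    static boolean isVisibleFrom(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pass the class name to check as the first argument.
        String className = args.length > 0 ? args[0] : "java.lang.String";
        ClassLoader mine = LoaderVisibility.class.getClassLoader();
        for (ClassLoader cl : chain(mine)) {
            System.out.println(cl + " sees " + className + ": "
                    + isVisibleFrom(className, cl));
        }
    }
}
```

Packed into the job's jar and started through bin/hadoop jar with
DemonstrationForPossibleBugInJavaSerialization$MyValueClass as the
argument, I would expect it to report the class as visible from the
URLClassLoader but not from its parent.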


Possible solution:

The JavaSerializationDeserializer should use an ObjectInputStream
subclass that overrides resolveClass(...) and resolves classes through
the ClassLoader returned by
org.apache.hadoop.conf.Configuration.getClassLoader().
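
A rough sketch of such an override, assuming the ClassLoader is simply
handed to the stream's constructor. LoaderAwareObjectInputStream and
roundTrip are hypothetical names for this illustration, not existing
Hadoop classes, and the sketch deliberately has no Hadoop dependency so
it can be compiled on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical helper, not an existing Hadoop class.
class LoaderAwareObjectInputStream extends ObjectInputStream {

    private final ClassLoader loader;

    LoaderAwareObjectInputStream(InputStream in, ClassLoader loader)
            throws IOException {
        super(in);
        this.loader = loader;
    }

    @Override
    protected Class<?> resolveClass(ObjectStreamClass desc)
            throws IOException, ClassNotFoundException {
        try {
            // Resolve through the supplied loader instead of the latest
            // user-defined loader found on the call stack.
            return Class.forName(desc.getName(), false, loader);
        } catch (ClassNotFoundException e) {
            // Fall back to the default lookup, which also handles
            // primitive type descriptors such as "int".
            return super.resolveClass(desc);
        }
    }

    // Serializes a value to memory and reads it back through the
    // loader-aware stream; only here to exercise resolveClass.
    static Object roundTrip(Serializable value, ClassLoader loader)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(value);
        out.close();
        ObjectInputStream in = new LoaderAwareObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()), loader);
        try {
            return in.readObject();
        } finally {
            in.close();
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader loader = LoaderAwareObjectInputStream.class.getClassLoader();
        System.out.println(roundTrip("some value", loader));
    }
}
```

Inside the JavaSerializationDeserializer, the loader argument would be
the result of Configuration.getClassLoader(). How the deserializer
would get hold of the Configuration is left open here.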

Do you think this is a bug or just wrong usage on my side?

Thanks for any answers,
  Christian