Posted to user@hadoop.apache.org by Yuriy <yu...@gmail.com> on 2014/06/17 05:25:51 UTC
Job.setJar does not seem to work.
I am trying to solve an issue where a Hadoop app throws a
java.lang.ClassNotFoundException:
16:49:24 INFO mapreduce.Job: Task Id : attempt_1402524520197_0024_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException: com.tinkerpop.blueprints.util.DefaultVertexQuery
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat.setConf(GraphSONInputFormat.java:39)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:726)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
The app builds a "fat" jar file, where all the dependency jars
(including the one that contains the missing class) are placed under
the lib folder.
The app then calls Job.setJar with this fat jar file.
The code does not do anything unusual:
job.setJar(hadoopFileJar);  // hadoopFileJar is the path to the fat jar
boolean success = job.waitForCompletion(true);
In addition, I looked at the configuration in yarn-site.xml and verified
that the job dir under yarn.nodemanager.local-dirs does contain that jar
(renamed to job.jar, though) as well as the lib directory with the
extracted jars in it. I.e., the jar that contains the missing class is there.
YARN/MR recreates this dir with all the required files each time a job is
scheduled, so the files do get transferred there.
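To double-check that, I scan the localized jar for the class entry. A
minimal, self-contained sketch (the jar path passed on the command line is
just a placeholder for the localized job.jar location):

```java
import java.util.jar.JarFile;

// Sketch: confirm that a given class entry is actually present in a jar,
// e.g. the localized job.jar. The jar path and class name are placeholders
// taken from my setup, not anything Hadoop-specific.
public class JarScan {
    public static boolean containsClass(String jarPath, String className) throws Exception {
        // Class name in dot form -> jar entry name in slash form plus ".class"
        String entry = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entry) != null;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(containsClass(args[0],
                "com.tinkerpop.blueprints.util.DefaultVertexQuery"));
    }
}
```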
What I have discovered so far is that the classpath environment variable of
the Java worker processes that execute the failing code is set to
C:\hdp\data\hadoop\local\usercache\user\appcache\application_1402963354379_0013\container_1402963354379_0013_02_000001\classpath-3824944728798396318.jar
and this jar contains only a MANIFEST.MF. That manifest holds Class-Path
entries pointing to the directory with the "fat" jar and its directories:
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/job.jar
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/classes/
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/jobSubmitDir/job.splitmetainfo
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/jobSubmitDir/job.split
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.xml
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/
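To see exactly what the workers get, I dump the Class-Path attribute from
that classpath-*.jar with a small sketch like this (the jar path passed on
the command line is a placeholder):

```java
import java.util.jar.Attributes;
import java.util.jar.JarFile;

// Sketch: print the Class-Path attribute of a manifest-only jar such as
// the classpath-*.jar above, one entry per line. Per the jar spec the
// attribute is a single space-separated list of URLs/relative paths.
public class ManifestDump {
    public static String classPathOf(String jarPath) throws Exception {
        try (JarFile jar = new JarFile(jarPath)) {
            Attributes attrs = jar.getManifest().getMainAttributes();
            return attrs.getValue(Attributes.Name.CLASS_PATH); // null if absent
        }
    }

    public static void main(String[] args) throws Exception {
        for (String entry : classPathOf(args[0]).split("\\s+")) {
            System.out.println(entry);
        }
    }
}
```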
However, these paths do not explicitly add the jars inside those
directories. That is, the directory from the manifest above,
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/
does contain the jar with the class that YARN fails to find (this directory
holds all the jars from the fat jar's lib section), but to the JVM this way
of setting the classpath seems incorrect: a directory entry only exposes
.class files, not nested jars, so this directory was supposed to be included
with a * wildcard, e.g.
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/*
So am I missing something here, or is this a bug in my YARN distro (HDP 2.1,
Windows x64)?
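My understanding is that each jar has to be listed as its own classpath
entry. If I had to work around it on my side, I would expand the jars by
hand, roughly like this (the directory argument is a placeholder for the
container's job.jar/ directory above):

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Sketch: a directory on the classpath only exposes .class files, so the
// jars inside it must each become their own URL. This expands them by hand
// and builds a URLClassLoader over the result.
public class LibExpand {
    public static List<URL> expandJars(File dir) throws Exception {
        List<URL> urls = new ArrayList<>();
        File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
        if (jars != null) {
            for (File jar : jars) {
                urls.add(jar.toURI().toURL()); // one URL per jar, as the JVM expects
            }
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        List<URL> urls = expandJars(new File(args[0]));
        try (URLClassLoader loader = new URLClassLoader(urls.toArray(new URL[0]))) {
            System.out.println(urls.size() + " jars on the classpath");
        }
    }
}
```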
Also, who is responsible for setting the correct dependency jar paths: my
job, or the Hadoop/YARN infrastructure?
If it is the infrastructure, could you point me to the relevant code,
please, so that I can read it and possibly debug it?
Thanks,
Yuriy