Posted to hdfs-user@hadoop.apache.org by Yuriy <yu...@gmail.com> on 2014/06/17 05:25:51 UTC

Job.setJar does not seem to work.

I am trying to track down an issue where a Hadoop app throws a
java.lang.ClassNotFoundException:



16:49:24 INFO mapreduce.Job: Task Id : attempt_1402524520197_0024_m_000000_0, Status : FAILED

Error: java.lang.ClassNotFoundException: com.tinkerpop.blueprints.util.DefaultVertexQuery
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat.setConf(GraphSONInputFormat.java:39)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:726)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)



The app builds a "fat" jar file in which all the dependency jars
(including the one that contains the missing class) are included under
the lib folder, and it passes this fat jar to Job.setJar.



The code does not do anything strange:



            job.setJar(hadoopFileJar);
            boolean success = job.waitForCompletion(true);



Besides, I looked at the configuration in yarn-site.xml and verified
that the job dir under yarn.nodemanager.local-dirs does contain that jar
(renamed to job.jar, though), along with a lib directory holding the
extracted jars. I.e., the jar that contains the missing class is there.
YARN/MR recreates this dir with all the required files each time a job is
scheduled, so the files do get transferred there.
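
The check itself was just a recursive listing of the localized container
dir; a throwaway JDK-only sketch (class name is mine, the container path
is passed as an argument):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// List every regular file under a localized YARN container dir, to
// confirm the dependency jars were actually transferred there.
public class ListLocalizedFiles {
    static long printTree(Path root) throws IOException {
        try (Stream<Path> files = Files.walk(root)) {
            List<Path> regular = files.filter(Files::isRegularFile)
                                      .collect(Collectors.toList());
            regular.forEach(p -> System.out.println(root.relativize(p)));
            return regular.size();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(printTree(Paths.get(args[0])) + " file(s)");
    }
}
```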



What I've discovered so far is that the classpath environment variable on
the Java worker processes that execute the failing code is set to

C:\hdp\data\hadoop\local\usercache\user\appcache\application_1402963354379_0013\container_1402963354379_0013_02_000001\classpath-3824944728798396318.jar



and this jar contains only a MANIFEST.MF. That manifest lists Class-Path
entries pointing at the directory with the "fat" jar and its subdirectories:



file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/job.jar
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/classes/
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/jobSubmitDir/job.splitmetainfo
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/jobSubmitDir/job.split
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.xml
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/

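For the record, dumping that attribute takes only the JDK; a minimal
sketch of how I read it (class name is mine):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Print the Class-Path attribute of a "classpath jar" like the one
// YARN generates on Windows to work around command-line length limits.
public class PrintClassPath {
    static String classPathOf(Path jar) throws IOException {
        try (JarFile jf = new JarFile(jar.toFile())) {
            Manifest m = jf.getManifest();
            return m == null
                    ? null
                    : m.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(classPathOf(Paths.get(args[0])));
    }
}
```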


However, these paths do not explicitly add the jars inside those
directories. The directory from the manifest above,
file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/,
does contain the jar with the class that is not found (this directory
holds all the jars from the "fat" jar's lib section), but for the Java
world this way of setting the classpath seems incorrect: this directory
was supposed to be included with *, e.g.

file:/c:/hdp/data/hadoop/local/usercache/user/appcache/application_1402963354379_0013/container_1402963354379_0013_02_000001/job.jar/*
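
What makes me think so: a plain URLClassLoader behaves exactly this
way — a directory classpath entry only exposes loose files, not jars
sitting inside the directory. A self-contained sketch (class and file
names are mine; it uses a resource instead of a class, which follows
the same lookup rules):

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.*;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class ClasspathDirDemo {
    // Build a jar containing a single marker resource inside the given dir.
    static Path makeJarWithMarker(Path dir) throws IOException {
        Path jar = dir.resolve("dep.jar");
        try (JarOutputStream jos = new JarOutputStream(Files.newOutputStream(jar))) {
            jos.putNextEntry(new JarEntry("marker.txt"));
            jos.write("hello".getBytes());
            jos.closeEntry();
        }
        return jar;
    }

    // Resolve a resource through a classloader rooted at one classpath entry.
    static URL lookup(URL classpathEntry, String resource) throws IOException {
        try (URLClassLoader cl = new URLClassLoader(new URL[]{classpathEntry}, null)) {
            return cl.getResource(resource);
        }
    }

    public static void main(String[] args) throws IOException {
        Path lib = Files.createTempDirectory("lib");
        Path jar = makeJarWithMarker(lib);
        // Directory entry: jars inside the directory are NOT searched -> null.
        System.out.println("via dir entry: " + lookup(lib.toUri().toURL(), "marker.txt"));
        // Explicit jar entry: its contents resolve.
        System.out.println("via jar entry: " + lookup(jar.toUri().toURL(), "marker.txt"));
    }
}
```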



So, am I missing something here, or is this a bug in my YARN distro
(HDP 2.1, Windows x64)?

Also, who is responsible for setting the correct dependency jar path:
my job or the Hadoop/YARN infrastructure?

If it's the infrastructure, could you point me to the relevant code so
I could read and possibly debug it?



Thanks,

Yuriy