Posted to user@crunch.apache.org by David Ortiz <do...@videologygroup.com> on 2015/07/21 23:12:02 UTC

ClassNotFoundException running with Oozie

Hello everyone,

     I'm getting an interesting exception when running a Crunch pipeline from Oozie.  I have all the Crunch dependencies bundled in a fat jar called crunch-lib.  My Avro schemas all live in a jar called schemas.  Both jars live in a sharelib directory for Java actions on HDFS.  My job itself is in a jar that lives in a directory pointed to by oozie.libpath.  As far as I can tell, the Oozie job is getting all of the dependencies, since my Crunch client code runs and tries to spin up MR jobs.  However, the jobs it creates fail with the following exception:

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.crunch.impl.mr.run.CrunchOutputFormat not found
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:472)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:452)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.callWithJobClassLoader(MRAppMaster.java:1541)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:452)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:371)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$4.run(MRAppMaster.java:1499)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1496)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1429)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.crunch.impl.mr.run.CrunchOutputFormat not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2112)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:232)
        at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.call(MRAppMaster.java:468)
        ... 11 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.crunch.impl.mr.run.CrunchOutputFormat not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2018)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2110)
        ... 13 more


Does anyone have any ideas about how the dependencies could be making it to the Crunch client, but not into the jar that Crunch submits to the cluster?

Thanks,
    Dave
This email is intended only for the use of the individual(s) to whom it is addressed. If you have received this communication in error, please immediately notify the sender and delete the original email.
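
The trace above shows the lookup failing inside MRAppMaster, that is, in the application master of the submitted job rather than in the client. As a starting point for diagnosis, the sketch below reports which jar (if any) supplies a given class in the JVM where it runs. ClassLocator and describe are hypothetical names, not part of Crunch, Oozie, or Hadoop, and this only helps identify the jar that would also need to reach the cluster side; it does not fix anything by itself.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical diagnostic helper; not part of Crunch, Oozie, or Hadoop. */
public final class ClassLocator {

    private ClassLocator() {}

    /** Returns a one-line description of where the named class was loaded from. */
    public static String describe(String className) {
        // Prefer the context class loader, falling back to this class's own loader.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLocator.class.getClassLoader();
        }
        try {
            // 'false' avoids running static initializers; we only want the location.
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            URL location = (source == null) ? null : source.getLocation();
            return className + " loaded from "
                    + (location == null ? "the bootstrap/platform loader" : location.toString());
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " NOT FOUND (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace in this thread.
        System.out.println(describe("org.apache.crunch.impl.mr.run.CrunchOutputFormat"));
    }
}
```

Run from the Oozie Java action, this would presumably print the localized path of the crunch-lib jar, since the client side is reported to work.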

RE: ClassNotFoundException running with Oozie

Posted by David Ortiz <do...@videologygroup.com>.
Thanks.  That’s the exact issue I am having.  I could not find a good way to connect the Oozie classpath jars in the distributed cache to the -libjars option I could pass to my driver code.  For the moment, the most promising approach (in our environment) seems to be using Chef to push my library code out to a common location on the NodeManagers and then adding that location to the YARN application classpath, since my goal is to be able to update my library code without having to re-release workflows/pipelines.  This feels much more kludgey than using Oozie to pass in the dependencies, but it works.
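
For anyone who wants to experiment with the -libjars route instead, here is a minimal sketch of one way to derive a -libjars value from the client JVM's classpath. It assumes that the jars Oozie localizes for the Java action show up in the java.class.path system property, which is not confirmed in this thread. LibJarsBuilder and fromClasspath are hypothetical names. Note that -libjars is only honored when the driver parses generic options (for example by running through ToolRunner), and whether this resolves the CrunchOutputFormat error in this setup is untested.

```java
import java.io.File;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper; not part of Crunch, Oozie, or Hadoop. */
public final class LibJarsBuilder {

    private LibJarsBuilder() {}

    /**
     * Converts a classpath string into a comma-separated list of file: URIs,
     * keeping only entries whose names end in ".jar" (directories and
     * wildcard entries are skipped).
     */
    public static String fromClasspath(String classpath, String separator) {
        List<String> uris = new ArrayList<>();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            String trimmed = entry.trim();
            if (trimmed.endsWith(".jar")) {
                // Absolute, normalized file: URI so the entry is unambiguous.
                uris.add(Paths.get(trimmed).toAbsolutePath().normalize().toUri().toString());
            }
        }
        return String.join(",", uris);
    }

    public static void main(String[] args) {
        // Assumption: the launcher's classpath contains the localized jars.
        String libJars = fromClasspath(System.getProperty("java.class.path"), File.pathSeparator);
        System.out.println("-libjars " + libJars);
    }
}
```

The printed value could then be passed ahead of the driver's own arguments. This keeps the dependency list tied to whatever Oozie ships, at the cost of re-uploading those jars on each submission.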

From: Josh Wills [mailto:jwills@cloudera.com]
Sent: Wednesday, July 22, 2015 1:59 PM
To: user@crunch.apache.org
Subject: Re: ClassNotFoundException running with Oozie

Mike Baretta posted about a similar issue late last year and had an ugly fix that involved copying the Crunch jars into the distributed cache. You can see the whole thread here:

https://www.mail-archive.com/user@crunch.apache.org/msg00438.html

I myself haven't run into this one.

J


Re: ClassNotFoundException running with Oozie

Posted by Josh Wills <jw...@cloudera.com>.
Mike Baretta posted about a similar issue late last year and had an ugly
fix that involved copying the Crunch jars into the distributed cache. You
can see the whole thread here:

https://www.mail-archive.com/user@crunch.apache.org/msg00438.html

I myself haven't run into this one.

J




-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>