Posted to common-user@hadoop.apache.org by John Thompson <jo...@gmail.com> on 2008/09/29 10:01:40 UTC

ClassNotFoundException from Jython

Hi,

I'm attempting to use Jython to do some one-off map-reduce scripts.  When
I try to run a JobConf that makes use of Python classes defined in the
script, I get ClassNotFoundExceptions.  Even the WordCount.py that ships
with Hadoop gives me the problem:

~ jython WordCount.py in out

08/09/29 00:40:24 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
08/09/29 00:40:24 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
08/09/29 00:40:24 INFO mapred.FileInputFormat: Total input paths to process : 1
08/09/29 00:40:24 INFO mapred.JobClient: Running job: job_local_1
08/09/29 00:40:24 INFO mapred.MapTask: numReduceTasks: 1
08/09/29 00:40:24 WARN mapred.LocalJobRunner: job_local_1
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.python.proxies.__main__$WordCountMap$0
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:639)
        at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:728)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:36)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:204)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:132)


In the original script I was using, I don't get the "No job jar file set"
warning.  My suspicion is that the problem is either classpath-related
(how does Hadoop know about org.python.proxies.__main__?) or related to
the fact that we're trying to reference a Python class from inside Java
(although I think this should be OK, because the Python class being called
inherits from Mapper and MapReduceBase).  Do I need to compile with
jythonc first to get this to work?  I've experimented with that a bit, but
no luck so far - and the need to compile would eliminate much of the time
I hoped to gain by writing one-offs in Jython instead of Java.
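
For reference, the relevant part of WordCount.py looks roughly like this
(I'm paraphrasing from memory, so the shipped file may differ slightly):

    from org.apache.hadoop.io import Text, IntWritable
    from org.apache.hadoop.mapred import JobConf, JobClient, Mapper, MapReduceBase

    class WordCountMap(Mapper, MapReduceBase):
        one = IntWritable(1)
        def map(self, key, value, output, reporter):
            # emit (word, 1) for each word in the input line
            for word in value.toString().split():
                output.collect(Text(word), self.one)

    conf = JobConf(WordCountMap)
    conf.setJobName("wordcount")
    conf.setOutputKeyClass(Text)
    conf.setOutputValueClass(IntWritable)
    conf.setMapperClass(WordCountMap)
    # ... input/output paths get set here, then:
    JobClient.runJob(conf)

Since WordCountMap only exists as a runtime proxy class
(org.python.proxies.__main__$WordCountMap$0), I'm guessing there is no jar
containing it for Hadoop to pick up - maybe that's the connection to the
warning?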

Any help would be greatly appreciated!

Best,
John

Re: ClassNotFoundException from Jython

Posted by Klaas Bosteels <kl...@ugent.be>.
On Tue, Sep 30, 2008 at 12:34 AM, Karl Anderson <kr...@monkey.org> wrote:
> I recommend using streaming instead if you can; it's much easier to
> develop and debug.  It's also nice not to get that "stop doing that,
> jythonc is going away" message each time you compile :)  Also check out
> the recently announced Happy framework for Hadoop and Jython; it looks
> interesting.

If you want to use cpython/streaming instead, you might be interested in:

https://issues.apache.org/jira/browse/HADOOP-4304


-Klaas

Re: ClassNotFoundException from Jython

Posted by Karl Anderson <kr...@monkey.org>.
On 29-Sep-08, at 1:01 AM, John Thompson wrote:

> Hi,
>
> I'm attempting to use Jython to do some one-off map-reduce scripts.
> When I try to run a JobConf that makes use of Python classes defined in
> the script, I get ClassNotFoundExceptions.  Even the WordCount.py that
> ships with Hadoop gives me the problem:

[...]

> In the original script I was using, I don't get the "No job jar file
> set" warning.  My suspicion is that the problem is either
> classpath-related (how does Hadoop know about
> org.python.proxies.__main__?) or related to the fact that we're trying
> to reference a Python class from inside Java (although I think this
> should be OK, because the Python class being called inherits from
> Mapper and MapReduceBase).  Do I need to compile with jythonc first to
> get this to work?  I've experimented with that a bit, but no luck so
> far - and the need to compile would eliminate much of the time I hoped
> to gain by writing one-offs in Jython instead of Java.

In my experience, you have to compile.  There may be a way around this,
but I don't know of it.  Does your class work when you follow the
instructions in the wiki, which include compiling?  You also need to
perform the Hadoop build step.

When you compile, make sure you have Hadoop's build/classes, lib/*.jar,
and lib/jetty-ext/*.jar in your $CLASSPATH.
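
Concretely, my build-and-compile step looked something like this (the
jythonc flags and paths are from memory, so treat it as a sketch and
adjust for your versions):

    # build Hadoop first so build/classes exists
    cd $HADOOP_HOME && ant compile

    # put Hadoop's classes and jars on the classpath for jythonc
    CLASSPATH=$HADOOP_HOME/build/classes
    for f in $HADOOP_HOME/lib/*.jar $HADOOP_HOME/lib/jetty-ext/*.jar; do
        CLASSPATH=$CLASSPATH:$f
    done
    export CLASSPATH

    # compile the Jython script and its dependencies into a jar
    jythonc --package org.apache.hadoop.examples --deep --core \
            --jar wc.jar WordCount.py

    # then run the compiled class out of that jar
    bin/hadoop jar wc.jar org.apache.hadoop.examples.WordCount in out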

I recommend using streaming instead if you can; it's much easier to
develop and debug.  It's also nice not to get that "stop doing that,
jythonc is going away" message each time you compile :)  Also check out
the recently announced Happy framework for Hadoop and Jython; it looks
interesting.
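
To give you an idea, a streaming word count is just two small scripts that
read stdin and write tab-separated key/value lines to stdout - something
like this (untested sketch):

    #!/usr/bin/env python
    # mapper.py - emit "word<TAB>1" for every word on stdin
    import sys
    for line in sys.stdin:
        for word in line.split():
            print "%s\t%d" % (word, 1)

    #!/usr/bin/env python
    # reducer.py - input arrives sorted by key, so sum runs of equal words
    import sys
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t", 1)
        if word != current:
            if current is not None:
                print "%s\t%d" % (current, total)
            current, total = word, 0
        total += int(count)
    if current is not None:
        print "%s\t%d" % (current, total)

You run them with the streaming jar from contrib, roughly:

    bin/hadoop jar contrib/streaming/hadoop-*-streaming.jar \
        -input in -output out \
        -mapper mapper.py -reducer reducer.py \
        -file mapper.py -file reducer.py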

I didn't mind the few seconds that compiling took, since I found realistic
testing to be possible only within a real Hadoop cluster in any case
(pseudo-distributed mode wouldn't always reproduce errors for me).
However, I was never able to use Jython to unit test my mappers and
reducers - I would get similar class-not-found exceptions, even though
they worked within the Hadoop framework.  This was probably a problem with
my method signatures; there isn't much guidance on the web for figuring
this out.  I also found it much easier to develop in a memory-efficient
way with a modern Python and generators.
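
In case it helps, the kind of unit test I was attempting looked roughly
like this (it never worked for me under plain Jython, so take it only as
the shape of the idea):

    from org.apache.hadoop.io import Text, LongWritable
    from org.apache.hadoop.mapred import OutputCollector, Reporter
    # assuming WordCountMap is importable from the script under test

    class CollectingOutput(OutputCollector):
        # tiny stand-in collector that just remembers what was emitted
        def __init__(self):
            self.pairs = []
        def collect(self, key, value):
            self.pairs.append((key.toString(), value.get()))

    mapper = WordCountMap()
    out = CollectingOutput()
    mapper.map(LongWritable(0), Text("a b a"), out, Reporter.NULL)
    print out.pairs   # expect [('a', 1), ('b', 1), ('a', 1)]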

Karl Anderson
kra@monkey.org
http://monkey.org/~kra