Posted to common-user@hadoop.apache.org by M B <ma...@gmail.com> on 2010/03/26 23:24:57 UTC

ClassNotFoundException with contrib/join example

I may be having a setup issue with classpaths and would appreciate some help.

I created a jar with all the Sample* classes in contrib/DataJoin.  Here is
the listing of my samplejoin.jar file:
" zip.vim version v22
" Browsing zipfile /home/hadoop/hadoop_tests/samplejoin.jar
" Select a file with cursor and press ENTER
META-INF/
META-INF/MANIFEST.MF
org/
org/apache/
org/apache/hadoop/
org/apache/hadoop/contrib/
org/apache/hadoop/contrib/utils/
org/apache/hadoop/contrib/utils/join/
org/apache/hadoop/contrib/utils/join/SampleDataJoinReducer.class
org/apache/hadoop/contrib/utils/join/SampleTaggedMapOutput.class
org/apache/hadoop/contrib/utils/join/SampleDataJoinMapper.class

When I go to run this, things start to run, but every Map try errors out
with:
"java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput"

Here is the command:
hadoop jar ./samplejoin.jar org.apache.hadoop.contrib.utils.join.DataJoinJob
datajoin/input datajoin/output Text 1
org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text

This is a new install of 0.20.2.

HADOOP_CLASSPATH is set
to: /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
Any help would be appreciated.
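
A quick way to rule out a packaging mistake is to check that the class named in the exception maps to an entry in the jar. The sketch below assumes `unzip` is available on the client machine; `class_to_entry` and `jar_has_class` are hypothetical helper names, not part of Hadoop.

```shell
# Convert a fully qualified class name to the entry path it has inside a jar.
class_to_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Succeed if the jar ($1) lists the entry for the class ($2).
# Assumption: `unzip` is installed; `jar tf "$1"` would serve the same purpose.
jar_has_class() {
    unzip -l "$1" | grep -q -F "$(class_to_entry "$2")"
}
```

For the jar listed above, `jar_has_class ./samplejoin.jar org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput` should succeed, since the listing shows that entry. Because the exception comes from the map tasks, a passing check only shows the class exists on the client side; it does not show that this jar is the one shipped to the task nodes.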

Re: why does 'jps' lose track of hadoop processes ?

Posted by Raymond Jennings III <ra...@yahoo.com>.

Yes, I am.

--- On Mon, 3/29/10, Bill Au <bi...@gmail.com> wrote:

> From: Bill Au <bi...@gmail.com>
> Subject: Re: why does 'jps' lose track of hadoop processes ?
> To: common-user@hadoop.apache.org
> Date: Monday, March 29, 2010, 1:04 PM
> Are you running jps under the same user id that the hadoop processes are
> running under?
>
> Bill
>
> On Mon, Mar 29, 2010 at 11:37 AM, Raymond Jennings III <raymondjiii@yahoo.com> wrote:
>
> > After running hadoop for some period of time, the command 'jps' fails to
> > report any hadoop process on any node in the cluster.  The processes are
> > still running as can be seen with 'ps -ef|grep java'
> >
> > In addition, scripts like stop-dfs.sh and stop-mapred.sh no longer find the
> > processes to stop.

Question about ChainMapper

Posted by Raymond Jennings III <ra...@yahoo.com>.

I would like to try to use a ChainMapper/ChainReducer but I see that the last parameter is a JobConf which I am not creating as I am using the latest API version.  Has anyone tried to do this with the later version API?  Can I extract a JobConf object somewhere?

Thanks

Re: why does 'jps' lose track of hadoop processes ?

Posted by Bill Au <bi...@gmail.com>.

Are you running jps under the same user id that the hadoop processes are
running under?

Bill

On Mon, Mar 29, 2010 at 11:37 AM, Raymond Jennings III <
raymondjiii@yahoo.com> wrote:

> After running hadoop for some period of time, the command 'jps' fails to
> report any hadoop process on any node in the cluster.  The processes are
> still running as can be seen with 'ps -ef|grep java'
>
> In addition, scripts like stop-dfs.sh and stop-mapred.sh no longer find the
> processes to stop.
>
>
>
>

Re: why does 'jps' lose track of hadoop processes ?

Posted by Steve Loughran <st...@apache.org>.

Marcos Medrado Rubinelli wrote:
> jps gets its information from the files stored under /tmp/hsperfdata_*, 
> so when a cron job clears your /tmp directory, it also erases these 
> files. You can submit jobs as long as your jobtracker and namenode are 
> responding to requests over TCP, though.

I never knew that.

ps -ef | grep java works quite well; jps has fairly steep startup costs 
and if a JVM is playing up, jps can hang too

Re: why does 'jps' lose track of hadoop processes ?

Posted by Marcos Medrado Rubinelli <ma...@buscape-inc.com>.

jps gets its information from the files stored under /tmp/hsperfdata_*, 
so when a cron job clears your /tmp directory, it also erases these 
files. You can submit jobs as long as your jobtracker and namenode are 
responding to requests over TCP, though.
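
To see this on a node, one can compare the PIDs recorded under the perf-data directories with the java processes that are actually running. This is a minimal sketch, assuming a Linux-style `ps` where the command name of a JVM is reported as `java`; the function names are made up for illustration.

```shell
# Print the PIDs recorded under <tmpdir>/hsperfdata_*/ (one file per JVM, named by PID).
list_hsperf_pids() {
    tmpdir="${1:-/tmp}"
    for f in "$tmpdir"/hsperfdata_*/*; do
        [ -f "$f" ] || continue   # skip the unexpanded glob when nothing matches
        basename "$f"
    done
}

# Print PIDs of running java processes that have no perf-data file,
# i.e. processes that jps would not be able to discover.
# Assumption: `ps -e -o pid= -o comm=` reports the command name as "java".
missing_from_jps() {
    tmpdir="${1:-/tmp}"
    known=$(list_hsperf_pids "$tmpdir")
    ps -e -o pid= -o comm= | while read -r pid comm; do
        [ "$comm" = "java" ] || continue
        printf '%s\n' "$known" | grep -q -x "$pid" || echo "$pid"
    done
}
```

If `missing_from_jps` prints the PIDs of the Hadoop daemons right after a /tmp cleanup has run, that would be consistent with the explanation above.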

- Marcos

Raymond Jennings III wrote:
> That would explain why the processes cannot be stopped but the mystery of why jps loses track of these active processes still remains.  Even when jps does not report any hadoop process I can still submit and run jobs just fine.  I will have to check the next time it happens if the hadoop pid's are the same as what is in the file.  If different that would somehow mean the hadoop process was being restarted?
>
> --- On Mon, 3/29/10, Bill Habermaas <bi...@habermaas.us> wrote:
>
>   
>> From: Bill Habermaas <bi...@habermaas.us>
>> Subject: RE: why does 'jps' lose track of hadoop processes ?
>> To: common-user@hadoop.apache.org
>> Date: Monday, March 29, 2010, 11:44 AM
>> Sounds like your pid files are getting cleaned out of whatever directory
>> they are being written (maybe garbage collection on a temp directory?).
>>
>> Look at (taken from hadoop-env.sh):
>> # The directory where pid files are stored. /tmp by default.
>> # export HADOOP_PID_DIR=/var/hadoop/pids
>>
>> The hadoop shell scripts look in the directory that is defined.
>>
>> Bill
>>
>> -----Original Message-----
>> From: Raymond Jennings III [mailto:raymondjiii@yahoo.com]
>> Sent: Monday, March 29, 2010 11:37 AM
>> To: common-user@hadoop.apache.org
>> Subject: why does 'jps' lose track of hadoop processes ?
>>
>> After running hadoop for some period of time, the command 'jps' fails to
>> report any hadoop process on any node in the cluster.  The processes are
>> still running as can be seen with 'ps -ef|grep java'
>>
>> In addition, scripts like stop-dfs.sh and stop-mapred.sh no longer find the
>> processes to stop.


-- 
------------------------------------------------------------------------
Marcos Medrado Rubinelli
Tecnologia - BuscaPé
Tel. +55 11 3848-8700 Ramal 8788
marcosm@buscape-inc.com <ma...@buscape-inc.com>

RE: why does 'jps' lose track of hadoop processes ?

Posted by Raymond Jennings III <ra...@yahoo.com>.

That would explain why the processes cannot be stopped but the mystery of why jps loses track of these active processes still remains.  Even when jps does not report any hadoop process I can still submit and run jobs just fine.  I will have to check the next time it happens if the hadoop pid's are the same as what is in the file.  If different that would somehow mean the hadoop process was being restarted?

--- On Mon, 3/29/10, Bill Habermaas <bi...@habermaas.us> wrote:

> From: Bill Habermaas <bi...@habermaas.us>
> Subject: RE: why does 'jps' lose track of hadoop processes ?
> To: common-user@hadoop.apache.org
> Date: Monday, March 29, 2010, 11:44 AM
> Sounds like your pid files are getting cleaned out of whatever directory
> they are being written (maybe garbage collection on a temp directory?).
>
> Look at (taken from hadoop-env.sh):
> # The directory where pid files are stored. /tmp by default.
> # export HADOOP_PID_DIR=/var/hadoop/pids
>
> The hadoop shell scripts look in the directory that is defined.
>
> Bill
>
> -----Original Message-----
> From: Raymond Jennings III [mailto:raymondjiii@yahoo.com]
> Sent: Monday, March 29, 2010 11:37 AM
> To: common-user@hadoop.apache.org
> Subject: why does 'jps' lose track of hadoop processes ?
>
> After running hadoop for some period of time, the command 'jps' fails to
> report any hadoop process on any node in the cluster.  The processes are
> still running as can be seen with 'ps -ef|grep java'
>
> In addition, scripts like stop-dfs.sh and stop-mapred.sh no longer find the
> processes to stop.

RE: why does 'jps' lose track of hadoop processes ?

Posted by Bill Habermaas <bi...@habermaas.us>.

Sounds like your pid files are getting cleaned out of whatever directory
they are being written (maybe garbage collection on a temp directory?). 

Look at (taken from hadoop-env.sh):
# The directory where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/var/hadoop/pids

The hadoop shell scripts look in the directory that is defined.
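
A rough way to check whether the pid files are still where the scripts expect them is sketched below. It assumes the pid files end in `.pid` and that it is run as the user that owns the daemons (otherwise `kill -0` can fail for a live process); `check_pid_files` is a hypothetical helper, not a Hadoop script.

```shell
# Report each *.pid file in the pid directory and whether its PID is still alive.
# The directory defaults to $HADOOP_PID_DIR, then to /tmp as in hadoop-env.sh.
check_pid_files() {
    pid_dir="${1:-${HADOOP_PID_DIR:-/tmp}}"
    found=0
    for f in "$pid_dir"/*.pid; do
        [ -f "$f" ] || continue          # glob did not match anything
        found=1
        pid=$(cat "$f")
        if kill -0 "$pid" 2>/dev/null; then   # signal 0 only tests for existence
            echo "$(basename "$f") $pid alive"
        else
            echo "$(basename "$f") $pid not-running"
        fi
    done
    [ "$found" -eq 1 ] || echo "no pid files in $pid_dir"
}
```

If this prints "no pid files" while `ps -ef | grep java` still shows the daemons, the files were most likely removed. Uncommenting the HADOOP_PID_DIR line to point at a directory that is not cleaned periodically should avoid that after the next restart; daemons started earlier would still need to be stopped by hand, since their pid files were written to the old location.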

Bill

-----Original Message-----
From: Raymond Jennings III [mailto:raymondjiii@yahoo.com] 
Sent: Monday, March 29, 2010 11:37 AM
To: common-user@hadoop.apache.org
Subject: why does 'jps' lose track of hadoop processes ?

After running hadoop for some period of time, the command 'jps' fails to
report any hadoop process on any node in the cluster.  The processes are
still running as can be seen with 'ps -ef|grep java'

In addition, scripts like stop-dfs.sh and stop-mapred.sh no longer find the
processes to stop.

Re: ClassNotFoundException with contrib/join example

Posted by M B <ma...@gmail.com>.

Right, that was the first option I tried and it fails there as well.

Maybe I need to step back and ask a higher-level question - does anyone have
a full, step-by-step example of using a reduce-side join in an M/R job?
Preferably using the contrib/DataJoin classes, but I'll be happy with
whatever example I could get.

I'd love to see the actual code and then how it's kicked off on the command
line so I can try it on my end as a prototype.  I must be doing something
wrong, but don't know what it is.

Thanks.

On Mon, Mar 29, 2010 at 8:31 AM, Jones, Nick <ni...@amd.com> wrote:

> M B,
> I'm not sure about the -libjars argument but 'hadoop jar' is expecting the
> jarfile immediately afterwards: hadoop jar jarFile [mainClass] args...
>
> Nick Jones
>
> -----Original Message-----
> From: M B [mailto:machaca74@gmail.com]
> Sent: Monday, March 29, 2010 10:26 AM
> To: common-user@hadoop.apache.org
> Subject: Re: ClassNotFoundException with contrib/join example
>
> Sorry, I should have mentioned that I tried that as well and it also gives
> an error:
>
>  $ <p@hadoop01:~/hadoop_tests$> hadoop jar -libjars ./samplejoin.jar
> /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input
> datajoin/output Text 1
> org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> Exception in thread "main" java.io.IOException: Error opening job jar:
> -libjars
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
> Caused by: java.util.zip.ZipException: error in opening zip file
>        at java.util.zip.ZipFile.open(Native Method)
>        at java.util.zip.ZipFile.<init>(ZipFile.java:114)
>        at java.util.jar.JarFile.<init>(JarFile.java:133)
>        at java.util.jar.JarFile.<init>(JarFile.java:70)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
> Has something changed or is my environment not set up correctly?  Appreciate
> any help.
>
>
>
> On Fri, Mar 26, 2010 at 8:23 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > Then use the syntax given by
> > http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/GenericOptionsParser.html
> > :
> >
> > $ bin/hadoop jar -libjars ./samplejoin.jar
> > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input ...
> >
> > On Fri, Mar 26, 2010 at 5:10 PM, M B <ma...@gmail.com> wrote:
> >
> > > Sorry, but where exactly do I include the libjars option?  I tried to put it
> > > where you stated (after the DataJoinJob class), but it just comes back with
> > > usage information (as if the option is not valid):
> > > $ <p@hadoop01:~/hadoop_tests$> hadoop jar
> > > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars ./samplejoin.jar
> > > datajoin/input datajoin/output Text 1
> > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > > *usage: DataJoinJob inputdirs outputdir map_input_file_format numofParts
> > > mapper_class reducer_class map_output_value_class output_value_class
> > > [maxNumOfValuesPerGroup [descriptionOfJob]]]*
> > >
> > > It seems like it's not taking the option for some reason, like it's failing
> > > an argument check in DataJoinJob - does that not use the standard args or
> > > something?
> > >
> > >
> > > On Fri, Mar 26, 2010 at 4:38 PM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > DataJoinJob is contained in hadoop-0.20.2-datajoin.jar which is in your
> > > > HADOOP_CLASSPATH
> > > >
> > > > I think you should specify samplejoin.jar using -libjars instead of putting
> > > > it directly after jar command:
> > > > hadoop jar hadoop-0.20.2-datajoin.jar
> > > > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars ./samplejoin.jar
> > > > ... (same as your example)
> > > >
> > > > Cheers
> > > >
> > > > On Fri, Mar 26, 2010 at 3:24 PM, M B <ma...@gmail.com> wrote:
> > > >
> > > > > I may be having a setup issue with classpaths, would appreciate some help.
> > > > >
> > > > > I created a jar with all the Sample* classes in contrib/DataJoin.  Here is
> > > > > the listing of my samplejoin.jar file:
> > > > > " zip.vim version v22
> > > > > " Browsing zipfile /home/hadoop/hadoop_tests/samplejoin.jar
> > > > > " Select a file with cursor and press ENTER
> > > > > META-INF/
> > > > > META-INF/MANIFEST.MF
> > > > > org/
> > > > > org/apache/
> > > > > org/apache/hadoop/
> > > > > org/apache/hadoop/contrib/
> > > > > org/apache/hadoop/contrib/utils/
> > > > > org/apache/hadoop/contrib/utils/join/
> > > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinReducer.class
> > > > > org/apache/hadoop/contrib/utils/join/SampleTaggedMapOutput.class
> > > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinMapper.class
> > > > >
> > > > > When I go to run this, things start to run, but every Map try errors out
> > > > > with:
> > > > > "java.lang.RuntimeException: java.lang.ClassNotFoundException:
> > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput"
> > > > >
> > > > > Here is the command:
> > > > > hadoop jar ./samplejoin.jar
> > > > > org.apache.hadoop.contrib.utils.join.DataJoinJob
> > > > > datajoin/input datajoin/output Text 1
> > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > > > >
> > > > > This is a new install of 0.20.2.
> > > > >
> > > > > HADOOP_CLASSPATH is set
> > > > > to: /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > > > Any help would be appreciated.
> > > > >
> > > >
> > >
> >
>
>
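
The "Error opening job jar: -libjars" failure quoted above is consistent with the usage Nick Jones cites, `hadoop jar jarFile [mainClass] args...`: the first word after `jar` is taken as the jar path. A small pre-flight check along these lines would catch that ordering mistake before a job is submitted. This is only a sketch; `check_jar_first` is a hypothetical helper and not part of Hadoop.

```shell
# Check that the first argument intended for `hadoop jar` is a readable file
# rather than an option such as -libjars.
check_jar_first() {
    case "$1" in
        -*)
            echo "error: '$1' is an option; the job jar must come first" >&2
            return 1
            ;;
    esac
    if [ ! -r "$1" ]; then
        echo "error: cannot read jar '$1'" >&2
        return 1
    fi
    return 0
}
```

For example, `check_jar_first -libjars ./samplejoin.jar` fails, while `check_jar_first ./samplejoin.jar org.apache.hadoop.contrib.utils.join.DataJoinJob` passes as long as the jar file is readable. Passing this check says nothing about whether the job's classes reach the map tasks.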

why does 'jps' lose track of hadoop processes ?

Posted by Raymond Jennings III <ra...@yahoo.com>.

After running hadoop for some period of time, the command 'jps' fails to report any hadoop process on any node in the cluster.  The processes are still running as can be seen with 'ps -ef|grep java'

In addition, scripts like stop-dfs.sh and stop-mapred.sh no longer find the processes to stop.