Posted to common-user@hadoop.apache.org by M B <ma...@gmail.com> on 2010/03/26 23:24:57 UTC

ClassNotFoundException with contrib/join example

I may be having a setup issue with classpaths and would appreciate some help.

I created a jar with all the Sample* classes in contrib/DataJoin.  Here is
the listing of my samplejoin.jar file:
" zip.vim version v22
" Browsing zipfile /home/hadoop/hadoop_tests/samplejoin.jar
" Select a file with cursor and press ENTER
META-INF/
META-INF/MANIFEST.MF
org/
org/apache/
org/apache/hadoop/
org/apache/hadoop/contrib/
org/apache/hadoop/contrib/utils/
org/apache/hadoop/contrib/utils/join/
org/apache/hadoop/contrib/utils/join/SampleDataJoinReducer.class
org/apache/hadoop/contrib/utils/join/SampleTaggedMapOutput.class
org/apache/hadoop/contrib/utils/join/SampleDataJoinMapper.class

When I go to run this, things start to run, but every map attempt errors out
with:
"java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput"

Here is the command:
hadoop jar ./samplejoin.jar org.apache.hadoop.contrib.utils.join.DataJoinJob
datajoin/input datajoin/output Text 1
org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text

This is a new install of 0.20.2.

HADOOP_CLASSPATH is set to:
/opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
Any help would be appreciated.

Re: why does 'jps' lose track of hadoop processes ?

Posted by Raymond Jennings III <ra...@yahoo.com>.
Yes, I am.

--- On Mon, 3/29/10, Bill Au <bi...@gmail.com> wrote:

> From: Bill Au <bi...@gmail.com>
> Subject: Re: why does 'jps' lose track of hadoop processes ?
> To: common-user@hadoop.apache.org
> Date: Monday, March 29, 2010, 1:04 PM
> Are you running jps under the same
> user id that the hadoop processes are
> running under?
> 
> Bill
> 
> On Mon, Mar 29, 2010 at 11:37 AM, Raymond Jennings III
> <
> raymondjiii@yahoo.com>
> wrote:
> 
> > After running hadoop for some period of time, the
> command 'jps' fails to
> > report any hadoop process on any node in the
> cluster.  The processes are
> > still running as can be seen with 'ps -ef|grep java'
> >
> > In addition, scripts like stop-dfs.sh and
> stop-mapred.sh no longer find the
> > processes to stop.
> >
> >
> >
> >
> 


      

Question about ChainMapper

Posted by Raymond Jennings III <ra...@yahoo.com>.
I would like to try to use a ChainMapper/ChainReducer but I see that the last parameter is a JobConf which I am not creating as I am using the latest API version.  Has anyone tried to do this with the later version API?  Can I extract a JobConf object somewhere?

Thanks


      

Re: why does 'jps' lose track of hadoop processes ?

Posted by Bill Au <bi...@gmail.com>.
Are you running jps under the same user id that the hadoop processes are
running under?

Bill

On Mon, Mar 29, 2010 at 11:37 AM, Raymond Jennings III <
raymondjiii@yahoo.com> wrote:

> After running hadoop for some period of time, the command 'jps' fails to
> report any hadoop process on any node in the cluster.  The processes are
> still running as can be seen with 'ps -ef|grep java'
>
> In addition, scripts like stop-dfs.sh and stop-mapred.sh no longer find the
> processes to stop.
>
>
>
>

Re: why does 'jps' lose track of hadoop processes ?

Posted by Steve Loughran <st...@apache.org>.
Marcos Medrado Rubinelli wrote:
> jps gets its information from the files stored under /tmp/hsperfdata_*, 
> so when a cron job clears your /tmp directory, it also erases these 
> files. You can submit jobs as long as your jobtracker and namenode are 
> responding to requests over TCP, though.

I never knew that.

ps -ef | grep java works quite well; jps has fairly steep startup costs,
and if a JVM is playing up, jps can hang too.


Re: why does 'jps' lose track of hadoop processes ?

Posted by Marcos Medrado Rubinelli <ma...@buscape-inc.com>.
jps gets its information from the files stored under /tmp/hsperfdata_*, 
so when a cron job clears your /tmp directory, it also erases these 
files. You can submit jobs as long as your jobtracker and namenode are 
responding to requests over TCP, though.
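
A minimal sketch of how one might check this, assuming the usual HotSpot layout in which each JVM writes a file named after its pid under <tmp>/hsperfdata_<user>/. The function name list_hsperf_pids is made up for illustration:

```shell
# List "<user> <pid>" for every perf-data file found under a temp root.
# Assumption: files live at <tmp_root>/hsperfdata_<user>/<pid>, and the
# temp root is /tmp unless another directory is passed as the first argument.
# list_hsperf_pids is a hypothetical helper, not a hadoop or JDK command.
list_hsperf_pids() {
  tmp_root=${1:-/tmp}
  for f in "$tmp_root"/hsperfdata_*/*; do
    # When nothing matches, the pattern is left as-is; skip it.
    [ -f "$f" ] || continue
    user_dir=$(basename "$(dirname "$f")")
    printf '%s %s\n' "${user_dir#hsperfdata_}" "$(basename "$f")"
  done
}
```

If 'ps -ef | grep java' shows a hadoop JVM whose pid is missing from this list, its hsperfdata file has probably been removed, which would match the behaviour described above.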

- Marcos

Raymond Jennings III wrote:
> That would explain why the processes cannot be stopped, but the mystery of why jps loses track of these active processes still remains.  Even when jps does not report any hadoop process, I can still submit and run jobs just fine.  The next time it happens, I will have to check whether the hadoop pids are the same as what is in the file.  If they are different, would that somehow mean the hadoop process was being restarted?
>
> --- On Mon, 3/29/10, Bill Habermaas <bi...@habermaas.us> wrote:
>
>   
>> From: Bill Habermaas <bi...@habermaas.us>
>> Subject: RE: why does 'jps' lose track of hadoop processes ?
>> To: common-user@hadoop.apache.org
>> Date: Monday, March 29, 2010, 11:44 AM
>> Sounds like your pid files are
>> getting cleaned out of whatever directory
>> they are being written (maybe garbage collection on a temp
>> directory?). 
>>
>> Look at (taken from hadoop-env.sh):
>> # The directory where pid files are stored. /tmp by
>> default.
>> # export HADOOP_PID_DIR=/var/hadoop/pids
>>
>> The hadoop shell scripts look in the directory that is
>> defined.
>>
>> Bill
>>
>> -----Original Message-----
>> From: Raymond Jennings III [mailto:raymondjiii@yahoo.com]
>>
>> Sent: Monday, March 29, 2010 11:37 AM
>> To: common-user@hadoop.apache.org
>> Subject: why does 'jps' lose track of hadoop processes ?
>>
>> After running hadoop for some period of time, the command
>> 'jps' fails to
>> report any hadoop process on any node in the cluster. 
>> The processes are
>> still running as can be seen with 'ps -ef|grep java'
>>
>> In addition, scripts like stop-dfs.sh and stop-mapred.sh no
>> longer find the
>> processes to stop.
>>
>>
>>       
>>
>>
>>
>>     
>
>
>       
>
>   


-- 
------------------------------------------------------------------------
Marcos Medrado Rubinelli
Tecnologia - BuscaPé
Tel. +55 11 3848-8700 Ramal 8788
marcosm@buscape-inc.com <ma...@buscape-inc.com>

RE: why does 'jps' lose track of hadoop processes ?

Posted by Raymond Jennings III <ra...@yahoo.com>.
That would explain why the processes cannot be stopped, but the mystery of why jps loses track of these active processes still remains.  Even when jps does not report any hadoop process, I can still submit and run jobs just fine.  The next time it happens, I will have to check whether the hadoop pids are the same as what is in the file.  If they are different, would that somehow mean the hadoop process was being restarted?
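
One way to do that check is sketched below. It assumes the pid files follow the hadoop-<user>-<daemon>.pid naming used by the stock scripts and live in HADOOP_PID_DIR (/tmp by default); check_pid_files is a hypothetical helper name.

```shell
# For each *.pid file in a directory, report whether the recorded pid
# belongs to a live process. Run it as the user that owns the daemons:
# 'kill -0' also fails when permission is denied, not only when the
# process is gone.
# check_pid_files is a hypothetical helper, not part of hadoop.
check_pid_files() {
  pid_dir=${1:-/tmp}
  found=0
  for f in "$pid_dir"/*.pid; do
    [ -f "$f" ] || continue
    found=1
    pid=$(cat "$f")
    if kill -0 "$pid" 2>/dev/null; then
      echo "$(basename "$f"): pid $pid is running"
    else
      echo "$(basename "$f"): pid $pid is NOT running"
    fi
  done
  [ "$found" -eq 1 ] || echo "no pid files found in $pid_dir"
}
```

Called as 'check_pid_files "$HADOOP_PID_DIR"', a "no pid files found" result would point at the files having been cleaned out, while a "NOT running" line would suggest the daemon was restarted or died.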

--- On Mon, 3/29/10, Bill Habermaas <bi...@habermaas.us> wrote:

> From: Bill Habermaas <bi...@habermaas.us>
> Subject: RE: why does 'jps' lose track of hadoop processes ?
> To: common-user@hadoop.apache.org
> Date: Monday, March 29, 2010, 11:44 AM
> Sounds like your pid files are
> getting cleaned out of whatever directory
> they are being written (maybe garbage collection on a temp
> directory?). 
> 
> Look at (taken from hadoop-env.sh):
> # The directory where pid files are stored. /tmp by
> default.
> # export HADOOP_PID_DIR=/var/hadoop/pids
> 
> The hadoop shell scripts look in the directory that is
> defined.
> 
> Bill
> 
> -----Original Message-----
> From: Raymond Jennings III [mailto:raymondjiii@yahoo.com]
> 
> Sent: Monday, March 29, 2010 11:37 AM
> To: common-user@hadoop.apache.org
> Subject: why does 'jps' lose track of hadoop processes ?
> 
> After running hadoop for some period of time, the command
> 'jps' fails to
> report any hadoop process on any node in the cluster. 
> The processes are
> still running as can be seen with 'ps -ef|grep java'
> 
> In addition, scripts like stop-dfs.sh and stop-mapred.sh no
> longer find the
> processes to stop.
> 
> 
>       
> 
> 
> 


      

RE: why does 'jps' lose track of hadoop processes ?

Posted by Bill Habermaas <bi...@habermaas.us>.
Sounds like your pid files are getting cleaned out of whatever directory
they are being written (maybe garbage collection on a temp directory?). 

Look at (taken from hadoop-env.sh):
# The directory where pid files are stored. /tmp by default.
# export HADOOP_PID_DIR=/var/hadoop/pids

The hadoop shell scripts look in the directory that is defined.
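
As a hedged illustration, uncommenting that line in hadoop-env.sh would look like the fragment below. The path is simply the one from the commented-out example, not a requirement; any directory that temp-file cleanup does not touch and that the hadoop user can write to should do.

```shell
# hadoop-env.sh (fragment)
# Keep pid files out of /tmp so that periodic cleanup of the temp
# directory cannot remove them. The path is an example value only.
export HADOOP_PID_DIR=/var/hadoop/pids
```

Daemons that are already running wrote their pid files to the old location, so they would likely need to be stopped by hand and restarted once for the new directory to take effect.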

Bill

-----Original Message-----
From: Raymond Jennings III [mailto:raymondjiii@yahoo.com] 
Sent: Monday, March 29, 2010 11:37 AM
To: common-user@hadoop.apache.org
Subject: why does 'jps' lose track of hadoop processes ?

After running hadoop for some period of time, the command 'jps' fails to
report any hadoop process on any node in the cluster.  The processes are
still running as can be seen with 'ps -ef|grep java'

In addition, scripts like stop-dfs.sh and stop-mapred.sh no longer find the
processes to stop.


      



Re: ClassNotFoundException with contrib/join example

Posted by M B <ma...@gmail.com>.
Right, that was the first option I tried and it fails there as well.

Maybe I need to step back and ask a higher-level question - does anyone have
a full, step-by-step example of using a reduce-side join in an M/R job?
Preferably using the contrib/DataJoin classes, but I'll be happy with
whatever example I could get.

I'd love to see the actual code and then how it's kicked off on the command
line so I can try it on my end as a prototype.  I must be doing something
wrong, but don't know what it is.

Thanks.

On Mon, Mar 29, 2010 at 8:31 AM, Jones, Nick <ni...@amd.com> wrote:

> M B,
> I'm not sure about the -libjars argument but 'hadoop jar' is expecting the
> jarfile immediately afterwards: hadoop jar jarFile [mainClass] args...
>
> Nick Jones
>
> -----Original Message-----
> From: M B [mailto:machaca74@gmail.com]
> Sent: Monday, March 29, 2010 10:26 AM
> To: common-user@hadoop.apache.org
> Subject: Re: ClassNotFoundException with contrib/join example
>
> Sorry, I should have mentioned that I tried that as well and it also gives
> an error:
>
>  $ <p@hadoop01:~/hadoop_tests$> hadoop jar -libjars ./samplejoin.jar
> /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input
> datajoin/output Text 1
> org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> Exception in thread "main" java.io.IOException: Error opening job jar:
> -libjars
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
> Caused by: java.util.zip.ZipException: error in opening zip file
>        at java.util.zip.ZipFile.open(Native Method)
>        at java.util.zip.ZipFile.<init>(ZipFile.java:114)
>        at java.util.jar.JarFile.<init>(JarFile.java:133)
>        at java.util.jar.JarFile.<init>(JarFile.java:70)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
> Has something changed or is my environment not set up correctly?
>  Appreciate
> any help.
>
>
>
> On Fri, Mar 26, 2010 at 8:23 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > Then use the syntax given by
> >
> >
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/GenericOptionsParser.html
> > :
> >
> > $ bin/hadoop jar -libjars ./samplejoin.jar
> > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input ...
> >
> > On Fri, Mar 26, 2010 at 5:10 PM, M B <ma...@gmail.com> wrote:
> >
> > > Sorry, but where exactly do I include the libjars option?  I tried to
> put
> > > it
> > > where you stated (after the DataJoinJob class), but it just comes back
> > with
> > > usage information (as if the option is not valid):
> > > $ <p@hadoop01:~/hadoop_tests$> hadoop jar
> >  > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars
> > ./samplejoin.jar
> > > datajoin/input datajoin/output Text 1
> > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > > *usage: DataJoinJob inputdirs outputdir map_input_file_format
> numofParts
> > > mapper_class reducer_class map_output_value_class output_value_class
> > > [maxNumOfValuesPerGroup [descriptionOfJob]]]*
> > >
> > > It seems like it's not taking the option for some reason, like it's
> > failing
> > > an argument check in DataJoinJob - does that not use the standard args
> or
> > > something?
> > >
> > >
> > > On Fri, Mar 26, 2010 at 4:38 PM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > DataJoinJob is contained in hadoop-0.20.2-datajoin.jar which is in
> your
> > > > HADOOP_CLASSPATH
> > > >
> > > > I think you should specify samplejoin.jar using -libjars instead of
> > > putting
> > > > it directly after jar command:
> > > > hadoop jar hadoop-0.20.2-datajoin.jar
> > > > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars
> > > ./samplejoin.jar
> > > > ... (same as your example)
> > > >
> > > > Cheers
> > > >
> > > > On Fri, Mar 26, 2010 at 3:24 PM, M B <ma...@gmail.com> wrote:
> > > >
> > > > > I may be having a setup issue with classpaths, would appreciate
> some
> > > > help.
> > > > >
> > > > > I created a jar with all the Sample* classes in contrib/DataJoin.
> >  Here
> > > > is
> > > > > the listing of my samplejoin.jar file:
> > > > > " zip.vim version v22
> > > > > " Browsing zipfile /home/hadoop/hadoop_tests/samplejoin.jar
> > > > > " Select a file with cursor and press ENTER
> > > > > META-INF/
> > > > > META-INF/MANIFEST.MF
> > > > > org/
> > > > > org/apache/
> > > > > org/apache/hadoop/
> > > > > org/apache/hadoop/contrib/
> > > > > org/apache/hadoop/contrib/utils/
> > > > > org/apache/hadoop/contrib/utils/join/
> > > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinReducer.class
> > > > > org/apache/hadoop/contrib/utils/join/SampleTaggedMapOutput.class
> > > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinMapper.class
> > > > >
> > > > > When I go to run this, things start to run, but every Map try
> errors
> > > out
> > > > > with:
> > > > > "java.lang.RuntimeException: java.lang.ClassNotFoundException:
> > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput"
> > > > >
> > > > > Here is the command:
> > > > > hadoop jar ./samplejoin.jar
> > > > > org.apache.hadoop.contrib.utils.join.DataJoinJob
> > > > > datajoin/input datajoin/output Text 1
> > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > > > >
> > > > > This is a new install of 0.20.2.
> > > > >
> > > > > HADOOP_CLASSPATH is set
> > > > > to: /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > > > Any help would be appreciated.
> > > > >
> > > >
> > >
> >
>
>

why does 'jps' lose track of hadoop processes ?

Posted by Raymond Jennings III <ra...@yahoo.com>.
After running hadoop for some period of time, the command 'jps' fails to report any hadoop process on any node in the cluster.  The processes are still running, as can be seen with 'ps -ef|grep java'.

In addition, scripts like stop-dfs.sh and stop-mapred.sh no longer find the processes to stop.


      

RE: ClassNotFoundException with contrib/join example

Posted by "Jones, Nick" <ni...@amd.com>.
M B,
I'm not sure about the -libjars argument but 'hadoop jar' is expecting the jarfile immediately afterwards: hadoop jar jarFile [mainClass] args...
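
To make the ordering concrete, here is a small dry-run sketch that only prints a command line and rejects an option in the jar position, which is the mistake behind the 'Error opening job jar: -libjars' message quoted below. build_hadoop_jar_cmd is a hypothetical helper, not part of hadoop.

```shell
# Print (do not run) a 'hadoop jar' command line in the order
#   hadoop jar <jarFile> <mainClass> <args...>
# build_hadoop_jar_cmd is a hypothetical helper for illustration only.
build_hadoop_jar_cmd() {
  if [ "$#" -lt 2 ]; then
    echo "usage: build_hadoop_jar_cmd jarFile mainClass [args...]" >&2
    return 1
  fi
  jar_file=$1
  main_class=$2
  shift 2
  # An option such as -libjars in the jar position would be opened as a jar file.
  case "$jar_file" in
    -*) echo "first argument must be a jar file, got: $jar_file" >&2; return 1 ;;
  esac
  cmd="hadoop jar $jar_file $main_class"
  for arg in "$@"; do
    cmd="$cmd $arg"
  done
  printf '%s\n' "$cmd"
}
```

For example, 'build_hadoop_jar_cmd ./samplejoin.jar org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input datajoin/output' prints the command with the jar first, whereas passing -libjars as the first argument is refused.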

Nick Jones

-----Original Message-----
From: M B [mailto:machaca74@gmail.com] 
Sent: Monday, March 29, 2010 10:26 AM
To: common-user@hadoop.apache.org
Subject: Re: ClassNotFoundException with contrib/join example

Sorry, I should have mentioned that I tried that as well and it also gives
an error:

$ <p@hadoop01:~/hadoop_tests$> hadoop jar -libjars ./samplejoin.jar
/opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input
datajoin/output Text 1
org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
Exception in thread "main" java.io.IOException: Error opening job jar:
-libjars
        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:114)
        at java.util.jar.JarFile.<init>(JarFile.java:133)
        at java.util.jar.JarFile.<init>(JarFile.java:70)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
Has something changed or is my environment not set up correctly?  Appreciate
any help.



On Fri, Mar 26, 2010 at 8:23 PM, Ted Yu <yu...@gmail.com> wrote:

> Then use the syntax given by
>
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/GenericOptionsParser.html
> :
>
> $ bin/hadoop jar -libjars ./samplejoin.jar
> /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input ...
>
> On Fri, Mar 26, 2010 at 5:10 PM, M B <ma...@gmail.com> wrote:
>
> > Sorry, but where exactly do I include the libjars option?  I tried to put
> > it
> > where you stated (after the DataJoinJob class), but it just comes back
> with
> > usage information (as if the option is not valid):
> > $ <p@hadoop01:~/hadoop_tests$> hadoop jar
>  > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars
> ./samplejoin.jar
> > datajoin/input datajoin/output Text 1
> > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > *usage: DataJoinJob inputdirs outputdir map_input_file_format numofParts
> > mapper_class reducer_class map_output_value_class output_value_class
> > [maxNumOfValuesPerGroup [descriptionOfJob]]]*
> >
> > It seems like it's not taking the option for some reason, like it's
> failing
> > an argument check in DataJoinJob - does that not use the standard args or
> > something?
> >
> >
> > On Fri, Mar 26, 2010 at 4:38 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > DataJoinJob is contained in hadoop-0.20.2-datajoin.jar which is in your
> > > HADOOP_CLASSPATH
> > >
> > > I think you should specify samplejoin.jar using -libjars instead of
> > putting
> > > it directly after jar command:
> > > hadoop jar hadoop-0.20.2-datajoin.jar
> > > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars
> > ./samplejoin.jar
> > > ... (same as your example)
> > >
> > > Cheers
> > >
> > > On Fri, Mar 26, 2010 at 3:24 PM, M B <ma...@gmail.com> wrote:
> > >
> > > > I may be having a setup issue with classpaths, would appreciate some
> > > help.
> > > >
> > > > I created a jar with all the Sample* classes in contrib/DataJoin.
>  Here
> > > is
> > > > the listing of my samplejoin.jar file:
> > > > " zip.vim version v22
> > > > " Browsing zipfile /home/hadoop/hadoop_tests/samplejoin.jar
> > > > " Select a file with cursor and press ENTER
> > > > META-INF/
> > > > META-INF/MANIFEST.MF
> > > > org/
> > > > org/apache/
> > > > org/apache/hadoop/
> > > > org/apache/hadoop/contrib/
> > > > org/apache/hadoop/contrib/utils/
> > > > org/apache/hadoop/contrib/utils/join/
> > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinReducer.class
> > > > org/apache/hadoop/contrib/utils/join/SampleTaggedMapOutput.class
> > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinMapper.class
> > > >
> > > > When I go to run this, things start to run, but every Map try errors
> > out
> > > > with:
> > > > "java.lang.RuntimeException: java.lang.ClassNotFoundException:
> > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput"
> > > >
> > > > Here is the command:
> > > > hadoop jar ./samplejoin.jar
> > > > org.apache.hadoop.contrib.utils.join.DataJoinJob
> > > > datajoin/input datajoin/output Text 1
> > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > > >
> > > > This is a new install of 0.20.2.
> > > >
> > > > HADOOP_CLASSPATH is set
> > > > to: /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > > Any help would be appreciated.
> > > >
> > >
> >
>


Re: ClassNotFoundException with contrib/join example

Posted by Ted Yu <yu...@gmail.com>.
I modified the DataJoinJob.createDataJoinJob() slightly:
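    // args[7] is output_value_class; only request block compression when it is not "Text"
    if (args[7].compareToIgnoreCase("text") != 0) {
        SequenceFileOutputFormat.setOutputCompressionType(job,
            SequenceFile.CompressionType.BLOCK);
    }
    // explicitly turn off compression of the job output
    job.setBoolean("mapred.output.compress", false);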

But I still see non-text output:
output/part-00000.deflate

'hadoop fs -text output/part-00000.deflate' doesn't show readable text
either.

On Mon, Mar 29, 2010 at 9:26 AM, Ted Yu <yu...@gmail.com> wrote:

> I can run the sample (I created the input files according to
> contrib/data_join/src/examples/org/apache/hadoop/contrib/utils/join/README.txt):
>
> [root@tyu-linux datajoin]# pwd
> /opt/ks/hadoop-0.20.2/build/contrib/datajoin
> [root@tyu-linux datajoin]# /opt/ks/hadoop-0.20.2/bin/hadoop jar
> hadoop-0.20.2-datajoin-examples.jar
> org.apache.hadoop.contrib.utils.join.DataJoinJob input output Text 1
> org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> Using TextInputFormat: Text
> Using TextOutputFormat: Text
> 10/03/29 09:01:30 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=JobTracker, sessionId=
> 10/03/29 09:01:30 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 10/03/29 09:01:30 INFO mapred.FileInputFormat: Total input paths to process
> : 2
> Job job_local_0001 is submitted
> Job job_local_0001 is still running.
> 10/03/29 09:01:30 INFO mapred.FileInputFormat: Total input paths to process
> : 2
> 10/03/29 09:01:31 INFO mapred.MapTask: numReduceTasks: 1
> 10/03/29 09:01:31 INFO mapred.MapTask: io.sort.mb = 100
> 10/03/29 09:01:31 INFO mapred.MapTask: data buffer = 79691776/99614720
> 10/03/29 09:01:31 INFO mapred.MapTask: record buffer = 262144/327680
> 10/03/29 09:01:31 INFO mapred.MapTask: Starting flush of map output
> 10/03/29 09:01:31 INFO mapred.MapTask: Finished spill 0
> 10/03/29 09:01:32 INFO mapred.TaskRunner:
> Task:attempt_local_0001_m_000000_0 is done. And is in the process of
> commiting
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner: collectedCount    6
> totalCount      6
>
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task
> 'attempt_local_0001_m_000000_0' done.
> 10/03/29 09:01:32 INFO mapred.MapTask: numReduceTasks: 1
> 10/03/29 09:01:32 INFO mapred.MapTask: io.sort.mb = 100
> 10/03/29 09:01:32 INFO mapred.MapTask: data buffer = 79691776/99614720
> 10/03/29 09:01:32 INFO mapred.MapTask: record buffer = 262144/327680
> 10/03/29 09:01:32 INFO mapred.MapTask: Starting flush of map output
> 10/03/29 09:01:32 INFO mapred.MapTask: Finished spill 0
> 10/03/29 09:01:32 INFO mapred.TaskRunner:
> Task:attempt_local_0001_m_000001_0 is done. And is in the process of
> commiting
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner: collectedCount    5
> totalCount      5
>
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task
> 'attempt_local_0001_m_000001_0' done.
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> 10/03/29 09:01:32 INFO mapred.Merger: Merging 2 sorted segments
> 10/03/29 09:01:32 INFO mapred.Merger: Down to the last merge-pass, with 2
> segments left of total size: 939 bytes
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> 10/03/29 09:01:32 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 10/03/29 09:01:32 INFO zlib.ZlibFactory: Successfully loaded & initialized
> native-zlib library
> 10/03/29 09:01:32 INFO datajoin.job: key: A.a11 this.largestNumOfValues: 3
> 10/03/29 09:01:32 INFO mapred.TaskRunner:
> Task:attempt_local_0001_r_000000_0 is done. And is in the process of
> commiting
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task
> attempt_local_0001_r_000000_0 is allowed to commit now
> 10/03/29 09:01:32 INFO mapred.FileOutputCommitter: Saved output of task
> 'attempt_local_0001_r_000000_0' to
> file:/opt/kindsight/hadoop-0.20.2/build/contrib/datajoin/output
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner: actuallyCollectedCount    5
> collectedCount  7
> groupCount      6
>  > reduce
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task
> 'attempt_local_0001_r_000000_0' done.
> [root@tyu-linux datajoin]# date
> Mon Mar 29 09:02:37 PDT 2010
>
> It took a minute between the last INFO log and exit of DataJoinJob.
>
> Cheers
>
>
> On Mon, Mar 29, 2010 at 8:26 AM, M B <ma...@gmail.com> wrote:
>
>> Sorry, I should have mentioned that I tried that as well and it also gives
>> an error:
>>
>> $ <p@hadoop01:~/hadoop_tests$> hadoop jar -libjars ./samplejoin.jar
>> /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
>> org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input
>> datajoin/output Text 1
>> org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
>> org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
>> org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
>> Exception in thread "main" java.io.IOException: Error opening job jar:
>> -libjars
>>        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
>> Caused by: java.util.zip.ZipException: error in opening zip file
>>        at java.util.zip.ZipFile.open(Native Method)
>>        at java.util.zip.ZipFile.<init>(ZipFile.java:114)
>>        at java.util.jar.JarFile.<init>(JarFile.java:133)
>>        at java.util.jar.JarFile.<init>(JarFile.java:70)
>>        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
>> Has something changed or is my environment not set up correctly?
>>  Appreciate
>> any help.
>>
>>
>>
>> On Fri, Mar 26, 2010 at 8:23 PM, Ted Yu <yu...@gmail.com> wrote:
>>
>> > Then use the syntax given by
>> >
>> >
>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/GenericOptionsParser.html
>> > :
>> >
>> > $ bin/hadoop jar -libjars ./samplejoin.jar
>> > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
>> > org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input ...
>> >
>> > On Fri, Mar 26, 2010 at 5:10 PM, M B <ma...@gmail.com> wrote:
>> >
>> > > Sorry, but where exactly do I include the libjars option?  I tried to
>> put
>> > > it
>> > > where you stated (after the DataJoinJob class), but it just comes back
>> > with
>> > > usage information (as if the option is not valid):
>> > > $ <p@hadoop01:~/hadoop_tests$> hadoop jar
>> >  > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
>> > > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars
>> > ./samplejoin.jar
>> > > datajoin/input datajoin/output Text 1
>> > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
>> > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
>> > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
>> > > *usage: DataJoinJob inputdirs outputdir map_input_file_format
>> numofParts
>> > > mapper_class reducer_class map_output_value_class output_value_class
>> > > [maxNumOfValuesPerGroup [descriptionOfJob]]]*
>> > >
>> > > It seems like it's not taking the option for some reason, like it's
>> > failing
>> > > an argument check in DataJoinJob - does that not use the standard args
>> or
>> > > something?
>> > >
>> > >
>> > > On Fri, Mar 26, 2010 at 4:38 PM, Ted Yu <yu...@gmail.com> wrote:
>> > >
>> > > > DataJoinJob is contained in hadoop-0.20.2-datajoin.jar which is in
>> your
>> > > > HADOOP_CLASSPATH
>> > > >
>> > > > I think you should specify samplejoin.jar using -libjars instead of
>> > > putting
>> > > > it directly after jar command:
>> > > > hadoop jar hadoop-0.20.2-datajoin.jar
>> > > > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars
>> > > ./samplejoin.jar
>> > > > ... (same as your example)
>> > > >
>> > > > Cheers
>> > > >
>> > > > On Fri, Mar 26, 2010 at 3:24 PM, M B <ma...@gmail.com> wrote:
>> > > >
>> > > > > I may be having a setup issue with classpaths, would appreciate
>> some
>> > > > help.
>> > > > >
>> > > > > I created a jar with all the Sample* classes in contrib/DataJoin.
>> >  Here
>> > > > is
>> > > > > the listing of my samplejoin.jar file:
>> > > > > " zip.vim version v22
>> > > > > " Browsing zipfile /home/hadoop/hadoop_tests/samplejoin.jar
>> > > > > " Select a file with cursor and press ENTER
>> > > > > META-INF/
>> > > > > META-INF/MANIFEST.MF
>> > > > > org/
>> > > > > org/apache/
>> > > > > org/apache/hadoop/
>> > > > > org/apache/hadoop/contrib/
>> > > > > org/apache/hadoop/contrib/utils/
>> > > > > org/apache/hadoop/contrib/utils/join/
>> > > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinReducer.class
>> > > > > org/apache/hadoop/contrib/utils/join/SampleTaggedMapOutput.class
>> > > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinMapper.class
>> > > > >
>> > > > > When I go to run this, things start to run, but every Map try
>> errors
>> > > out
>> > > > > with:
>> > > > > "java.lang.RuntimeException: java.lang.ClassNotFoundException:
>> > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput"
>> > > > >
>> > > > > Here is the command:
>> > > > > hadoop jar ./samplejoin.jar
>> > > > > org.apache.hadoop.contrib.utils.join.DataJoinJob
>> > > > > datajoin/input datajoin/output Text 1
>> > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
>> > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
>> > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
>> > > > >
>> > > > > This is a new install of 0.20.2.
>> > > > >
>> > > > > HADOOP_CLASSPATH is set
>> > > > > to: /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
>> > > > > Any help would be appreciated.
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: ClassNotFoundException with contrib/join example

Posted by M B <ma...@gmail.com>.
Ah, thanks, that got it.  Now I'm at the same point you are -
part-00000.deflate is there and is not readable.  Seems like I should see
text output, right?

On Mon, Mar 29, 2010 at 2:04 PM, Ted Yu <yu...@gmail.com> wrote:

> Under hadoop-0.20.2/src/contrib/data_join, run
> ant jar-examples
>
> You may need to rename the jars
> (hadoop-${version}-datajoin-examples.jar):
> [root@tyu-linux datajoin]# ls
> classes  examples  hadoop-0.20.2-datajoin-examples.jar
> hadoop-0.20.2-datajoin.jar  input  output  test

Re: ClassNotFoundException with contrib/join example

Posted by Ted Yu <yu...@gmail.com>.
Under hadoop-0.20.2/src/contrib/data_join, run
ant jar-examples

You may need to rename the jars
(hadoop-${version}-datajoin-examples.jar):
[root@tyu-linux datajoin]# ls
classes  examples  hadoop-0.20.2-datajoin-examples.jar
hadoop-0.20.2-datajoin.jar  input  output  test

On Mon, Mar 29, 2010 at 1:59 PM, M B <ma...@gmail.com> wrote:

> I don't see hadoop-0.20.2-datajoin-examples.jar in the
> build/contrib/datajoin directory.  Is that a jar you created separately?  I
> tried creating one, but it still doesn't run (the mappers show the same
> error of missing the classes).
>
> hadoop@hadoop01:/opt/hadoop-0.20.2/build/contrib/datajoin$ ls
> classes  examples  test

Re: ClassNotFoundException with contrib/join example

Posted by M B <ma...@gmail.com>.
I don't see hadoop-0.20.2-datajoin-examples.jar in the
build/contrib/datajoin directory.  Is that a jar you created separately?  I
tried creating one, but it still doesn't run (the mappers show the same
error of missing the classes).

hadoop@hadoop01:/opt/hadoop-0.20.2/build/contrib/datajoin$ ls
classes  examples  test


On Mon, Mar 29, 2010 at 9:26 AM, Ted Yu <yu...@gmail.com> wrote:

> I can run the sample (I created the input files according to
> contrib/data_join/src/examples/org/apache/hadoop/contrib/utils/join/README.txt):
>
> [root@tyu-linux datajoin]# pwd
> /opt/ks/hadoop-0.20.2/build/contrib/datajoin
> [root@tyu-linux datajoin]# /opt/ks/hadoop-0.20.2/bin/hadoop jar
> hadoop-0.20.2-datajoin-examples.jar
> org.apache.hadoop.contrib.utils.join.DataJoinJob input output Text 1
> org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> Using TextInputFormat: Text
> Using TextOutputFormat: Text
> 10/03/29 09:01:30 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=JobTracker, sessionId=
> 10/03/29 09:01:30 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 10/03/29 09:01:30 INFO mapred.FileInputFormat: Total input paths to process
> : 2
> Job job_local_0001 is submitted
> Job job_local_0001 is still running.
> 10/03/29 09:01:30 INFO mapred.FileInputFormat: Total input paths to process
> : 2
> 10/03/29 09:01:31 INFO mapred.MapTask: numReduceTasks: 1
> 10/03/29 09:01:31 INFO mapred.MapTask: io.sort.mb = 100
> 10/03/29 09:01:31 INFO mapred.MapTask: data buffer = 79691776/99614720
> 10/03/29 09:01:31 INFO mapred.MapTask: record buffer = 262144/327680
> 10/03/29 09:01:31 INFO mapred.MapTask: Starting flush of map output
> 10/03/29 09:01:31 INFO mapred.MapTask: Finished spill 0
> 10/03/29 09:01:32 INFO mapred.TaskRunner:
> Task:attempt_local_0001_m_000000_0
> is done. And is in the process of commiting
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner: collectedCount    6
> totalCount      6
>
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task
> 'attempt_local_0001_m_000000_0' done.
> 10/03/29 09:01:32 INFO mapred.MapTask: numReduceTasks: 1
> 10/03/29 09:01:32 INFO mapred.MapTask: io.sort.mb = 100
> 10/03/29 09:01:32 INFO mapred.MapTask: data buffer = 79691776/99614720
> 10/03/29 09:01:32 INFO mapred.MapTask: record buffer = 262144/327680
> 10/03/29 09:01:32 INFO mapred.MapTask: Starting flush of map output
> 10/03/29 09:01:32 INFO mapred.MapTask: Finished spill 0
> 10/03/29 09:01:32 INFO mapred.TaskRunner:
> Task:attempt_local_0001_m_000001_0
> is done. And is in the process of commiting
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner: collectedCount    5
> totalCount      5
>
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task
> 'attempt_local_0001_m_000001_0' done.
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> 10/03/29 09:01:32 INFO mapred.Merger: Merging 2 sorted segments
> 10/03/29 09:01:32 INFO mapred.Merger: Down to the last merge-pass, with 2
> segments left of total size: 939 bytes
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> 10/03/29 09:01:32 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 10/03/29 09:01:32 INFO zlib.ZlibFactory: Successfully loaded & initialized
> native-zlib library
> 10/03/29 09:01:32 INFO datajoin.job: key: A.a11 this.largestNumOfValues: 3
> 10/03/29 09:01:32 INFO mapred.TaskRunner:
> Task:attempt_local_0001_r_000000_0
> is done. And is in the process of commiting
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner:
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task
> attempt_local_0001_r_000000_0
> is allowed to commit now
> 10/03/29 09:01:32 INFO mapred.FileOutputCommitter: Saved output of task
> 'attempt_local_0001_r_000000_0' to
> file:/opt/kindsight/hadoop-0.20.2/build/contrib/datajoin/output
> 10/03/29 09:01:32 INFO mapred.LocalJobRunner: actuallyCollectedCount    5
> collectedCount  7
> groupCount      6
>  > reduce
> 10/03/29 09:01:32 INFO mapred.TaskRunner: Task
> 'attempt_local_0001_r_000000_0' done.
> [root@tyu-linux datajoin]# date
> Mon Mar 29 09:02:37 PDT 2010
>
> It took a minute between the last INFO log and exit of DataJoinJob.
>
> Cheers
>
> On Mon, Mar 29, 2010 at 8:26 AM, M B <ma...@gmail.com> wrote:
>
> > Sorry, I should have mentioned that I tried that as well and it also gives
> > an error:
> >
> > $ <p@hadoop01:~/hadoop_tests$> hadoop jar -libjars ./samplejoin.jar
> > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input
> > datajoin/output Text 1
> > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > Exception in thread "main" java.io.IOException: Error opening job jar:
> > -libjars
> >        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
> > Caused by: java.util.zip.ZipException: error in opening zip file
> >        at java.util.zip.ZipFile.open(Native Method)
> >        at java.util.zip.ZipFile.<init>(ZipFile.java:114)
> >        at java.util.jar.JarFile.<init>(JarFile.java:133)
> >        at java.util.jar.JarFile.<init>(JarFile.java:70)
> >        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
> > Has something changed or is my environment not set up correctly?
> > Appreciate any help.
> >
> >
> >
> > On Fri, Mar 26, 2010 at 8:23 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > Then use the syntax given by
> > > http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/GenericOptionsParser.html
> > > :
> > >
> > > $ bin/hadoop jar -libjars ./samplejoin.jar
> > > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input ...
> > >
> > > On Fri, Mar 26, 2010 at 5:10 PM, M B <ma...@gmail.com> wrote:
> > >
> > > > Sorry, but where exactly do I include the libjars option?  I tried to
> > > > put it where you stated (after the DataJoinJob class), but it just
> > > > comes back with usage information (as if the option is not valid):
> > > > $ <p@hadoop01:~/hadoop_tests$> hadoop jar
> > > > /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars ./samplejoin.jar
> > > > datajoin/input datajoin/output Text 1
> > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > > > *usage: DataJoinJob inputdirs outputdir map_input_file_format numofParts
> > > > mapper_class reducer_class map_output_value_class output_value_class
> > > > [maxNumOfValuesPerGroup [descriptionOfJob]]]*
> > > >
> > > > It seems like it's not taking the option for some reason, like it's
> > > > failing an argument check in DataJoinJob - does that not use the
> > > > standard args or something?
> > > >
> > > >
> > > > On Fri, Mar 26, 2010 at 4:38 PM, Ted Yu <yu...@gmail.com> wrote:
> > > >
> > > > > DataJoinJob is contained in hadoop-0.20.2-datajoin.jar which is in
> > > > > your HADOOP_CLASSPATH
> > > > >
> > > > > I think you should specify samplejoin.jar using -libjars instead of
> > > > > putting it directly after jar command:
> > > > > hadoop jar hadoop-0.20.2-datajoin.jar
> > > > > org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars ./samplejoin.jar
> > > > > ... (same as your example)
> > > > >
> > > > > Cheers
> > > > >
> > > > > On Fri, Mar 26, 2010 at 3:24 PM, M B <ma...@gmail.com> wrote:
> > > > >
> > > > > > I may be having a setup issue with classpaths, would appreciate
> > > > > > some help.
> > > > > >
> > > > > > I created a jar with all the Sample* classes in contrib/DataJoin.
> > > > > > Here is the listing of my samplejoin.jar file:
> > > > > > " zip.vim version v22
> > > > > > " Browsing zipfile /home/hadoop/hadoop_tests/samplejoin.jar
> > > > > > " Select a file with cursor and press ENTER
> > > > > > META-INF/
> > > > > > META-INF/MANIFEST.MF
> > > > > > org/
> > > > > > org/apache/
> > > > > > org/apache/hadoop/
> > > > > > org/apache/hadoop/contrib/
> > > > > > org/apache/hadoop/contrib/utils/
> > > > > > org/apache/hadoop/contrib/utils/join/
> > > > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinReducer.class
> > > > > > org/apache/hadoop/contrib/utils/join/SampleTaggedMapOutput.class
> > > > > > org/apache/hadoop/contrib/utils/join/SampleDataJoinMapper.class
> > > > > >
> > > > > > When I go to run this, things start to run, but every Map try
> > > > > > errors out with:
> > > > > > "java.lang.RuntimeException: java.lang.ClassNotFoundException:
> > > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput"
> > > > > >
> > > > > > Here is the command:
> > > > > > hadoop jar ./samplejoin.jar
> > > > > > org.apache.hadoop.contrib.utils.join.DataJoinJob
> > > > > > datajoin/input datajoin/output Text 1
> > > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> > > > > > org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> > > > > > org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> > > > > >
> > > > > > This is a new install of 0.20.2.
> > > > > >
> > > > > > HADOOP_CLASSPATH is set
> > > > > > to: /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> > > > > > Any help would be appreciated.
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: ClassNotFoundException with contrib/join example

Posted by Ted Yu <yu...@gmail.com>.
I can run the sample (I created the input files according to
contrib/data_join/src/examples/org/apache/hadoop/contrib/utils/join/README.txt):

[root@tyu-linux datajoin]# pwd
/opt/ks/hadoop-0.20.2/build/contrib/datajoin
[root@tyu-linux datajoin]# /opt/ks/hadoop-0.20.2/bin/hadoop jar
hadoop-0.20.2-datajoin-examples.jar
org.apache.hadoop.contrib.utils.join.DataJoinJob input output Text 1
org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
Using TextInputFormat: Text
Using TextOutputFormat: Text
10/03/29 09:01:30 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
10/03/29 09:01:30 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.
10/03/29 09:01:30 INFO mapred.FileInputFormat: Total input paths to process
: 2
Job job_local_0001 is submitted
Job job_local_0001 is still running.
10/03/29 09:01:30 INFO mapred.FileInputFormat: Total input paths to process
: 2
10/03/29 09:01:31 INFO mapred.MapTask: numReduceTasks: 1
10/03/29 09:01:31 INFO mapred.MapTask: io.sort.mb = 100
10/03/29 09:01:31 INFO mapred.MapTask: data buffer = 79691776/99614720
10/03/29 09:01:31 INFO mapred.MapTask: record buffer = 262144/327680
10/03/29 09:01:31 INFO mapred.MapTask: Starting flush of map output
10/03/29 09:01:31 INFO mapred.MapTask: Finished spill 0
10/03/29 09:01:32 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0
is done. And is in the process of commiting
10/03/29 09:01:32 INFO mapred.LocalJobRunner: collectedCount    6
totalCount      6

10/03/29 09:01:32 INFO mapred.TaskRunner: Task
'attempt_local_0001_m_000000_0' done.
10/03/29 09:01:32 INFO mapred.MapTask: numReduceTasks: 1
10/03/29 09:01:32 INFO mapred.MapTask: io.sort.mb = 100
10/03/29 09:01:32 INFO mapred.MapTask: data buffer = 79691776/99614720
10/03/29 09:01:32 INFO mapred.MapTask: record buffer = 262144/327680
10/03/29 09:01:32 INFO mapred.MapTask: Starting flush of map output
10/03/29 09:01:32 INFO mapred.MapTask: Finished spill 0
10/03/29 09:01:32 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0
is done. And is in the process of commiting
10/03/29 09:01:32 INFO mapred.LocalJobRunner: collectedCount    5
totalCount      5

10/03/29 09:01:32 INFO mapred.TaskRunner: Task
'attempt_local_0001_m_000001_0' done.
10/03/29 09:01:32 INFO mapred.LocalJobRunner:
10/03/29 09:01:32 INFO mapred.Merger: Merging 2 sorted segments
10/03/29 09:01:32 INFO mapred.Merger: Down to the last merge-pass, with 2
segments left of total size: 939 bytes
10/03/29 09:01:32 INFO mapred.LocalJobRunner:
10/03/29 09:01:32 INFO util.NativeCodeLoader: Loaded the native-hadoop
library
10/03/29 09:01:32 INFO zlib.ZlibFactory: Successfully loaded & initialized
native-zlib library
10/03/29 09:01:32 INFO datajoin.job: key: A.a11 this.largestNumOfValues: 3
10/03/29 09:01:32 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0
is done. And is in the process of commiting
10/03/29 09:01:32 INFO mapred.LocalJobRunner:
10/03/29 09:01:32 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0
is allowed to commit now
10/03/29 09:01:32 INFO mapred.FileOutputCommitter: Saved output of task
'attempt_local_0001_r_000000_0' to
file:/opt/kindsight/hadoop-0.20.2/build/contrib/datajoin/output
10/03/29 09:01:32 INFO mapred.LocalJobRunner: actuallyCollectedCount    5
collectedCount  7
groupCount      6
 > reduce
10/03/29 09:01:32 INFO mapred.TaskRunner: Task
'attempt_local_0001_r_000000_0' done.
[root@tyu-linux datajoin]# date
Mon Mar 29 09:02:37 PDT 2010

About a minute passed between the last INFO log line (09:01:32) and
DataJoinJob exiting; the date command run after it returned shows 09:02:37.

Cheers

On Mon, Mar 29, 2010 at 8:26 AM, M B <ma...@gmail.com> wrote:

> Sorry, I should have mentioned that I tried that as well and it also gives
> an error:
>
> $ <p@hadoop01:~/hadoop_tests$> hadoop jar -libjars ./samplejoin.jar
> /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input
> datajoin/output Text 1
> org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> Exception in thread "main" java.io.IOException: Error opening job jar:
> -libjars
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
> Caused by: java.util.zip.ZipException: error in opening zip file
>        at java.util.zip.ZipFile.open(Native Method)
>        at java.util.zip.ZipFile.<init>(ZipFile.java:114)
>        at java.util.jar.JarFile.<init>(JarFile.java:133)
>        at java.util.jar.JarFile.<init>(JarFile.java:70)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
> Has something changed or is my environment not set up correctly?  Appreciate
> any help.
>
>
>

Re: ClassNotFoundException with contrib/join example

Posted by M B <ma...@gmail.com>.
Sorry, I should have mentioned that I tried that as well and it also gives
an error:

$ <p@hadoop01:~/hadoop_tests$> hadoop jar -libjars ./samplejoin.jar
/opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input
datajoin/output Text 1
org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
Exception in thread "main" java.io.IOException: Error opening job jar:
-libjars
        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:114)
        at java.util.jar.JarFile.<init>(JarFile.java:133)
        at java.util.jar.JarFile.<init>(JarFile.java:70)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
Has something changed or is my environment not set up correctly?  Appreciate
any help.
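
A note on the trace above: the message "Error opening job jar: -libjars"
indicates that hadoop jar took its first argument as the job jar, so -libjars
was read as a file name rather than as an option. The sketch below models that
argument split in plain Java. It is not the Hadoop source: RunJarArgsSketch,
Parsed and parse are hypothetical names, and it assumes the jar manifest
declares no Main-Class, so the class name comes from the second argument.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Simplified, hypothetical model of how "hadoop jar" splits its command line.
 * Illustration only; this is not code from Hadoop.
 */
final class RunJarArgsSketch {

    /** The three pieces taken from the command line. */
    static final class Parsed {
        final String jarPath;
        final String mainClass;
        final List<String> appArgs;

        Parsed(String jarPath, String mainClass, List<String> appArgs) {
            this.jarPath = jarPath;
            this.mainClass = mainClass;
            this.appArgs = appArgs;
        }
    }

    /**
     * First argument is the job jar, second is the main class (assumption:
     * no Main-Class in the manifest), everything else goes to that class.
     */
    static Parsed parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("expected: <jar> <mainClass> [args...]");
        }
        List<String> rest = new ArrayList<>(Arrays.asList(args).subList(2, args.length));
        return new Parsed(args[0], args[1], rest);
    }

    public static void main(String[] unused) {
        // The start of the command line from the message above.
        String[] cmd = {
            "-libjars", "./samplejoin.jar",
            "/opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar",
            "org.apache.hadoop.contrib.utils.join.DataJoinJob"
        };
        Parsed p = parse(cmd);
        System.out.println("job jar:    " + p.jarPath);
        System.out.println("main class: " + p.mainClass);
        System.out.println("app args:   " + p.appArgs);
    }
}
```

With the command from this message, the model reports -libjars as the job jar,
which is the name that appears in the IOException.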



On Fri, Mar 26, 2010 at 8:23 PM, Ted Yu <yu...@gmail.com> wrote:

> Then use the syntax given by
>
> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/GenericOptionsParser.html
> :
>
> $ bin/hadoop jar -libjars ./samplejoin.jar
> /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input ...
>

Re: ClassNotFoundException with contrib/join example

Posted by Ted Yu <yu...@gmail.com>.
Then use the syntax given by
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/GenericOptionsParser.html
:

$ bin/hadoop jar -libjars ./samplejoin.jar
/opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
org.apache.hadoop.contrib.utils.join.DataJoinJob datajoin/input ...

On Fri, Mar 26, 2010 at 5:10 PM, M B <ma...@gmail.com> wrote:

> Sorry, but where exactly do I include the libjars option?  I tried to put it
> where you stated (after the DataJoinJob class), but it just comes back with
> usage information (as if the option is not valid):
> $ <p@hadoop01:~/hadoop_tests$> hadoop jar
> /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars ./samplejoin.jar
> datajoin/input datajoin/output Text 1
> org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
> *usage: DataJoinJob inputdirs outputdir map_input_file_format numofParts
> mapper_class reducer_class map_output_value_class output_value_class
> [maxNumOfValuesPerGroup [descriptionOfJob]]]*
>
> It seems like it's not taking the option for some reason, like it's failing
> an argument check in DataJoinJob - does that not use the standard args or
> something?
>
>

Re: ClassNotFoundException with contrib/join example

Posted by M B <ma...@gmail.com>.
Sorry, but where exactly do I include the libjars option?  I tried to put it
where you stated (after the DataJoinJob class), but it just comes back with
usage information (as if the option is not valid):
$ <p@hadoop01:~/hadoop_tests$> hadoop jar
/opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars ./samplejoin.jar
datajoin/input datajoin/output Text 1
org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
*usage: DataJoinJob inputdirs outputdir map_input_file_format numofParts
mapper_class reducer_class map_output_value_class output_value_class
[maxNumOfValuesPerGroup [descriptionOfJob]]]*

It seems like it's not taking the option for some reason, like it's failing
an argument check in DataJoinJob - does that not use the standard args or
something?
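
One way to picture the question: if a main method reads its arguments strictly
by position, a -libjars pair is just two more positional values unless
something strips it out first. The sketch below shows such a stripping step in
plain Java. It is a simplified model, not Hadoop's GenericOptionsParser;
GenericOptionSplitSketch, Split and split are hypothetical names, and whether
DataJoinJob in 0.20.2 performs any step like this is not established in this
thread.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper that separates a "-libjars <value>" pair from the
 * positional arguments. Illustration only; not Hadoop code.
 */
final class GenericOptionSplitSketch {

    /** The -libjars value (null if absent) plus the remaining arguments. */
    static final class Split {
        final String libJars;
        final List<String> positional;

        Split(String libJars, List<String> positional) {
            this.libJars = libJars;
            this.positional = positional;
        }
    }

    static Split split(String[] args) {
        String libJars = null;
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-libjars".equals(args[i]) && i + 1 < args.length) {
                libJars = args[++i];   // consume the option's value as well
            } else {
                rest.add(args[i]);     // everything else stays positional
            }
        }
        return new Split(libJars, rest);
    }

    public static void main(String[] unused) {
        // Arguments that follow the DataJoinJob class name in the command above.
        String[] appArgs = {
            "-libjars", "./samplejoin.jar",
            "datajoin/input", "datajoin/output", "Text", "1",
            "org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper",
            "org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer",
            "org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput",
            "Text"
        };
        Split s = split(appArgs);
        System.out.println("libjars:    " + s.libJars);
        System.out.println("positional: " + s.positional.size() + " values");
        System.out.println("first:      " + s.positional.get(0));
    }
}
```

Applied to the command above, the split leaves the eight positional values the
usage line asks for; without it, -libjars would sit where inputdirs is
expected.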


On Fri, Mar 26, 2010 at 4:38 PM, Ted Yu <yu...@gmail.com> wrote:

> DataJoinJob is contained in hadoop-0.20.2-datajoin.jar which is in your
> HADOOP_CLASSPATH
>
> I think you should specify samplejoin.jar using -libjars instead of putting
> it directly after jar command:
> hadoop jar hadoop-0.20.2-datajoin.jar
> org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars ./samplejoin.jar
> ... (same as your example)
>
> Cheers
>

Re: ClassNotFoundException with contrib/join example

Posted by Ted Yu <yu...@gmail.com>.
DataJoinJob is contained in hadoop-0.20.2-datajoin.jar, which is in your
HADOOP_CLASSPATH.

I think you should specify samplejoin.jar using -libjars instead of putting
it directly after the jar command:
hadoop jar hadoop-0.20.2-datajoin.jar
org.apache.hadoop.contrib.utils.join.DataJoinJob -libjars ./samplejoin.jar
... (same as your example)

Cheers

On Fri, Mar 26, 2010 at 3:24 PM, M B <ma...@gmail.com> wrote:

> I may be having a setup issue with classpaths, would appreciate some help.
>
> I created a jar with all the Sample* classes in contrib/DataJoin.  Here is
> the listing of my samplejoin.jar file:
> " zip.vim version v22
> " Browsing zipfile /home/hadoop/hadoop_tests/samplejoin.jar
> " Select a file with cursor and press ENTER
> META-INF/
> META-INF/MANIFEST.MF
> org/
> org/apache/
> org/apache/hadoop/
> org/apache/hadoop/contrib/
> org/apache/hadoop/contrib/utils/
> org/apache/hadoop/contrib/utils/join/
> org/apache/hadoop/contrib/utils/join/SampleDataJoinReducer.class
> org/apache/hadoop/contrib/utils/join/SampleTaggedMapOutput.class
> org/apache/hadoop/contrib/utils/join/SampleDataJoinMapper.class
>
> When I go to run this, things start to run, but every Map try errors out
> with:
> "java.lang.RuntimeException: java.lang.ClassNotFoundException:
> org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput"
>
> Here is the command:
> hadoop jar ./samplejoin.jar
> org.apache.hadoop.contrib.utils.join.DataJoinJob
> datajoin/input datajoin/output Text 1
> org.apache.hadoop.contrib.utils.join.SampleDataJoinMapper
> org.apache.hadoop.contrib.utils.join.SampleDataJoinReducer
> org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput Text
>
> This is a new install of 0.20.2.
>
> HADOOP_CLASSPATH is set
> to: /opt/hadoop-0.20.2/contrib/datajoin/hadoop-0.20.2-datajoin.jar
> Any help would be appreciated.
>
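
The jar listing quoted above can also be checked programmatically. The sketch
below uses only java.util.jar from the JDK to test whether a jar has an entry
for a given class name; JarClassCheck, entryNameFor and containsClass are
hypothetical names chosen for this example. Note that a positive result only
shows the class is inside the jar. It says nothing about whether the JVM that
runs the map task has that jar on its classpath, which is what the
ClassNotFoundException is about.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/** Hypothetical helper: checks whether a jar contains a given class. */
final class JarClassCheck {

    /** Converts a.b.C into the entry name a/b/C.class used inside a jar. */
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar at jarPath has an entry for className. */
    static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryNameFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarClassCheck <jar> <fully.qualified.ClassName>");
            return;
        }
        System.out.println(containsClass(args[0], args[1]) ? "present" : "missing");
    }
}
```

For the jar in this thread, the arguments would be ./samplejoin.jar and
org.apache.hadoop.contrib.utils.join.SampleTaggedMapOutput.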