Posted to user@nutch.apache.org by zzcgiacomini <zz...@echo.fr> on 2006/05/26 11:29:10 UTC

mergesegs (Nutch 0.8): what is the right syntax?

Hi all,

I have two segments, test/segments/20060511101525 and
test/segments/20060523, which I would like to merge into one new segment
using "mergesegs", so far without success.

- nutch mergesegs test/segments/test/segments/20060526095530 test/segments/20060511101525 test/segments/20060523095535
- nutch mergesegs test/segments/20060526095530 -dir test/segments

Whatever I try, I always get an exception raised at the same place:

Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:341)
        at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:596)
        at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:644)

and the job tracker logs the following lines:

060526 105956 job init failed
java.io.IOException: java.lang.ClassNotFoundException: org.apache.nutch.segment.SegmentMerger$ObjectInputFormat
        at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:119)
        at org.apache.hadoop.mapred.JobTracker$JobInitThread.run(JobTracker.java:293)
        at java.lang.Thread.run(Thread.java:595)
060526 105959 Server connection on port 9011 from 10.234.57.38: exiting
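When a daemon reports a ClassNotFoundException like the one above (here the job tracker, judging by JobInProgress.initTasks in the trace), a quick first check is whether any entry on that daemon's classpath actually contains the class. A minimal sketch, assuming a POSIX shell; find_class is a hypothetical helper, not part of Nutch or Hadoop. It relies on two facts: a nested class such as SegmentMerger$ObjectInputFormat is stored in a jar as SegmentMerger$ObjectInputFormat.class, and zip entry names are stored uncompressed, so a fixed-string grep over the raw jar is a rough heuristic (unzip -l or jar tf give an exact listing).

```shell
#!/bin/sh
# Sketch only: find_class is a hypothetical helper, not part of Nutch or Hadoop.
# Prints each entry of a colon-separated classpath that appears to contain the
# given class file, and fails (status 1) if none does.
# Directories are checked for the .class file itself. Jars are checked with a
# fixed-string grep, which works because zip entry names are stored
# uncompressed; this is a heuristic, 'unzip -l' or 'jar tf' is the exact check.
find_class() (
  entry=$1
  status=1
  IFS=:        # split the classpath on ':' (subshell body, so IFS does not leak)
  set -f       # no filename globbing while splitting
  for p in $2; do
    if [ -d "$p" ]; then
      [ -f "$p/$entry" ] && { echo "$p"; status=0; }
    elif [ -f "$p" ]; then
      LC_ALL=C grep -qF "$entry" "$p" && { echo "$p"; status=0; }
    fi
  done
  exit "$status"   # leaves only the function's subshell
)

# The nested class from the log above, as it is named inside a jar:
find_class 'org/apache/nutch/segment/SegmentMerger$ObjectInputFormat.class' "${CLASSPATH:-}" ||
  echo "SegmentMerger\$ObjectInputFormat not found on CLASSPATH" >&2
```

Run it with the CLASSPATH the daemon actually starts with; checking your own login shell's CLASSPATH says nothing about the tasktrackers or the job tracker.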

Any help?

Thanks in advance

-Corrado

Re: mergesegs (Nutch 0.8): what is the right syntax?

Posted by zzcgiacomini <zz...@echo.fr>.
Yes, you are right. I have just added the following lines to the hadoop
script in my Nutch installation, right before the point where it builds the
classpath from the hadoop-*.jar files. Maybe this is not the proper place,
but I got my segments merged.

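# Added to the hadoop script: put the Nutch jar(s) on the classpath so that
# classes such as SegmentMerger$ObjectInputFormat can be loaded.
for f in "$HADOOP_HOME"/nutch-*.jar; do
  CLASSPATH=${CLASSPATH}:$f
done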
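One caveat with this loop: if no file matches nutch-*.jar, sh leaves the pattern unexpanded and appends the literal string ending in /nutch-*.jar to CLASSPATH. A minimal sketch of the same loop with a guard for that case, wrapped in a hypothetical helper add_nutch_jars so it can be tried in a scratch directory; the jar name nutch-0.8.jar and the starting CLASSPATH value are placeholders.

```shell
#!/bin/sh
# Sketch only: add_nutch_jars is a hypothetical helper, not part of Nutch or Hadoop.
# Appends every nutch-*.jar in the given directory to CLASSPATH, in the same
# style as the loop added to the hadoop script.
add_nutch_jars() {
  jar_dir=$1
  for f in "$jar_dir"/nutch-*.jar; do
    # With no match, sh keeps the literal pattern; skip it instead of
    # appending a bogus classpath entry.
    [ -e "$f" ] || continue
    CLASSPATH=${CLASSPATH}:$f
  done
}

# Demo in a throwaway directory with an empty placeholder jar.
demo_dir=$(mktemp -d)
touch "$demo_dir/nutch-0.8.jar"   # placeholder file name
CLASSPATH=/path/to/conf           # placeholder starting value
add_nutch_jars "$demo_dir"
echo "$CLASSPATH"                 # /path/to/conf:<demo_dir>/nutch-0.8.jar
rm -rf "$demo_dir"
```

When a nutch jar is present in $HADOOP_HOME, the unguarded loop behaves the same; the guard only matters on a node where the jar is missing.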

Thanks, Andrzej,

-Corrado



Re: mergesegs (Nutch 0.8): what is the right syntax?

Posted by Andrzej Bialecki <ab...@getopt.org>.
This looks like an already-known problem with Hadoop and its classloader:
input and output format classes need to be deployed on the tasktrackers
themselves, not just submitted inside the *.job jar.

In short, the nutch*.jar has to be on the classpath of all tasktrackers.
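Putting the jar on every tasktracker's classpath starts with getting it onto each slave node. A rough sketch, assuming (hypothetically) that the slave hosts are listed one per line in a Hadoop-style slaves file, that every node uses the same install directory, and that passwordless scp is already set up; push_jar is an illustrative helper, not a Nutch or Hadoop command. By default it only prints the scp commands (DRY_RUN=1).

```shell
#!/bin/sh
# Sketch only: push_jar is a hypothetical helper, not part of Nutch or Hadoop.
# Assumes one host per line in the slaves file ('#' lines and blank lines are
# skipped), the same destination directory on every node, and passwordless scp.
# With DRY_RUN=1 (the default) the commands are printed, not executed.
push_jar() {
  jar=$1 slaves=$2 dest=$3
  # The -n test also handles a last line without a trailing newline.
  while read -r host _ || [ -n "$host" ]; do
    case $host in
      ''|'#'*) continue ;;
    esac
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo scp "$jar" "$host:$dest/"
    else
      # </dev/null keeps scp/ssh from consuming the rest of the slaves file.
      scp "$jar" "$host:$dest/" </dev/null || return 1
    fi
  done < "$slaves"
}

# Example (all paths are placeholders):
#   DRY_RUN=1 push_jar "$HADOOP_HOME/nutch-0.8.jar" "$HADOOP_HOME/conf/slaves" "$HADOOP_HOME"
```

Copying the file is only half of it: the jar still has to be picked up on each daemon's classpath (for example by a loop like the one Corrado added to the hadoop script), and the daemons have to be restarted, since a JVM's classpath is fixed when it starts.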

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com