Posted to user@nutch.apache.org by zzcgiacomini <zz...@echo.fr> on 2006/05/26 11:29:10 UTC
mergesegs (nutch-08) : what is the right syntax ?
Hi all,
I have two segments, test/segments/20060511101525 and
test/segments/20060523,
which I would like to merge into one new segment using "mergesegs", so
far without success. I have tried:
- nutch mergesegs test/segments/test/segments/20060526095530
test/segments/20060511101525 test/segments/20060523095535
- nutch mergesegs test/segments/20060526095530 -dir test/segments
Whatever I try, I always get an exception raised at the same place:
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:341)
    at org.apache.nutch.segment.SegmentMerger.merge(SegmentMerger.java:596)
    at org.apache.nutch.segment.SegmentMerger.main(SegmentMerger.java:644)
and the job tracker logs the following lines:
060526 105956 job init failed
java.io.IOException: java.lang.ClassNotFoundException: org.apache.nutch.segment.SegmentMerger$ObjectInputFormat
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:119)
    at org.apache.hadoop.mapred.JobTracker$JobInitThread.run(JobTracker.java:293)
    at java.lang.Thread.run(Thread.java:595)
060526 105959 Server connection on port 9011 from 10.234.57.38: exiting
Any help?
Thanks in advance
-Corrado
Re: mergesegs (nutch-08) : what is the right syntax ?
Posted by zzcgiacomini <zz...@echo.fr>.
Yes, you are right.
I have just added the following lines to the hadoop script in my Nutch
installation, before it builds the classpath with hadoop-*.jar.
It may not be the proper place, but I got my segments merged:
# add every Nutch jar to the classpath
for f in "$HADOOP_HOME"/nutch-*.jar; do
  CLASSPATH="${CLASSPATH}:$f"
done
Thanks Andrzej,
-Corrado
Andrzej Bialecki wrote:
> This looks like an (already known) problem with Hadoop and the
> classloader - input and output format classes need to be deployed on
> the tasktracker, and not just submitted in the *.job jar.
>
> Simply speaking, you need to put the nutch*.jar on the classpath of
> all tasktrackers.
>
Re: mergesegs (nutch-08) : what is the right syntax ?
Posted by Andrzej Bialecki <ab...@getopt.org>.
This looks like an (already known) problem with Hadoop and the
classloader - input and output format classes need to be deployed on the
tasktracker, and not just submitted in the *.job jar.
Simply speaking, you need to put the nutch*.jar on the classpath of all
tasktrackers.
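As a rough illustration of what "putting the nutch*.jar on the classpath" can look like in a startup script, here is a minimal sketch. The function name add_nutch_jars is hypothetical, and the assumption that the jars sit directly under $HADOOP_HOME comes from the loop posted in this thread, not from any official layout. It only builds the classpath string; the jar still has to be present on every tasktracker node, and a running tasktracker would presumably need a restart to pick up the change.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch or Hadoop): print a classpath made of
# the existing classpath ($2) plus every nutch-*.jar found in directory $1.
add_nutch_jars() {
  dir=$1
  cp=$2
  for f in "$dir"/nutch-*.jar; do
    # If the glob matches nothing, the shell leaves the pattern unexpanded;
    # skip it so no bogus entry ends up on the classpath.
    [ -e "$f" ] || continue
    # Prepend the ':' separator only when the classpath is already non-empty.
    cp="${cp:+$cp:}$f"
  done
  printf '%s\n' "$cp"
}
```

In a startup script it could be called as CLASSPATH=$(add_nutch_jars "$HADOOP_HOME" "$CLASSPATH"). Compared with a bare loop, the existence check avoids adding the literal pattern when no jar is found, and the separator handling avoids a leading colon when CLASSPATH starts out empty.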
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com