Posted to user@nutch.apache.org by Vishal Shah <vi...@rediff.co.in> on 2006/09/12 07:11:51 UTC

ClassNotFoundException while using segread

Hi,
 
   I am trying to use the dump option in the segread command to get a
segment's dump. However, I get a ClassNotFoundException for
SegmentReader$InputFormat. Has anyone else experienced this? How do I
resolve it?
 
[nutch@machine1 search]$ bin/nutch readseg -dump
crawl1/segments/20060908210708 crawl1/segments/20060908210708/gendump
-nocontent -nofetch -noparse -noparsedata -noparsetext
SegmentReader: dump segment: crawl1/segments/20060908210708
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:363)
        at
org.apache.nutch.segment.SegmentReader.dump(SegmentReader.java:196)
        at
org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:533)
 
 
[nutch@machine1 search]$ tail logs/nutch.log
2006-09-12 12:50:52,675 WARN  mapred.JobTracker - job init failed
java.io.IOException: java.lang.ClassNotFoundException:
org.apache.nutch.segment.SegmentReader$InputFormat
        at
org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:123)
        at
org.apache.hadoop.mapred.JobTracker$JobInitThread.run(JobTracker.java:314)
        at java.lang.Thread.run(Thread.java:595)
 
Thanks,
 
-vishal.

RE: ClassNotFoundException while using segread

Posted by Vishal Shah <vi...@rediff.co.in>.
Hi Dennis,

  Thanks a lot for the response. I added the for loop to my hadoop
script, and it works now. I really appreciate you helping us newbies
sort things out!

Regards,

-vishal.

-----Original Message-----
From: Dennis Kubes [mailto:nutch-dev@dragonflymc.com] 
Sent: Tuesday, September 12, 2006 6:37 PM
To: nutch-user@lucene.apache.org
Subject: Re: ClassNotFoundException while using segread

Isn't this the same problem that was happening before with the
SegmentMerger, where the nutch-x.x.jar needed to be added to the
classpath on all of the task trackers? We added the following code to
our hadoop script just below the other for loops, then redeployed the
script and restarted all task trackers:

for f in $HADOOP_HOME/nutch-*.jar; do
 CLASSPATH=${CLASSPATH}:$f;
done

Dennis

Vishal Shah wrote:
> Hi Andrzej,
>
>   Thanks for the reply. I have a job running on the system right now,
> but I'll try to reinstall (redeploy ;-)) it after it is done.
>
> Regards,
>
> -vishal.
>
> -----Original Message-----
> From: Andrzej Bialecki [mailto:ab@getopt.org] 
> Sent: Tuesday, September 12, 2006 2:40 PM
> To: nutch-user@lucene.apache.org
> Subject: Re: ClassNotFoundException while using segread
>
> Vishal Shah wrote:
>> Hi Andrzej,
>>
>>   Thanks for the reply. Currently, I have deployed Hadoop/Nutch using
>> the instructions in the hadoop/nutch tutorial. Currently, I have copied
>
> Ok, then forget my explanation - it is still true, but not applicable to
> your case.
>
>> the nutch jars in my NUTCH_HOME directory. I tried copying the
>> nutch-xxxx.job to my lib directory, but that doesn't work too.
>
> No, you shouldn't need to do this. The scripts should find all necessary
> jars and put them on CLASSPATH.
>
>>   Do I need to set the CLASSPATH before I run bin/start-all.sh, or is it
>> something else? Sorry, I am new to Java development, so I don't know
>> what you mean by deploying something.
>
> Well, I'm not sure what could be wrong... Does it occur for you with the
> clean installation, i.e. if you get a fresh copy, rebuild, reinstall
> from scratch and try again?


Re: ClassNotFoundException while using segread

Posted by Dennis Kubes <nu...@dragonflymc.com>.
Isn't this the same problem that was happening before with the
SegmentMerger, where the nutch-x.x.jar needed to be added to the
classpath on all of the task trackers? We added the following code to
our hadoop script just below the other for loops, then redeployed the
script and restarted all task trackers:

for f in $HADOOP_HOME/nutch-*.jar; do
 CLASSPATH=${CLASSPATH}:$f;
done
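
For anyone unsure what the loop does, here is a self-contained sketch of
it. The temporary directory standing in for HADOOP_HOME, the jar names,
and the initial CLASSPATH entry are all made up for illustration; in a
real install they come from your Hadoop and Nutch builds:

```shell
#!/bin/sh
# Sketch: exercise the classpath loop against a throwaway directory
# standing in for HADOOP_HOME (all names here are illustrative).
HADOOP_HOME=$(mktemp -d)
touch "$HADOOP_HOME/nutch-0.8.jar"

# Stand-in for the entries the hadoop script has already collected.
CLASSPATH=/opt/hadoop/hadoop-core.jar

# The loop from the message above: the glob expands to every matching
# nutch jar, and each one is appended to CLASSPATH.
for f in $HADOOP_HOME/nutch-*.jar; do
 CLASSPATH=${CLASSPATH}:$f;
done

echo "$CLASSPATH"
```

Note that if no nutch jar is present, the glob stays literal and a
non-existent path ends up on the classpath, which the JVM silently
ignores.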

Dennis

Vishal Shah wrote:
> Hi Andrzej,
>
>   Thanks for the reply. I have a job running on the system right now,
> but I'll try to reinstall (redeploy ;-)) it after it is done.
>
> Regards,
>
> -vishal.
>
> -----Original Message-----
> From: Andrzej Bialecki [mailto:ab@getopt.org] 
> Sent: Tuesday, September 12, 2006 2:40 PM
> To: nutch-user@lucene.apache.org
> Subject: Re: ClassNotFoundException while using segread
>
> Vishal Shah wrote:
>> Hi Andrzej,
>>
>>   Thanks for the reply. Currently, I have deployed Hadoop/Nutch using
>> the instructions in the hadoop/nutch tutorial. Currently, I have copied
>
> Ok, then forget my explanation - it is still true, but not applicable to
> your case.
>
>> the nutch jars in my NUTCH_HOME directory. I tried copying the
>> nutch-xxxx.job to my lib directory, but that doesn't work too.
>
> No, you shouldn't need to do this. The scripts should find all necessary
> jars and put them on CLASSPATH.
>
>>   Do I need to set the CLASSPATH before I run bin/start-all.sh, or is it
>> something else? Sorry, I am new to Java development, so I don't know
>> what you mean by deploying something.
>
> Well, I'm not sure what could be wrong... Does it occur for you with the
> clean installation, i.e. if you get a fresh copy, rebuild, reinstall
> from scratch and try again?

RE: ClassNotFoundException while using segread

Posted by Vishal Shah <vi...@rediff.co.in>.
Hi Andrzej,

  Thanks for the reply. I have a job running on the system right now,
but I'll try to reinstall (redeploy ;-)) it after it is done.

Regards,

-vishal.

-----Original Message-----
From: Andrzej Bialecki [mailto:ab@getopt.org] 
Sent: Tuesday, September 12, 2006 2:40 PM
To: nutch-user@lucene.apache.org
Subject: Re: ClassNotFoundException while using segread

Vishal Shah wrote:
> Hi Andrzej,
>
>   Thanks for the reply. Currently, I have deployed Hadoop/Nutch using
> the instructions in the hadoop/nutch tutorial. Currently, I have copied

Ok, then forget my explanation - it is still true, but not applicable to
your case.

> the nutch jars in my NUTCH_HOME directory. I tried copying the
> nutch-xxxx.job to my lib directory, but that doesn't work too.

No, you shouldn't need to do this. The scripts should find all necessary
jars and put them on CLASSPATH.

>   Do I need to set the CLASSPATH before I run bin/start-all.sh, or is it
> something else? Sorry, I am new to Java development, so I don't know
> what you mean by deploying something.

Well, I'm not sure what could be wrong... Does it occur for you with the
clean installation, i.e. if you get a fresh copy, rebuild, reinstall
from scratch and try again?

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: ClassNotFoundException while using segread

Posted by Andrzej Bialecki <ab...@getopt.org>.
Vishal Shah wrote:
> Hi Andrzej,
>
>   Thanks for the reply. Currently, I have deployed Hadoop/Nutch using
> the instructions in the hadoop/nutch tutorial. Currently, I have copied
>   

Ok, then forget my explanation - it is still true, but not applicable to 
your case.

> the nutch jars in my NUTCH_HOME directory. I tried copying the
> nutch-xxxx.job to my lib directory, but that doesn't work too. 
>
>   

No, you shouldn't need to do this. The scripts should find all necessary 
jars and put them on CLASSPATH.

>   Do I need to set the CLASSPATH before I run bin/start-all.sh, or is it
> something else? Sorry, I am new to Java development, so I don't know
> what you mean by deploying something.
>   

Well, I'm not sure what could be wrong... Does it occur for you with the
clean installation, i.e. if you get a fresh copy, rebuild, reinstall
from scratch and try again?

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



RE: ClassNotFoundException while using segread

Posted by Vishal Shah <vi...@rediff.co.in>.
Hi Andrzej,

  Thanks for the reply. Currently, I have deployed Hadoop/Nutch using
the instructions in the hadoop/nutch tutorial. I have copied the nutch
jars into my NUTCH_HOME directory. I tried copying the nutch-xxxx.job
to my lib directory, but that doesn't work either.

  Do I need to set the CLASSPATH before I run bin/start-all.sh, or is it
something else? Sorry, I am new to Java development, so I don't know
what you mean by deploying something.

Thanks,

-vishal.

-----Original Message-----
From: Andrzej Bialecki [mailto:ab@getopt.org] 
Sent: Tuesday, September 12, 2006 12:06 PM
To: nutch-user@lucene.apache.org
Subject: Re: ClassNotFoundException while using segread

Vishal Shah wrote:
> Hi,
>  
>    I am trying to use the dump option in the segread command to get a
> segment's dump. However, I see the ClassNotFound exception for
> SegmentReader$InputFormat. Has anyone else experienced this? How do I
> resolve it?
>  
> [nutch@machine1 search]$ bin/nutch readseg -dump
> crawl1/segments/20060908210708 crawl1/segments/20060908210708/gendump
> -nocontent -nofetch -noparse -noparsedata -noparsetext
> SegmentReader: dump segment: crawl1/segments/20060908210708
> Exception in thread "main" java.io.IOException: Job failed!
>         at
org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:363)
>         at
> org.apache.nutch.segment.SegmentReader.dump(SegmentReader.java:196)
>         at
> org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:533)
>  
>  
> [nutch@machine1 search]$ tail logs/nutch.log
> 2006-09-12 12:50:52,675 WARN  mapred.JobTracker - job init failed
> java.io.IOException: java.lang.ClassNotFoundException:
> org.apache.nutch.segment.SegmentReader$InputFormat
>   

How are you deploying Hadoop/Nutch? If you run just a plain Hadoop
cluster, without deploying the Nutch jars, and then only submit the
Nutch job jar, then Hadoop cannot process input files that require
custom InputFormats, because at that point the TaskTracker's classloader
doesn't yet have access to the InputFormat defined in the job jar.

A workaround is to deploy the nutch-xxxx.jar too, in addition to 
Hadoop-only jars. I believe this has been solved in the newer versions 
of Hadoop.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: ClassNotFoundException while using segread

Posted by Andrzej Bialecki <ab...@getopt.org>.
Vishal Shah wrote:
> Hi,
>  
>    I am trying to use the dump option in the segread command to get a
> segment's dump. However, I see the ClassNotFound exception for
> SegmentReader$InputFormat. Has anyone else experienced this? How do I
> resolve it?
>  
> [nutch@machine1 search]$ bin/nutch readseg -dump
> crawl1/segments/20060908210708 crawl1/segments/20060908210708/gendump
> -nocontent -nofetch -noparse -noparsedata -noparsetext
> SegmentReader: dump segment: crawl1/segments/20060908210708
> Exception in thread "main" java.io.IOException: Job failed!
>         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:363)
>         at
> org.apache.nutch.segment.SegmentReader.dump(SegmentReader.java:196)
>         at
> org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:533)
>  
>  
> [nutch@machine1 search]$ tail logs/nutch.log
> 2006-09-12 12:50:52,675 WARN  mapred.JobTracker - job init failed
> java.io.IOException: java.lang.ClassNotFoundException:
> org.apache.nutch.segment.SegmentReader$InputFormat
>   

How are you deploying Hadoop/Nutch? If you run just a plain Hadoop
cluster, without deploying the Nutch jars, and then only submit the
Nutch job jar, then Hadoop cannot process input files that require
custom InputFormats, because at that point the TaskTracker's classloader
doesn't yet have access to the InputFormat defined in the job jar.

A workaround is to deploy the nutch-xxxx.jar too, in addition to 
Hadoop-only jars. I believe this has been solved in the newer versions 
of Hadoop.
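
Concretely, that deploy step could look something like the dry-run
sketch below. The host names, jar name, and paths are all hypothetical;
to really copy, replace the PLAN bookkeeping with actual scp calls for
your cluster:

```shell
#!/bin/sh
# Dry-run sketch of deploying the nutch jar alongside the Hadoop jars on
# every task tracker. Hosts and paths are made up for illustration; the
# commands are collected and printed rather than executed.
NUTCH_JAR="$HOME/nutch-0.8.jar"      # assumption: output of your nutch build
HADOOP_HOME=/opt/hadoop              # assumption: Hadoop dir on each node
TASKTRACKERS="node1 node2 node3"     # hypothetical task tracker hosts

PLAN=""
for host in $TASKTRACKERS; do
  # Placing the nutch jar next to the Hadoop jars lets the hadoop
  # script's classpath loops pick it up when the TaskTracker restarts.
  PLAN="${PLAN}scp $NUTCH_JAR $host:$HADOOP_HOME/
"
done
printf '%s' "$PLAN"
```

After copying, the task trackers still need a restart so the new jar is
on the classpath of the freshly started JVMs.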

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com