Posted to user@nutch.apache.org by Vishal Shah <vi...@rediff.co.in> on 2006/09/12 07:11:51 UTC
ClassNotFoundException while using segread
Hi,
I am trying to use the -dump option of the readseg command to get a
segment's dump. However, I get a ClassNotFoundException for
SegmentReader$InputFormat. Has anyone else experienced this? How do I
resolve it?
[nutch@machine1 search]$ bin/nutch readseg -dump crawl1/segments/20060908210708 crawl1/segments/20060908210708/gendump -nocontent -nofetch -noparse -noparsedata -noparsetext
SegmentReader: dump segment: crawl1/segments/20060908210708
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:363)
    at org.apache.nutch.segment.SegmentReader.dump(SegmentReader.java:196)
    at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:533)
[nutch@machine1 search]$ tail logs/nutch.log
2006-09-12 12:50:52,675 WARN mapred.JobTracker - job init failed
java.io.IOException: java.lang.ClassNotFoundException: org.apache.nutch.segment.SegmentReader$InputFormat
    at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:123)
    at org.apache.hadoop.mapred.JobTracker$JobInitThread.run(JobTracker.java:314)
    at java.lang.Thread.run(Thread.java:595)
Thanks,
-vishal.
RE: ClassNotFoundException while using segread
Posted by Vishal Shah <vi...@rediff.co.in>.
Hi Dennis,
Thanks a lot for the response. I added the for loop to my hadoop
script, and it works now. I really appreciate your help in getting us
newbies sorted out!
Regards,
-vishal.
Re: ClassNotFoundException while using segread
Posted by Dennis Kubes <nu...@dragonflymc.com>.
Isn't this the same problem that came up before with the SegmentMerger,
I think, where the nutch-x.x.jar needed to be added to the classpath on
all of the task trackers? We added the following code to our hadoop
script, just below the other for loops, then redeployed the script and
restarted all task trackers:
for f in $HADOOP_HOME/nutch-*.jar; do
  CLASSPATH=${CLASSPATH}:$f;
done
Dennis
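One pitfall with a loop of this shape: if no nutch-*.jar exists in the directory, most shells leave the glob pattern unexpanded and the literal string ends up on the classpath. The following is only a sketch of a more defensive variant, not the code from the actual hadoop script. The function name append_nutch_jars is hypothetical, and the sketch assumes the jars sit directly under the directory passed in.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Nutch or Hadoop scripts):
# append every nutch-*.jar found in a directory to a classpath string.
append_nutch_jars() {
  local dir="$1" classpath="$2" f
  for f in "$dir"/nutch-*.jar; do
    # When nothing matches, the pattern stays literal; skip it.
    [ -e "$f" ] || continue
    # Only add a ':' separator if the classpath is already non-empty.
    classpath="${classpath:+$classpath:}$f"
  done
  printf '%s\n' "$classpath"
}

# Example use; falls back to the current directory if HADOOP_HOME is unset.
CLASSPATH=$(append_nutch_jars "${HADOOP_HOME:-.}" "${CLASSPATH:-}")
```

As with the original loop, the change only takes effect once the script has been redeployed and the task trackers restarted.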
RE: ClassNotFoundException while using segread
Posted by Vishal Shah <vi...@rediff.co.in>.
Hi Andrzej,
Thanks for the reply. I have a job running on the system right now,
but I'll try to reinstall (redeploy ;-)) it after it is done.
Regards,
-vishal.
Re: ClassNotFoundException while using segread
Posted by Andrzej Bialecki <ab...@getopt.org>.
Vishal Shah wrote:
> Hi Andrzej,
>
> Thanks for the reply. Currently, I have deployed Hadoop/Nutch using
> the instructions in the hadoop/nutch tutorial. Currently, I have copied
>
Ok, then forget my explanation - it is still true, but not applicable to
your case.
> the nutch jars in my NUTCH_HOME directory. I tried copying the
> nutch-xxxx.job to my lib directory, but that doesn't work too.
>
>
No, you shouldn't need to do this. The scripts should find all necessary
jars and put them on CLASSPATH.
> Do I need to set the CLASSPATH before I run bin/start-all.sh, or is it
> something else? Sorry, I am new to Java development, so I don't know
> what you mean by deploying something.
>
Well, I'm not sure what could be wrong... Does it occur for you with a
clean installation, i.e. if you get a fresh copy, rebuild, reinstall
from scratch and try again?
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
RE: ClassNotFoundException while using segread
Posted by Vishal Shah <vi...@rediff.co.in>.
Hi Andrzej,
Thanks for the reply. I have deployed Hadoop/Nutch using the
instructions in the Hadoop/Nutch tutorial, and I have copied the Nutch
jars into my NUTCH_HOME directory. I also tried copying the
nutch-xxxx.job to my lib directory, but that doesn't work either.
Do I need to set the CLASSPATH before I run bin/start-all.sh, or is it
something else? Sorry, I am new to Java development, so I don't know
what you mean by deploying something.
Thanks,
-vishal.
Re: ClassNotFoundException while using segread
Posted by Andrzej Bialecki <ab...@getopt.org>.
Vishal Shah wrote:
> Hi,
>
> I am trying to use the dump option in the segread command to get a
> segment's dump. However, I see the ClassNotFound exception for
> SegmentReader$InputFormat. Has anyone else experienced this? How do I
> resolve it?
>
> [nutch@machine1 search]$ bin/nutch readseg -dump crawl1/segments/20060908210708 crawl1/segments/20060908210708/gendump -nocontent -nofetch -noparse -noparsedata -noparsetext
> SegmentReader: dump segment: crawl1/segments/20060908210708
> Exception in thread "main" java.io.IOException: Job failed!
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:363)
>     at org.apache.nutch.segment.SegmentReader.dump(SegmentReader.java:196)
>     at org.apache.nutch.segment.SegmentReader.main(SegmentReader.java:533)
>
> [nutch@machine1 search]$ tail logs/nutch.log
> 2006-09-12 12:50:52,675 WARN mapred.JobTracker - job init failed
> java.io.IOException: java.lang.ClassNotFoundException: org.apache.nutch.segment.SegmentReader$InputFormat
>
How are you deploying Hadoop/Nutch? If you run just a plain Hadoop
cluster, without deploying the Nutch jars, and then submit only the
Nutch job jar, Hadoop cannot process input files that require custom
InputFormats, because at that point the TaskTracker's classloader
doesn't yet have access to the InputFormat defined in the job jar.
A workaround is to deploy the nutch-xxxx.jar too, in addition to the
Hadoop-only jars. I believe this has been solved in newer versions
of Hadoop.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
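One way to confirm on a given node that the workaround took effect is to check whether the classpath the daemons start with contains a Nutch jar that really exists on disk. The following is a minimal sketch, assuming a colon-separated classpath string. The function name classpath_has_nutch_jar is hypothetical and is not part of Nutch or Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of Nutch or Hadoop): succeed if the given
# colon-separated classpath has an entry named nutch-*.jar that exists.
classpath_has_nutch_jar() {
  local rest="$1:" entry
  while [ -n "$rest" ]; do
    entry="${rest%%:*}"   # first entry up to the next ':'
    rest="${rest#*:}"     # remainder after that ':'
    case "${entry##*/}" in
      nutch-*.jar) [ -f "$entry" ] && return 0 ;;
    esac
  done
  return 1
}

# Example use against the current environment.
if classpath_has_nutch_jar "${CLASSPATH:-}"; then
  echo "Nutch jar found on CLASSPATH"
else
  echo "No Nutch jar on CLASSPATH"
fi
```

The check only inspects file names, so it does not prove that SegmentReader$InputFormat is inside the jar; it merely catches the case where the jar was never added or the path is wrong.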