Posted to user@nutch.apache.org by Fabrizio Silvestri <fa...@isti.cnr.it> on 2006/02/27 10:10:16 UTC

Problems with the tutorial example

Dear all,

I just started using the nutch search engine, and as you may imagine  
I began playing with it by working through its tutorial example.

First of all, nutch-0.7.1 does not contain any  
org.apache.nutch.crawl.DmozParser class (I even ran a  
`locate DmozParser` and could not find it anywhere). Does anyone  
know whether the tutorial also applies to nutch-0.7.1?

Anyway, I successfully checked out the latest version of nutch via  
svn, and within the trunk directory I did find the DmozParser class,  
but under tools.DmozParser rather than crawl.DmozParser (?!)
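In case it helps anyone reproduce this, here is how I would confirm where the class lives in a checkout. The directory layout below is a mock I created just for illustration, not the real trunk tree, and the invocation in the final comment is only my guess at how the tutorial command would change:

```shell
# Mock of the trunk source layout (illustrative paths only):
mkdir -p trunk/src/java/org/apache/nutch/tools
touch trunk/src/java/org/apache/nutch/tools/DmozParser.java

# Locate the class in the checkout instead of trusting the tutorial's package:
find trunk/src -name 'DmozParser.java'
# -> trunk/src/java/org/apache/nutch/tools/DmozParser.java

# So the tutorial's invocation would presumably become something like:
#   bin/nutch org.apache.nutch.tools.DmozParser content.rdf.u8 -subset 3000 > dmoz/urls
# (unverified -- adjust the arguments to whatever the tool actually accepts)
```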

Now for another problem: everything goes smoothly until I reach the  
`nutch invertlinks crawl/linkdb crawl/segments` command. At that  
point I receive the following error output:

060227 000913 LinkDb: starting
060227 000913 LinkDb: linkdb: crawl/linkdb
060227 000913 parsing jar:file:/home/silvestr/nutch/nutch/trunk/lib/ 
hadoop-0.1-dev.jar!/hadoop-default.xml
060227 000913 parsing file:/home/silvestr/nutch/nutch/trunk/conf/ 
nutch-default.xml
060227 000913 parsing jar:file:/home/silvestr/nutch/nutch/trunk/lib/ 
hadoop-0.1-dev.jar!/mapred-default.xml
060227 000913 parsing file:/home/silvestr/nutch/nutch/trunk/conf/ 
nutch-site.xml
060227 000913 parsing file:/home/silvestr/nutch/nutch/trunk/conf/ 
hadoop-site.xml
060227 000913 LinkDb: adding segment: crawl/segments
060227 000913 parsing jar:file:/home/silvestr/nutch/nutch/trunk/lib/ 
hadoop-0.1-dev.jar!/hadoop-default.xml
060227 000913 parsing file:/home/silvestr/nutch/nutch/trunk/conf/ 
nutch-default.xml
060227 000913 parsing jar:file:/home/silvestr/nutch/nutch/trunk/lib/ 
hadoop-0.1-dev.jar!/mapred-default.xml
060227 000913 parsing jar:file:/home/silvestr/nutch/nutch/trunk/lib/ 
hadoop-0.1-dev.jar!/mapred-default.xml
060227 000913 parsing file:/home/silvestr/nutch/nutch/trunk/conf/ 
nutch-site.xml
060227 000913 parsing file:/home/silvestr/nutch/nutch/trunk/conf/ 
hadoop-site.xml
060227 000913 Running job: job_mmx151
060227 000913 parsing jar:file:/home/silvestr/nutch/nutch/trunk/lib/ 
hadoop-0.1-dev.jar!/hadoop-default.xml
060227 000913 parsing jar:file:/home/silvestr/nutch/nutch/trunk/lib/ 
hadoop-0.1-dev.jar!/mapred-default.xml
060227 000913 parsing /tmp/hadoop/mapred/local/localRunner/ 
job_mmx151.xml
060227 000913 parsing file:/home/silvestr/nutch/nutch/trunk/conf/ 
hadoop-site.xml
java.io.IOException: No input directories specified in:  
Configuration: defaults: hadoop-default.xml , mapred-default.xml , / 
tmp/hadoop/mapred/local/localRunner/job_mmx151.xmlfinal: hadoop-site.xml
         at org.apache.hadoop.mapred.InputFormatBase.listFiles 
(InputFormatBase.java:84)
         at org.apache.hadoop.mapred.SequenceFileInputFormat.listFiles 
(SequenceFileInputFormat.java:37)
         at org.apache.hadoop.mapred.InputFormatBase.getSplits 
(InputFormatBase.java:94)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run 
(LocalJobRunner.java:70)
060227 000914  map 0%  reduce 0%
Exception in thread "main" java.io.IOException: Job failed!
         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java: 
310)
         at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149)
         at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:208)
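For what it's worth, my current guess (unverified) is that LinkDb wants each timestamped segment directory listed individually rather than the segments parent directory, so a shell glob might be needed. The layout below is a mock I made up to show the difference; the commented nutch call is only my speculation:

```shell
# Mock of the crawl layout (segment names illustrative):
mkdir -p crawl/segments/20060227000913 crawl/segments/20060227000914

# What I passed: the parent directory, a single path.
echo crawl/segments

# What the job may want instead: each per-segment directory,
# which a glob expands to before nutch ever sees it.
ls -d crawl/segments/*

# If that guess is right, the call would be:
#   bin/nutch invertlinks crawl/linkdb crawl/segments/*
```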


Any ideas?

Cheers,


------------------------------------------------------------------------------------------------
Fabrizio Silvestri
Researcher
ISTI - CNR
Phone: +39 50 315 3011
Web: http://hpc.isti.cnr.it/~silvestr