Posted to user@nutch.apache.org by Fabrizio Silvestri <fa...@isti.cnr.it> on 2006/02/27 10:10:16 UTC
Problems with the tutorial example
Dear all,
I just started using the Nutch search engine, and as you may imagine
I began playing with it by trying its tutorial example.
First of all, nutch-0.7.1 does not contain any
org.apache.nutch.crawl.DmozParser class (I even ran
locate DmozParser and could not find it anywhere). Does anyone
know whether the tutorial also applies to nutch-0.7.1?
Anyway, I successfully checked out the latest version of Nutch from svn,
and within the trunk directory I did find the DmozParser class, but
under tools.DmozParser rather than crawl.DmozParser (?!?!?)
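For reference, this is roughly the tutorial step I was trying to reproduce (a sketch only: the tools package path is what I found in my trunk checkout rather than what the tutorial says, and the file names and -subset value are the tutorial's, so adjust them to your own setup):

```shell
# Sketch of the DMOZ seeding step, assuming DmozParser lives under
# org.apache.nutch.tools in trunk (not org.apache.nutch.crawl as in the
# tutorial). content.rdf.u8 is the uncompressed DMOZ dump.
mkdir dmoz
bin/nutch org.apache.nutch.tools.DmozParser content.rdf.u8 -subset 3000 > dmoz/urls
```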
Now another problem: everything goes through smoothly until I reach
the "nutch invertlinks crawl/linkdb crawl/segments" command.
At that point I receive the following error output:
060227 000913 LinkDb: starting
060227 000913 LinkDb: linkdb: crawl/linkdb
060227 000913 parsing jar:file:/home/silvestr/nutch/nutch/trunk/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060227 000913 parsing file:/home/silvestr/nutch/nutch/trunk/conf/nutch-default.xml
060227 000913 parsing jar:file:/home/silvestr/nutch/nutch/trunk/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060227 000913 parsing file:/home/silvestr/nutch/nutch/trunk/conf/nutch-site.xml
060227 000913 parsing file:/home/silvestr/nutch/nutch/trunk/conf/hadoop-site.xml
060227 000913 LinkDb: adding segment: crawl/segments
060227 000913 parsing jar:file:/home/silvestr/nutch/nutch/trunk/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060227 000913 parsing file:/home/silvestr/nutch/nutch/trunk/conf/nutch-default.xml
060227 000913 parsing jar:file:/home/silvestr/nutch/nutch/trunk/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060227 000913 parsing jar:file:/home/silvestr/nutch/nutch/trunk/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060227 000913 parsing file:/home/silvestr/nutch/nutch/trunk/conf/nutch-site.xml
060227 000913 parsing file:/home/silvestr/nutch/nutch/trunk/conf/hadoop-site.xml
060227 000913 Running job: job_mmx151
060227 000913 parsing jar:file:/home/silvestr/nutch/nutch/trunk/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060227 000913 parsing jar:file:/home/silvestr/nutch/nutch/trunk/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060227 000913 parsing /tmp/hadoop/mapred/local/localRunner/job_mmx151.xml
060227 000913 parsing file:/home/silvestr/nutch/nutch/trunk/conf/hadoop-site.xml
java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /tmp/hadoop/mapred/local/localRunner/job_mmx151.xml final: hadoop-site.xml
        at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listFiles(SequenceFileInputFormat.java:37)
        at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
060227 000914 map 0% reduce 0%
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149)
        at org.apache.nutch.crawl.LinkDb.main(LinkDb.java:208)
Any ideas?
Cheers,
------------------------------------------------------------------------------------------------
Fabrizio Silvestri
Researcher
ISTI - CNR
Phone: +39 50 315 3011
Web: http://hpc.isti.cnr.it/~silvestr