Posted to user@nutch.apache.org by Dima Mazmanov <nu...@proservice.ge> on 2006/02/22 09:30:59 UTC

nutch-0.8 crawl problem

Hi!
I'm having problems crawling. Mainly, I cannot even start a crawl.
I downloaded the latest Nutch source, and after 3 hours of struggling with the config files I gave up.
I have some questions:
1) What is Hadoop, and how do I use it?
I searched for information about Hadoop and found that it is no longer integrated into Nutch; it is a separate project.
But in the lib folder I found a corresponding hadoop-0.1-dev.jar file. What does it do?
2) How can I crawl? :)
When I type the command, I get the following exception:

No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /tmp/hadoop/mapred/local/localRunner/job_vpit8j.xmlfinal: hadoop-site.xml
        at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
        at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
060222 131857  map 0%  reduce 0%
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:104)

I put the following in hadoop-site.xml:

  <property>
    <name>fs.default.name</name>
    <value>localhost:9000</value>
  </property>

  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>

But I don't know what these settings mean (I just copied them from the Hadoop website).
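
For comparison, the copied values point Nutch at a distributed file system and a job tracker that are both expected to be running on localhost. A minimal single-machine sketch is shown below. It assumes that the literal value "local", as described in hadoop-default.xml, selects the local file system and the in-process job runner. Whether this alone resolves the exception above is an assumption, not something confirmed here.

```xml
<?xml version="1.0"?>
<!-- Sketch of a hadoop-site.xml for running everything on one machine. -->
<!-- Assumption: "local" is accepted for both properties in this Hadoop version. -->
<configuration>

  <!-- Use the local file system instead of a name node at host:port. -->
  <property>
    <name>fs.default.name</name>
    <value>local</value>
  </property>

  <!-- Run map and reduce tasks in-process instead of contacting a job tracker. -->
  <property>
    <name>mapred.job.tracker</name>
    <value>local</value>
  </property>

</configuration>
```

With these values no separate file system or job tracker daemons should be needed, and dfs.replication is left out because it only applies to the distributed file system.
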
So, how can I crawl using nutch-0.8?
3) Where is ./nutch ndfs?
When I execute this command I get:
Exception in thread "main" java.lang.NoClassDefFoundError: ndfs
I had no problems with version 0.7.
I decided to move to 0.8 because of the parse-swf plugin, since I couldn't compile it.
Could you please describe how to use the new Nutch? Or, what do I need in order to compile the parse-swf plugin?