Posted to user@nutch.apache.org by Bud Witney <wi...@osu.edu> on 2006/03/08 17:00:04 UTC
Crawl crash hadoop
What's going on with this? I tried the nightly build to get a look at
the upcoming release and hit the following error on an intranet crawl.
Is there good documentation on how to set up Hadoop?
I ran:
./bin/nutch crawl urls -dir crawl.academic -depth 10
with this export:
export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.5.0/Home
Running on OS X 10.4.5.
060308 103218 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060308 103218 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/nutch-default.xml
060308 103218 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/crawl-tool.xml
060308 103218 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060308 103218 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/nutch-site.xml
060308 103218 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/hadoop-site.xml
060308 103218 crawl started in: crawl.academic
060308 103218 rootUrlDir = urls
060308 103218 threads = 10
060308 103218 depth = 10
060308 103218 Injector: starting
060308 103218 Injector: crawlDb: crawl.academic/crawldb
060308 103218 Injector: urlDir: urls
060308 103218 Injector: Converting injected urls to crawl db entries.
060308 103218 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060308 103218 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/nutch-default.xml
060308 103218 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/crawl-tool.xml
060308 103218 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060308 103218 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060308 103218 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/nutch-site.xml
060308 103218 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/hadoop-site.xml
060308 103219 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060308 103219 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/nutch-default.xml
060308 103219 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/crawl-tool.xml
060308 103219 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060308 103219 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060308 103219 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060308 103219 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/nutch-site.xml
060308 103219 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/hadoop-site.xml
060308 103219 Running job: job_caq34e
060308 103219 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060308 103219 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060308 103219 parsing /tmp/hadoop/mapred/local/localRunner/job_caq34e.xml
060308 103219 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/hadoop-site.xml
java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /tmp/hadoop/mapred/local/localRunner/job_caq34e.xmlfinal: hadoop-site.xml
at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
060308 103220 map 0% reduce 0%
060308 103220 SEVERE error, caught Exception in main()
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
at org.apache.nutch.crawl.Crawl.doMain(Crawl.java:104)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)