Posted to user@nutch.apache.org by Bud Witney <wi...@osu.edu> on 2006/03/08 17:00:04 UTC

Crawl crash hadoop

What's going on with this? I tried the nightly build to preview the
upcoming release and got the following error on an intranet crawl. Is
there good documentation on how to set up Hadoop?

I ran the crawl with

./bin/nutch crawl urls -dir crawl.academic -depth 10

after setting

export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.5.0/Home

This is running on OS X 10.4.5.

060308 103218 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060308 103218 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/nutch-default.xml
060308 103218 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/crawl-tool.xml
060308 103218 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060308 103218 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/nutch-site.xml
060308 103218 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/hadoop-site.xml
060308 103218 crawl started in: crawl.academic
060308 103218 rootUrlDir = urls
060308 103218 threads = 10
060308 103218 depth = 10
060308 103218 Injector: starting
060308 103218 Injector: crawlDb: crawl.academic/crawldb
060308 103218 Injector: urlDir: urls
060308 103218 Injector: Converting injected urls to crawl db entries.
060308 103218 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060308 103218 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/nutch-default.xml
060308 103218 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/crawl-tool.xml
060308 103218 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060308 103218 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060308 103218 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/nutch-site.xml
060308 103218 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/hadoop-site.xml
060308 103219 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060308 103219 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/nutch-default.xml
060308 103219 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/crawl-tool.xml
060308 103219 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060308 103219 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060308 103219 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060308 103219 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/nutch-site.xml
060308 103219 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/hadoop-site.xml
060308 103219 Running job: job_caq34e
060308 103219 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/hadoop-default.xml
060308 103219 parsing jar:file:/Users/budwitney/Desktop/nutch-nightly%202/lib/hadoop-0.1-dev.jar!/mapred-default.xml
060308 103219 parsing /tmp/hadoop/mapred/local/localRunner/job_caq34e.xml
060308 103219 parsing file:/Users/budwitney/Desktop/nutch-nightly%202/conf/hadoop-site.xml
java.io.IOException: No input directories specified in: Configuration: defaults: hadoop-default.xml , mapred-default.xml , /tmp/hadoop/mapred/local/localRunner/job_caq34e.xmlfinal: hadoop-site.xml
         at org.apache.hadoop.mapred.InputFormatBase.listFiles(InputFormatBase.java:84)
         at org.apache.hadoop.mapred.InputFormatBase.getSplits(InputFormatBase.java:94)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:70)
060308 103220  map 0%  reduce 0%
060308 103220 SEVERE error, caught Exception in main()
java.io.IOException: Job failed!
         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:310)
         at org.apache.nutch.crawl.Injector.inject(Injector.java:114)
         at org.apache.nutch.crawl.Crawl.doMain(Crawl.java:104)
         at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)