Posted to user@nutch.apache.org by 陈钊 <ch...@gmail.com> on 2007/09/12 09:25:30 UTC
Nutch can't fetch pages under hadoop

Hi,
I am new to Nutch, and we want to deploy it with Hadoop for some tests.
We are using Nutch 0.8.1 for study. First we deployed it on a single machine, where it
works well; then we wanted to use it for distributed crawling. I tried to configure it
as the NutchHadoop Tutorial describes. I used two Linux computers. Hadoop works
well, but when I run Nutch it doesn't write anything to the crawled
folder. I also checked
jobtracker.jsp, and the fetch job's tasks show '0 pages, 0 errors,
0.0pages/s, 0 kb/s'.

This is part of crawl.log:
rootUrlDir = urls
threads = 6
depth = 7
Injector: starting
Injector: crawlDb: crawled/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Merging injected urls into crawl db.
Injector: done
Generator: starting
Generator: segment: crawled/segments/20070912141603
Generator: Selecting best-scoring urls due for fetch.
Generator: Partitioning selected urls by host, for politeness.
Generator: done.
Fetcher: starting
Fetcher: segment: crawled/segments/20070912141603
Fetcher: done
CrawlDb update: starting
CrawlDb update: db: crawled/crawldb
CrawlDb update: segment: crawled/segments/20070912141603
CrawlDb update: Merging segment data into db.
CrawlDb update: done
Generator: starting

In hadoop-nutch-namenode-dev01.log, this warning appears frequently:
2007-09-12 14:27:14,754 WARN fs.FSNamesystem - Zero targets found, forbidden1.size=2 forbidden2.size()=0

P.S. The day before yesterday I ran Nutch under Hadoop and it stopped at the fetch
job, printing:
 java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:357)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:562)
        at org.apache.nutch.crawl.Crawl.Crawler(Crawl.java:135)
        at org.apache.nutch.crawl.Crawl.ReplyPNo1Command(Crawl.java:325)
        at org.apache.nutch.crawl.Crawl.run(Crawl.java:436)
I deployed it again, and now it behaves as I described above.

Can you tell me what I am doing wrong? Thank you very much!