Posted to user@nutch.apache.org by "saravan.krish" <sa...@cognizant.com> on 2009/10/27 12:03:58 UTC

How to run fetch from local

I generated segments after the crawl and then downloaded them from the
crawldb to the local filesystem. The four segments I generated and
downloaded are listed below. When I run fetch on one of these segments, I
get the error shown below. How can I run fetch on local segments?

[nutch@devcluster01 search]$ ls -lrt db/segments/crawled_22/segments/
total 32
drwxr-xr-x 8 nutch users 4096 Oct 23 03:17 20091022065049
drwxr-xr-x 8 nutch users 4096 Oct 23 03:17 20091022065828
drwxr-xr-x 8 nutch users 4096 Oct 23 03:17 20091022071136
drwxr-xr-x 8 nutch users 4096 Oct 23 03:17 20091022104701
[nutch@devcluster01 search]$ bin/nutch fetch db/segments/crawled_22/segments/20091022065049
Fetcher: Your 'http.agent.name' value should be listed first in 'http.robots.agents' property.
Fetcher: starting
Fetcher: segment: db/segments/crawled_22/segments/20091022065049
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://devcluster01:9000/user/nutch/db/segments/crawled_22/segments/20091022065049/crawl_generate
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:39)
        at org.apache.nutch.fetcher.Fetcher$InputFormat.getSplits(Fetcher.java:101)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:797)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
        at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:969)
        at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1003)

