Posted to user@nutch.apache.org by Bob Song <da...@163.com> on 2006/08/03 05:57:28 UTC

Help: No content fetched in V0.8

I have just installed Nutch 0.8 on Windows XP with Cygwin, and I set up a local
site to test it.
Following the instructions, the crawling process seemed to succeed. The
directory and file structure was created, but no content was
fetched. I don't know what's wrong with my steps.

Below are my steps and configuration.

The commands, run under the 'sh' shell, were:
1.   ../bin/nutch crawl urls -dir crawled -depath 3 >& crawl.log

There is one file, 'urls.txt', under the directory 'urls', and it contains only one line:
http://localhost/fetchtest/index.html
That page has links to other files inside the site.
The site is accessible in IE.

After execution, the log file is empty.

2.  ../bin/nutch readdb crawled/crawldb -stats
No content was shown. I also checked the total size of the directory: only
26K, which suggests that nothing has been fetched.

The configuration files I have modified are:

1. crawl-urlfilter.txt, the last lines of which are:

-^(file|ftp|mailto):
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png)$
-.*(/.+?)/.*?\1/.*?\1/
+^http://localhost/
-.
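
As a sanity check, the rules above can be tested against the seed URL with a short Python sketch. It is only an approximation of the filter, not Nutch code: it assumes the rules are applied top to bottom, that the first pattern found anywhere in the URL decides the outcome, and that a URL matching no rule is rejected. The function name accepts and the list RULES are made up for this illustration.

```python
import re

# Rules copied from the end of crawl-urlfilter.txt above, as (sign, pattern).
RULES = [
    ("-", r"^(file|ftp|mailto):"),
    ("-", r"\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png)$"),
    ("-", r".*(/.+?)/.*?\1/.*?\1/"),
    ("+", r"^http://localhost/"),
    ("-", r"."),
]


def accepts(url):
    """Return True if the first rule that matches the URL is a '+' rule.

    Assumed semantics: rules are tried in order and the first match wins.
    """
    for sign, pattern in RULES:
        if re.search(pattern, url):
            return sign == "+"
    # Assumption: a URL that matches no rule at all is rejected.
    return False


if __name__ == "__main__":
    print(accepts("http://localhost/fetchtest/index.html"))
```

If this prints True, the filter rules themselves are probably not what keeps the seed URL from being fetched, at least under the assumptions stated above.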

2. nutch-site.xml

I modified some properties, such as http.agent.name.