Posted to user@nutch.apache.org by Bob Song <da...@163.com> on 2006/08/03 05:57:28 UTC
Help: No content fetched in V0.8
I have just installed Nutch 0.8 on Windows XP with Cygwin and set up a local
site to test it.
Following the instructions, the crawl seemed to succeed: the directory
structure and files were created. But no content was fetched, and I don't
know what is wrong with my steps.
Below are my steps and configuration.
The commands I ran under the 'sh' shell were:
1. ../bin/nutch crawl urls -dir crawled -depath 3 >& crawl.log
There is one file, 'urls.txt', under the directory 'urls', and it contains only one line:
http://localhost/fetchtest/index.html
That page has links to some other files inside the site.
The site is accessible from IE.
After execution, the log file is empty.
2. ../bin/nutch readdb crawled/crawldb -stats
No content is shown. I also checked the total size of the directory: it is only
26K, indicating that nothing has been fetched.
The configuration files I have modified are:
1. crawl-urlfilter.txt, the last lines of which are:
-^(file|ftp|mailto):
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png)$
-.*(/.+?)/.*?\1/.*?\1/
+^http://localhost/
-.
2. nutch-site.xml
Modified some properties, such as http.agent.name.
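
To rule out the URL filter as the cause, the rules listed above can be checked against the seed URL outside Nutch. The following is only a rough sketch in Python, not Nutch code. It assumes the filter tries the rules top to bottom, lets the first matching rule decide ('+' accepts, '-' rejects), searches for the pattern anywhere in the URL, and rejects a URL that matches no rule. The names RULES and is_accepted are made up for this illustration.

```python
import re

# Rules copied from the end of crawl-urlfilter.txt as quoted in this post.
RULES = [
    r"-^(file|ftp|mailto):",
    r"-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|png)$",
    r"-.*(/.+?)/.*?\1/.*?\1/",
    r"+^http://localhost/",
    r"-.",
]


def is_accepted(url, rules=RULES):
    """Return True if the first rule that matches the URL is a '+' rule.

    Assumption: first match wins and an unmatched URL is rejected.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        # Unanchored search: the pattern may match anywhere in the URL.
        if re.search(pattern, url):
            return sign == "+"
    return False


if __name__ == "__main__":
    # Prints True: the seed URL first matches the "+^http://localhost/" rule.
    print(is_accepted("http://localhost/fetchtest/index.html"))
```

Under these assumptions the seed URL is accepted, so the filter rules by themselves would not explain why nothing was fetched.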