Posted to user@nutch.apache.org by Godmar Back <go...@gmail.com> on 2010/01/07 02:04:56 UTC
Nutch crawls parent directories and ignores the url filters added to
prevent this in crawl-urlfilter.txt
... if you followed the wrong instructions in the old FAQ, which I took the
liberty to correct:
http://wiki.apache.org/nutch/FAQ?action=diff&rev1=113&rev2=115
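For anyone unfamiliar with the rule format, here is a rough Python sketch of how "+"/"-" regex rules of the kind placed in crawl-urlfilter.txt are meant to behave: rules are tried in order, and the first pattern that matches decides whether the URL is kept. This is only an illustration, not Nutch's own code; the directory path and both rules are made-up examples, and it assumes the urlfilter-regexp plugin is actually enabled so that the rules get applied at all.

```python
import re

# Hypothetical rules in the "+pattern" / "-pattern" style of crawl-urlfilter.txt.
# The directory below is a made-up example, not taken from the original post.
RULES = [
    "+^file:///home/user/pdfs/",  # accept anything under the target directory
    "-.",                         # reject everything else, including parent directories
]

def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern matches the URL starts with '+'."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: the URL is dropped

print(accepts("file:///home/user/pdfs/report.pdf"))  # True
print(accepts("file:///home/user/"))                 # False
```

With rules like these, the parent directory is rejected by the catch-all "-." line, which is the behaviour the filter is supposed to give once it is in effect.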
I am proud to report that Nutch has now indexed an entire directory of PDF
files and actually returns search results for them.
- Godmar
keywords: nutch crawls parent directories, indexing local filesystem,
urlfilter-regexp, plugin.includes