Posted to user@nutch.apache.org by Olena Medelyan <me...@coling.uni-freiburg.de> on 2005/10/04 17:29:14 UTC

Modifying Fetcher for fetching specific webpages

Hi all,
I'm working with the CrawlTool, crawling webpages from a list of seed
URLs. My idea is to crawl only those webpages that satisfy a specific
condition. In handleFetch I look at the content of each fetched webpage
and then call the outputPage method only if my condition applies.
Somehow it doesn't work: Nutch is not saving any data about the
webpage in the segment.
Is there a trick I need to know? I suspect it has something to do with
re-fetching.
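
For concreteness, the logic I am attempting looks roughly like the minimal
sketch below. It is a simplified stand-in, not the actual Nutch Fetcher code:
the class ConditionalFetchSketch, the in-memory list that plays the role of
the segment, and the method signatures are all hypothetical. Only the names
handleFetch and outputPage come from my description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, self-contained illustration of "only output pages that
// satisfy a condition". It does not use or reproduce the real Nutch classes.
public class ConditionalFetchSketch {

    // Condition applied to the fetched content (an assumption for this sketch).
    private final Predicate<String> condition;

    // Stand-in for the segment: the URLs of the pages that were written.
    private final List<String> segment = new ArrayList<>();

    public ConditionalFetchSketch(Predicate<String> condition) {
        this.condition = condition;
    }

    // Called once per fetched page; forwards the page only if the condition holds.
    public void handleFetch(String url, String content) {
        if (condition.test(content)) {
            outputPage(url, content);
        }
    }

    // Stand-in for writing the page into the segment.
    public void outputPage(String url, String content) {
        segment.add(url);
    }

    // Returns a copy of the URLs written so far.
    public List<String> savedUrls() {
        return new ArrayList<>(segment);
    }

    public static void main(String[] args) {
        ConditionalFetchSketch fetcher =
            new ConditionalFetchSketch(content -> content.contains("nutch"));
        fetcher.handleFetch("http://a.example/", "a page about nutch");
        fetcher.handleFetch("http://b.example/", "an unrelated page");
        System.out.println(fetcher.savedUrls());
    }
}
```

In this toy version the filtering behaves as I expect, so the problem seems
to lie in how the real outputPage interacts with the segment.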
Thanks a lot for your help!
Cheers,
Olena