Posted to user@nutch.apache.org by Elisabeth Adler <el...@gmail.com> on 2011/09/22 11:21:04 UTC

Redirects and crawl URLs twice

Hi,

Following up on my problem of having to crawl a site that redirects to
itself, I am now thinking about creating a Nutch plugin that allows
certain URLs to be crawled twice.
Since I'm not too familiar with the Nutch code, I would appreciate any 
pointers on where to start - or is there already an option available in 
Nutch which I missed?

A more thorough explanation of why the page redirects to itself
can be found in an earlier thread [1].

Thanks,
Elisabeth

[1] 
http://markmail.org/search/?q=list%3Aorg.apache.lucene.nutch-user+crawling+and+redirects#query:list%3Aorg.apache.lucene.nutch-user%20crawling%20and%20redirects+page:1+mid:urds3zg2kp7n6o46+state:results