Posted to user@nutch.apache.org by Elisabeth Adler <el...@gmail.com> on 2011/09/22 11:21:04 UTC
Redirects and crawl URLs twice
Hi,
Following up on my problem of having to crawl a site that redirects to
itself, I am now thinking about creating a Nutch plugin that allows
certain URLs to be crawled twice.
Since I'm not too familiar with the Nutch code, I would appreciate any
pointers on where to start - or is there already an option available in
Nutch which I missed?
A more thorough explanation of why the page redirects to itself
can be found in an earlier thread [1].
Thanks,
Elisabeth
[1]
http://markmail.org/search/?q=list%3Aorg.apache.lucene.nutch-user+crawling+and+redirects#query:list%3Aorg.apache.lucene.nutch-user%20crawling%20and%20redirects+page:1+mid:urds3zg2kp7n6o46+state:results