Posted to user@nutch.apache.org by Otis Gospodnetic <og...@yahoo.com> on 2009/08/03 05:13:30 UTC

Re: denied by robots.txt rules

Hi,

robots.txt is periodically re-fetched, and a previously denied URL will be retried when its scheduled refetch time comes. If the robots.txt rules no longer deny access to it, it should be fetched.
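The behavior above can be sketched as a robots.txt cache with a time-to-live: once the cached copy expires, the rules are re-fetched, and a URL denied under the old rules is re-evaluated. This is a hypothetical illustration using Python's stdlib `urllib.robotparser`, not Nutch's actual implementation; the class and parameter names are invented for the example.

```python
# Hypothetical sketch (not Nutch code): a robots.txt cache with a TTL.
# A URL denied under old rules is re-evaluated after the cache expires
# and the rules have been re-fetched.
import time
from urllib.robotparser import RobotFileParser

class RobotsCache:
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.fetched_at = None
        self.parser = None

    def rules(self, fetch_robots_txt):
        # Re-fetch and re-parse robots.txt once the cached copy is stale.
        now = time.time()
        if self.parser is None or now - self.fetched_at >= self.ttl:
            self.parser = RobotFileParser()
            self.parser.parse(fetch_robots_txt().splitlines())
            self.fetched_at = now
        return self.parser

    def allowed(self, agent, url, fetch_robots_txt):
        return self.rules(fetch_robots_txt).can_fetch(agent, url)

# Simulated server: the site's robots.txt changes between fetches.
responses = iter([
    "User-agent: *\nDisallow: /private/\n",   # first crawl: URL denied
    "User-agent: *\nDisallow:\n",             # later: everything allowed
])
fetch = lambda: next(responses)

cache = RobotsCache(ttl_seconds=0)  # TTL of 0 forces a re-fetch each time
allowed_first = cache.allowed("nutch", "http://example.com/private/page", fetch)
allowed_second = cache.allowed("nutch", "http://example.com/private/page", fetch)
print(allowed_first, allowed_second)  # False True
```

In a real crawler the TTL would be hours or days, so the retry only succeeds once both the robots.txt cache and the URL's refetch interval have elapsed.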

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



----- Original Message ----
> From: Saurabh Suman <sa...@rediff.com>
> To: nutch-user@lucene.apache.org
> Sent: Thursday, July 30, 2009 11:29:28 PM
> Subject: denied by robots.txt rules
> 
> 
> Hi 
> if a URL is denied once by robots.txt rules, is it crawled again by
> Nutch?
> 
> -- 
> View this message in context: 
> http://www.nabble.com/denied-by-robots.txt-rules-tp24750517p24750517.html
> Sent from the Nutch - User mailing list archive at Nabble.com.