Posted to user@nutch.apache.org by Beats <ta...@yahoo.com> on 2009/07/14 10:06:54 UTC

Ignoring robots.txt

Hi all,

I am trying to make the Nutch crawler ignore the robots.txt file.

I have tried changing fetcher.java and RobotsRulesParser.java,

but a NullPointerException is reported.

Can anybody give the correct changes that need to be made?

with regards

Beats

Beats@yahoo.com
-- 
View this message in context: http://www.nabble.com/Ignoring-robots.txt-tp24475336p24475336.html
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: Ignoring robots.txt

Posted by Dennis Kubes <ku...@apache.org>.
Actually no.  Nutch is made to obey robots.txt.  All polite crawlers 
will obey rules set down in robots.txt.

Dennis

Beats wrote:
> 
> Hi...
> 
> Can anybody please help me with this?
> I am stuck with it.
> 
> Thanks
> 
> 

Re: Ignoring robots.txt

Posted by Beats <ta...@yahoo.com>.

Hi...

Can anybody please help me with this?
I am stuck with it.

Thanks


-- 
View this message in context: http://www.nabble.com/Ignoring-robots.txt-tp24475336p24545150.html