Posted to user@nutch.apache.org by Beats <ta...@yahoo.com> on 2009/07/14 10:06:54 UTC
Ignoring robots.txt
Hi all,
I am trying to make the Nutch crawler ignore the robots.txt file.
I have tried changing fetcher.java and RobotsRulesParser.java,
but a NullPointerException is reported.
Can anybody tell me the correct changes that need to be made?
With regards,
Beats
Beats@yahoo.com
--
View this message in context: http://www.nabble.com/Ignoring-robots.txt-tp24475336p24475336.html
Sent from the Nutch - User mailing list archive at Nabble.com.
Re: Ignoring robots.txt
Posted by Dennis Kubes <ku...@apache.org>.
Actually no. Nutch is made to obey robots.txt. All polite crawlers
will obey rules set down in robots.txt.
Dennis
Beats wrote:
>
> Hi,
>
> Can anybody please help me with this?
> I am stuck on it.
>
> Thanks
Re: Ignoring robots.txt
Posted by Beats <ta...@yahoo.com>.
Hi,
Can anybody please help me with this?
I am stuck on it.
Thanks
--
View this message in context: http://www.nabble.com/Ignoring-robots.txt-tp24475336p24545150.html