Posted to dev@lenya.apache.org by Michael Wechner <mi...@wyona.com> on 2004/02/26 01:46:00 UTC
robots.txt
I have patched the class RobotExclusion, which is part of the websphinx
package. One can now use local robots.txt files to disallow certain URLs.
The issue arises when one wants to crawl a remote domain where one
doesn't have access to the robots.txt but still wants to disallow
certain URLs, e.g. in the case of loops (http://foo.bar.net/time/79797123210218 -->
http://foo.bar.net/time/771313128977123 --> ...).
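To illustrate the idea, here is a minimal sketch of how a local robots.txt
could be parsed and applied to URL paths. This is NOT the actual websphinx
RobotExclusion patch; the class name LocalRobotExclusion and its methods are
hypothetical, and User-agent matching is omitted for brevity:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: load Disallow rules from a local robots.txt-style
// file and test crawl candidates against them (not the websphinx code).
public class LocalRobotExclusion {
    private final List<String> disallowed = new ArrayList<>();

    // Collect "Disallow: /path" prefixes; User-agent sections are ignored.
    public void parse(String robotsTxt) {
        for (String line : robotsTxt.split("\n")) {
            line = line.trim();
            if (line.toLowerCase().startsWith("disallow:")) {
                String path = line.substring("disallow:".length()).trim();
                if (!path.isEmpty()) {
                    disallowed.add(path);
                }
            }
        }
    }

    // A URL path is excluded if it starts with any disallowed prefix.
    public boolean isDisallowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LocalRobotExclusion re = new LocalRobotExclusion();
        // A local file like this would keep the crawler out of the loop:
        re.parse("User-agent: *\nDisallow: /time/\n");
        System.out.println(re.isDisallowed("/time/79797123210218")); // true
        System.out.println(re.isDisallowed("/index.html"));          // false
    }
}
```

With such a local file, the crawler would skip every URL under /time/ and
so never enter the loop, even though the remote site's own robots.txt
can't be changed.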
Michi
--
Michael Wechner
Wyona Inc. - Open Source Content Management - Apache Lenya
http://www.wyona.com http://cocoon.apache.org/lenya/
michael.wechner@wyona.com michi@apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: lenya-dev-unsubscribe@cocoon.apache.org
For additional commands, e-mail: lenya-dev-help@cocoon.apache.org