Posted to dev@lenya.apache.org by Michael Wechner <mi...@wyona.com> on 2004/02/26 01:46:00 UTC

robots.txt

I have patched the class RobotExclusion which is part of the websphinx 
package.

One can now use local robots.txt files in order to disallow certain URLs.

The issue arises when one wants to crawl a remote domain, where one 
doesn't have
access to the robots.txt but still wants to disallow certain URLs, e.g. in
the case of loops (http://foo.bar.net/time/79797123210218 --> 
http://foo.bar.net/time/771313128977123 --> ...).
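The idea of a local exclusion list can be sketched roughly as follows. This is a minimal, hypothetical matcher (class name `LocalRobotsFilter` and its methods are illustrative, not the actual websphinx RobotExclusion API): it reads Disallow lines from a locally supplied robots.txt and rejects URL paths by prefix match, which would keep a crawler out of a loop like the /time/ one above:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: parse Disallow rules from a local robots.txt
// and check candidate URL paths against them with prefix matching.
public class LocalRobotsFilter {
    private final List<String> disallowed = new ArrayList<>();

    // Feed the contents of a local robots.txt file.
    public void parse(String robotsTxt) {
        for (String line : robotsTxt.split("\n")) {
            line = line.trim();
            if (line.toLowerCase().startsWith("disallow:")) {
                String path = line.substring("disallow:".length()).trim();
                if (!path.isEmpty()) {
                    disallowed.add(path);
                }
            }
        }
    }

    // A path is disallowed if it starts with any Disallow prefix.
    public boolean isDisallowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LocalRobotsFilter filter = new LocalRobotsFilter();
        // Keep the crawler out of the /time/ loop described above.
        filter.parse("User-agent: *\nDisallow: /time/\n");
        System.out.println(filter.isDisallowed("/time/79797123210218"));
        System.out.println(filter.isDisallowed("/index.html"));
    }
}
```

Real robots.txt handling is more involved (User-agent sections, Allow rules, wildcards), but a local file of Disallow prefixes like this is enough to break simple crawler loops on a remote domain.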

Michi

-- 
Michael Wechner
Wyona Inc.  -   Open Source Content Management   -   Apache Lenya
http://www.wyona.com              http://cocoon.apache.org/lenya/
michael.wechner@wyona.com                        michi@apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lenya-dev-unsubscribe@cocoon.apache.org
For additional commands, e-mail: lenya-dev-help@cocoon.apache.org