Posted to users@httpd.apache.org by Bruno Wolff III <br...@wolff.to> on 2002/08/27 18:48:22 UTC

[users@httpd] Re: Wget

On Mon, Aug 26, 2002 at 15:13:31 +0200,
  Wolter Kamphuis <ap...@wkamphuis.student.utwente.nl> wrote:
> 
> I now use robotcop (http://www.robotcop.org/) to block web spiders. On some
> of my pages (especially dynamic ones) I include a one-pixel image link.
> Anyone following this link is blocked for two days. Normal browsers
> won't follow this link, so they are unaffected. I catch about 10 to 20
> people a day using wget, Teleport Pro, and other such spiders.
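
For illustration, a one-pixel trap of that sort might look like the
snippet below. This is a sketch of the general technique, not robotcop's
actual markup; the /trap/ URL and the image name are made-up examples:

    <!-- Hidden trap link: the 1x1 image gives a human nothing visible
         to click, but a link-following spider will fetch the href.
         Both paths here are hypothetical. -->
    <a href="/trap/do-not-follow.html"><img src="/images/pixel.gif"
        width="1" height="1" alt=""></a>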

I use a two-step process. I add links that enclose no visible content
and point to a separate trap page. That page displays a warning not to
follow any links off of it, and it also carries robots meta tags saying
not to index the page or follow its links. One link on that page runs a
cgi-bin script which blocks the connecting IP address until the block
is manually removed.
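
The robots meta tag on such a trap page would be the standard
<meta name="robots" content="noindex,nofollow">. As for the blocking
script, here is a minimal sketch in Python of how such a CGI could
work. It is an assumption about the approach, not the actual script
from this post, and the blocklist path is hypothetical. The idea is to
append REMOTE_ADDR to a flat file that some other piece of the server
configuration then consumes:

    #!/usr/bin/env python
    # Sketch of the blocking CGI described above -- an assumption of
    # how it might work, not the actual script from this post.
    import os

    BLOCKLIST = "/var/www/blocklist/denied-ips.txt"  # hypothetical path

    def main():
        # Apache exports the client's address to CGI scripts as
        # REMOTE_ADDR (per the CGI specification).
        ip = os.environ.get("REMOTE_ADDR")
        if ip:
            with open(BLOCKLIST, "a") as f:
                f.write(ip + "\n")
        # Send a minimal, valid CGI response back to the robot.
        print("Content-Type: text/plain")
        print()
        print("This address has been blocked.")

    if __name__ == "__main__":
        main()

Something on the server side then has to turn that file into actual
denials, for instance a job that rewrites an .htaccess with one
"deny from" line per recorded address; removing a block is just
deleting the corresponding line.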

This is intended more to stop robots that ignore robots meta tags than
to catch things like wget, which pull content too fast but usually
don't do so repeatedly.
