Posted to users@httpd.apache.org by Raquel Rice <ra...@thericehouse.net> on 2002/01/17 23:31:46 UTC

robots.txt

I'm not certain whether this is "on topic" or not.  I'm going to
take the chance and ask.

Is it possible to use "robots.txt" to totally block certain robots
from a domain?

I have a domain from which certain sites have been gone for 2 years.
However, Scooter insists on probing for them.  It wouldn't be bad
except that I have a Perl script that sends an email whenever a
robot requests a broken link.  Scooter just went looking for all
those sites which have been gone for 2 years and my Perl script
sent me ... all at once ... 181 emails.

At least the other robots don't hit so hard in one shot!

Phew!  I feel better now.  If this isn't a topic for here, does
anyone know where to look?

-- 
Raquel
============================================================
Men never do evil so completely and cheerfully as when they do it
with religious conviction.
  --Blaise Pascal

                              
                              

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


RE: robots.txt

Posted by Joshua Slive <jo...@slive.ca>.
> From: obo@bourse.ch [mailto:obo@bourse.ch]

> If you found a robot that was always coming from a particular IP
> address, you could always "Deny" it, I guess. AFAIK, there is no way to
> do something like:
> 
> Deny if UserAgent =~ "Scooter"

Sure you can:

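# Flag requests whose User-Agent header matches "Scooter".
SetEnvIf User-Agent "Scooter" badrobot
# Allow by default; Deny rules are checked first (use in <Directory> or .htaccess).
Order deny,allow
# Refuse any request carrying the badrobot flag.
Deny from env=badrobot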
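
For readers who want to see the matching logic spelled out, here is a
minimal sketch in Python of what the directives above amount to. It is
only a model, not how Apache is implemented; the function name
is_denied and the sample User-Agent strings are made up for
illustration. The one detail it tries to mirror is that SetEnvIf
matches a case-sensitive regular expression (SetEnvIfNoCase is the
case-insensitive variant).

```python
import re

# Same pattern as in the SetEnvIf line; matching is case-sensitive.
BAD_ROBOT = re.compile(r"Scooter")


def is_denied(user_agent):
    """Model of: SetEnvIf User-Agent "Scooter" badrobot + Deny from env=badrobot.

    A missing header is treated as an empty string, so it never matches.
    """
    badrobot = BAD_ROBOT.search(user_agent or "") is not None
    return badrobot


# Hypothetical User-Agent strings, for illustration only.
print(is_denied("Scooter/3.2"))   # True: the pattern matches
print(is_denied("Mozilla/4.0"))   # False: everyone else is allowed
print(is_denied("scooter"))       # False: the match is case-sensitive
```

Because the check relies on a header the client supplies, a robot that
lies about its User-Agent would still get through.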

Joshua. 



Re: robots.txt

Posted by Owen Boyle <ob...@bourse.ch>.
RuneImp wrote:
> 
> It will block all robots from the site. If you only
> wish to block all robots from certain directories
> then you would do:
> 
> User-agent: *
> Disallow: /badsite1/
> Disallow: /badsite2/
> 
> If you only want certain robots to not see certain
> directories then:
> 
> User-agent: Scooter
> Disallow: /badsite1/
> Disallow: /badsite2/

It should be pointed out that the robot exclusion standard is entirely
voluntary. The robot is supposed to request "robots.txt" first, read it,
then decide whether to crawl the site or not. If the robot wants to
ignore "robots.txt" completely, it is free to do so and you can't stop
it requesting what it likes.
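
To make the "voluntary" point concrete, here is a rough sketch using
Python's standard urllib.robotparser module. The function plan_crawl,
its polite flag, and the sample paths are invented for this example;
the only thing it is meant to show is that the filtering happens on the
robot's side, so a robot that skips the check requests everything.

```python
from urllib.robotparser import RobotFileParser


def plan_crawl(robots_lines, user_agent, paths, polite=True):
    """Return the paths a robot would go on to request.

    A polite robot filters its list through robots.txt first;
    an impolite one (polite=False) never consults the rules.
    """
    if not polite:
        return list(paths)
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [p for p in paths if parser.can_fetch(user_agent, p)]


# Hypothetical rules and paths, for illustration only.
rules = ["User-agent: Scooter", "Disallow: /badsite1/"]
paths = ["/index.html", "/badsite1/old.html"]

print(plan_crawl(rules, "Scooter", paths))                # ['/index.html']
print(plan_crawl(rules, "Scooter", paths, polite=False))  # both paths
```

Nothing on the server enforces the first result; that is entirely up to
the robot.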

If you found a robot that was always coming from a particular IP
address, you could always "Deny" it, I guess. AFAIK, there is no way to
do something like:

Deny if UserAgent =~ "Scooter"

Pity...

Rgds,

Owen Boyle.



Re: robots.txt

Posted by RuneImp <ru...@imptech.net>.
Yes, that will block all robots from the entire site. If you
only wish to block all robots from certain directories,
then you would do:

User-agent: *
Disallow: /badsite1/
Disallow: /badsite2/

If you only want to keep certain robots out of certain
directories, then:

User-agent: Scooter
Disallow: /badsite1/
Disallow: /badsite2/
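
One way to check how a compliant robot would read rules like these is
Python's standard urllib.robotparser module. The sketch below is only
an illustration: the helper name allowed, the agent name "OtherBot",
and the file names are made up. It also includes the combination of
the two ideas in this thread ("User-agent: Scooter" with
"Disallow: /"), which is what totally blocking one robot from a domain
would look like, assuming the robot honours robots.txt.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, path):
    """Return True if a compliant robot may fetch path under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Keep Scooter out of two directories only.
PER_DIRECTORY = """\
User-agent: Scooter
Disallow: /badsite1/
Disallow: /badsite2/
"""

# Keep Scooter out of the whole site; other robots are unaffected.
WHOLE_SITE = """\
User-agent: Scooter
Disallow: /
"""

print(allowed(PER_DIRECTORY, "Scooter", "/badsite1/old.html"))   # False
print(allowed(PER_DIRECTORY, "Scooter", "/index.html"))          # True
print(allowed(PER_DIRECTORY, "OtherBot", "/badsite1/old.html"))  # True
print(allowed(WHOLE_SITE, "Scooter", "/index.html"))             # False
print(allowed(WHOLE_SITE, "OtherBot", "/index.html"))            # True
```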


-=- RuneImp
ImpTech - Web Design, Hosting & Computer Tech
http://imptech.net
rune@imptech.net


----- Original Message ----- 
From: "Raquel Rice" <ra...@thericehouse.net>
To: <us...@httpd.apache.org>
Sent: Thursday, January 17, 2002 4:39 PM
Subject: Re: robots.txt






Re: robots.txt

Posted by Raquel Rice <ra...@thericehouse.net>.
On Thu, 17 Jan 2002 15:52:38 -0800
RuneImp "RuneImp" <ru...@imptech.net> wrote:

> Put this in your file
> 
> User-agent: *
> Disallow: /
> 
> 
> -=- RuneImp
> ImpTech - Web Design, Hosting & Computer Tech
> http://imptech.net
> rune@imptech.net
> 

Thanks, but won't that block everything?

BTW ... here's a good site for info:
	http://www.robotstxt.org/wc/robots.html

-- 
Raquel
============================================================
Men never do evil so completely and cheerfully as when they do it
with religious conviction.
  --Blaise Pascal

                              
                              



Re: robots.txt

Posted by RuneImp <ru...@imptech.net>.
Put this in your file

User-agent: *
Disallow: /


-=- RuneImp
ImpTech - Web Design, Hosting & Computer Tech
http://imptech.net
rune@imptech.net



