Posted to users@httpd.apache.org by Manuel Lemos <ml...@acm.org> on 2005/01/27 22:45:56 UTC

[users@httpd] mod_throttle relaxing restrictions for crawlers

Hello,

I want to use mod_throttle to limit the rate of requests that can be
made from each client IP.

However, I also need to allow the internal crawler, ht://Dig, to
crawl the site without restrictions.

My problem is that I could not find a way to specify a different
throttle policy for clients accessing from a given IP, or even using a
given user agent.

Does anybody have any ideas on how to solve this problem?
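To make the requirement concrete, here is a minimal Python sketch of per-client-IP request limiting with an exemption list. It is a conceptual model, not mod_throttle's code. It assumes a fixed window of 100 requests per 180 seconds, which is one reading of the `ThrottleClientIP 100 Document 100 180` line in Manuel's configuration in this thread. The class name `PerIPRateLimiter`, the documentation address 192.0.2.10, and the exempt address (the placeholder server IP 123.45.67.89 from that configuration, standing in for the crawler's host) are illustrative assumptions.

```python
import time
from collections import defaultdict


class PerIPRateLimiter:
    """Fixed-window request limit per client IP, with exempt IPs (a sketch)."""

    def __init__(self, limit, period, exempt_ips=(), clock=time.monotonic):
        self.limit = limit                  # max requests per window
        self.period = period                # window length in seconds
        self.exempt_ips = set(exempt_ips)   # e.g. the crawler's host
        self.clock = clock                  # injectable for testing
        # ip -> [window_start, requests_counted_in_window]
        self.windows = defaultdict(lambda: [None, 0])

    def allow(self, ip):
        """Return True if a request from `ip` should be served."""
        if ip in self.exempt_ips:
            return True                     # exempt clients are never counted
        now = self.clock()
        window = self.windows[ip]
        if window[0] is None or now - window[0] >= self.period:
            window[0], window[1] = now, 0   # start a fresh window
        if window[1] >= self.limit:
            return False                    # refuse until the window rolls over
        window[1] += 1
        return True


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = PerIPRateLimiter(100, 180, exempt_ips={"123.45.67.89"},
                               clock=lambda: fake_now[0])
    print(sum(limiter.allow("192.0.2.10") for _ in range(150)))    # 100
    print(sum(limiter.allow("123.45.67.89") for _ in range(150)))  # 150
```

Once an ordinary client reaches the limit, it is refused until its window rolls over. An exempt IP is never counted at all, which is the behaviour the crawler needs.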

-- 

Regards,
Manuel Lemos


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] mod_throttle relaxing restrictions for crawlers

Posted by "Ivan Barrera A." <Br...@Ivn.cl>.
Oops... I'm sorry.
My module is meant for Apache 2. I didn't read that you were using Apache 1.
Sorry about that.

I may implement that functionality in the future, but for Apache 2. If
I get some time, I'll try to make it work for Apache 1 too.

I don't know if this suits you, but I think I've seen a module called
mod_bwshare.

Manuel Lemos wrote:
> Hello,
> 
> on 01/27/2005 07:51 PM Ivan Barrera A. said the following:
> 
>> I'm not sure if this is what you're asking for, but my module, bw_mod
>> v0.5rc1, can do bandwidth limiting on a per-host/IP and per-file-size basis.
>> http://ivn.cl/apache
>>
>> Note that it is still a release candidate, although some people are
>> already using it flawlessly on huge servers.
>>
>> It has been tested on Linux/x86.
>> It compiles and works (with very little testing) on Mac OS X, Solaris,
>> and FreeBSD, on SPARC, PPC, and x86.
>>
>> For now it works only with the prefork MPM. I'm implementing worker
>> MPM support and a Windows port soon (also a MaxConnection limit per IP).
> 
> 
> I tried it, but it did not even compile. Are you sure this works
> with Apache 1.3.33 on Linux/x86? The errors are below.
>
> Anyway, although it may let you limit connections per IP, I wonder
> whether this module does what I want.
>
> First, I could not see a way to monitor the IPs that are being throttled,
> as you can with mod_throttle by accessing a special handler page.
>
> Another thing is that this module seems to restrict the number of
> simultaneous connections. What mod_throttle does is limit the number
> of requests per period of time, which is closer to what I want. Does
> your module provide this?
> 
> $ /path/to/httpd/bin/apxs -i -a -c bw_mod-0.5rc1.c
> gcc -DLINUX=22 -DHAVE_SET_DUMPABLE -DNO_DBM_REWRITEMAP -DUSE_EXPAT 
> -I../lib/expat-lite -fpic -DSHARED_MODULE -I/path/to/httpd/include  -c 
> bw_mod-0.5rc1.c
> bw_mod-0.5rc1.c:39:25: apr_buckets.h: No such file or directory
> bw_mod-0.5rc1.c:40:25: apr_strings.h: No such file or directory
> bw_mod-0.5rc1.c:41:24: apr_atomic.h: No such file or directory
> bw_mod-0.5rc1.c:42:21: apr_lib.h: No such file or directory
> bw_mod-0.5rc1.c:43:21: apr_shm.h: No such file or directory
> bw_mod-0.5rc1.c:45:25: util_filter.h: No such file or directory
> bw_mod-0.5rc1.c:46:20: ap_mpm.h: No such file or directory
> apxs:Break: Command failed with rc=1
> 
> 



Re: [users@httpd] mod_throttle relaxing restrictions for crawlers

Posted by Manuel Lemos <ml...@acm.org>.
Hello,

on 01/27/2005 07:51 PM Ivan Barrera A. said the following:
> I'm not sure if this is what you're asking for, but my module, bw_mod
> v0.5rc1, can do bandwidth limiting on a per-host/IP and per-file-size basis.
> http://ivn.cl/apache
>
> Note that it is still a release candidate, although some people are
> already using it flawlessly on huge servers.
>
> It has been tested on Linux/x86.
> It compiles and works (with very little testing) on Mac OS X, Solaris,
> and FreeBSD, on SPARC, PPC, and x86.
>
> For now it works only with the prefork MPM. I'm implementing worker
> MPM support and a Windows port soon (also a MaxConnection limit per IP).

I tried it, but it did not even compile. Are you sure this works
with Apache 1.3.33 on Linux/x86? The errors are below.

Anyway, although it may let you limit connections per IP, I wonder
whether this module does what I want.

First, I could not see a way to monitor the IPs that are being throttled,
as you can with mod_throttle by accessing a special handler page.

Another thing is that this module seems to restrict the number of
simultaneous connections. What mod_throttle does is limit the number
of requests per period of time, which is closer to what I want. Does
your module provide this?

$ /path/to/httpd/bin/apxs -i -a -c bw_mod-0.5rc1.c
gcc -DLINUX=22 -DHAVE_SET_DUMPABLE -DNO_DBM_REWRITEMAP -DUSE_EXPAT 
-I../lib/expat-lite -fpic -DSHARED_MODULE -I/path/to/httpd/include  -c 
bw_mod-0.5rc1.c
bw_mod-0.5rc1.c:39:25: apr_buckets.h: No such file or directory
bw_mod-0.5rc1.c:40:25: apr_strings.h: No such file or directory
bw_mod-0.5rc1.c:41:24: apr_atomic.h: No such file or directory
bw_mod-0.5rc1.c:42:21: apr_lib.h: No such file or directory
bw_mod-0.5rc1.c:43:21: apr_shm.h: No such file or directory
bw_mod-0.5rc1.c:45:25: util_filter.h: No such file or directory
bw_mod-0.5rc1.c:46:20: ap_mpm.h: No such file or directory
apxs:Break: Command failed with rc=1
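The difference Manuel points out between limiting simultaneous connections and limiting requests per period of time can be illustrated with a small Python model of a per-IP concurrency cap. The names are hypothetical, and this is not bw_mod's or mod_throttle's implementation. A client that requests pages one after another never has more than one request in flight, so a cap like this never refuses it, however many pages it fetches. Bounding the total number of requests per period needs a counter over a time window instead.

```python
import threading


class ConcurrencyLimiter:
    """Caps the number of in-flight requests per client IP (a sketch)."""

    def __init__(self, max_concurrent):
        self.max_concurrent = max_concurrent
        self.active = {}                # ip -> requests currently in progress
        self.lock = threading.Lock()    # workers may call in parallel

    def acquire(self, ip):
        """Start a request; return False if `ip` is already at its cap."""
        with self.lock:
            if self.active.get(ip, 0) >= self.max_concurrent:
                return False
            self.active[ip] = self.active.get(ip, 0) + 1
            return True

    def release(self, ip):
        """Finish a request previously admitted by acquire()."""
        with self.lock:
            self.active[ip] -= 1
            if self.active[ip] == 0:
                del self.active[ip]


if __name__ == "__main__":
    limiter = ConcurrencyLimiter(max_concurrent=2)
    refused = 0
    # A client fetching 1000 pages one after another: at most one in flight.
    for _ in range(1000):
        if limiter.acquire("192.0.2.10"):
            limiter.release("192.0.2.10")
        else:
            refused += 1
    print(refused)  # 0
```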


-- 

Regards,
Manuel Lemos

PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/

PHP Reviews - Reviews of PHP books and other products
http://www.phpclasses.org/reviews/

Metastorage - Data object relational mapping layer generator
http://www.meta-language.net/metastorage.html



Re: [users@httpd] mod_throttle relaxing restrictions for crawlers

Posted by "Ivan Barrera A." <Br...@Ivn.cl>.
I'm not sure if this is what you're asking for, but my module, bw_mod
v0.5rc1, can do bandwidth limiting on a per-host/IP and per-file-size basis.
http://ivn.cl/apache

Note that it is still a release candidate, although some people are
already using it flawlessly on huge servers.

It has been tested on Linux/x86.
It compiles and works (with very little testing) on Mac OS X, Solaris,
and FreeBSD, on SPARC, PPC, and x86.

For now it works only with the prefork MPM. I'm implementing worker
MPM support and a Windows port soon (also a MaxConnection limit per IP).


Manuel Lemos wrote:
> Hello,
> 
> I want to use mod_throttle to limit the rate of requests that can be
> made from each client IP.
> 
> However, I also need to allow the internal crawler, ht://Dig, to
> crawl the site without restrictions.
> 
> My problem is that I could not find a way to specify a different
> throttle policy for clients accessing from a given IP, or even using a
> given user agent.
> 
> Does anybody have any ideas on how to solve this problem?
> 



Re: [users@httpd] mod_throttle relaxing restrictions for crawlers

Posted by Manuel Lemos <ml...@acm.org>.
Hello,

on 01/27/2005 09:25 PM Jeremy Hilton said the following:
>>>>I want to use mod_throttle to limit the rate of requests that can be
>>>>made from each client IP.
>>>>
>>>>However, I also need to allow the internal crawler, ht://Dig, to
>>>>crawl the site without restrictions.
>>>>
>>>>My problem is that I could not find a way to specify a different
>>>>throttle policy for clients accessing from a given IP, or even using a
>>>>given user agent.
>>>>
>>>>Does anybody have any ideas on how to solve this problem?
>>>http://www.snert.com/Software/mod_throttle/
>>If you reread my message, you may notice that I want to use mod_throttle, but
>>AFAIK there is no way to tell it not to throttle connections that come
>>from a given IP. That is the problem.
>>
>>I was wondering if there is a way to configure mod_throttle in
>>combination with mod_setenvif, as I already do to split my logs, but I
>>could not find a way to do it.
> 
> Since the throttle rules can be declared in the context of <VirtualHost>,
> couldn't you set up a second virtual host that has no throttle rules, and
> instruct your crawler to use the "other" virtual host?

Good idea!

However, the way I am doing it does not seem to work: it still seems to
count the accesses from my server's IP and blocks it when it exceeds the
limit, even on the virtual host whose throttle policy is set to None.

It seems that ThrottlePolicy None does not stop ThrottleClientIP from
being applied. I tried setting ThrottleClientIP inside just one virtual
host, but that crashes Apache when I run apachectl configtest.

Do you have any idea what the problem is? I have it set up like this:

LoadModule throttle_module    libexec/mod_throttle.so

<IfModule mod_throttle.c>
    ThrottleClientIP 100 Document 100 180
    ThrottlePolicy None

    <Location /throttle-status>
        SetHandler throttle-status
    </Location>
</IfModule>

<VirtualHost 123.45.67.89>
    DocumentRoot /path/to/httpd/htdocs
    ServerName crawl.mydomain.com
    <IfModule mod_throttle.c>
        ThrottlePolicy None
    </IfModule>
</VirtualHost>

<VirtualHost 123.45.67.89>
    DocumentRoot /path/to/httpd/htdocs
    ServerName www.mydomain.com
    <IfModule mod_throttle.c>
        ThrottlePolicy Document
    </IfModule>
</VirtualHost>


-- 

Regards,
Manuel Lemos



Re: [users@httpd] mod_throttle relaxing restrictions for crawlers

Posted by Jeremy Hilton <je...@adtcs.com>.
On 1/27/05 5:34 PM, "Manuel Lemos" <ml...@acm.org> wrote:

> Hello,
> 
> on 01/27/2005 08:02 PM Jeremy Hilton said the following:
>>> I want to use mod_throttle to limit the rate of requests that can be
>>> made from each client IP.
>>> 
>>> However, I also need to allow the internal crawler, ht://Dig, to
>>> crawl the site without restrictions.
>>> 
>>> My problem is that I could not find a way to specify a different
>>> throttle policy for clients accessing from a given IP, or even using a
>>> given user agent.
>>> 
>>> Does anybody have any ideas on how to solve this problem?
>> 
>> http://www.snert.com/Software/mod_throttle/
> 
> If you reread my message, you may notice that I want to use mod_throttle, but
> AFAIK there is no way to tell it not to throttle connections that come
> from a given IP. That is the problem.
> 
> I was wondering if there is a way to configure mod_throttle in
> combination with mod_setenvif, as I already do to split my logs, but I
> could not find a way to do it.

Since the throttle rules can be declared in the context of <VirtualHost>,
couldn't you set up a second virtual host that has no throttle rules, and
instruct your crawler to use the "other" virtual host?






Re: [users@httpd] mod_throttle relaxing restrictions for crawlers

Posted by Manuel Lemos <ml...@acm.org>.
Hello,

on 01/27/2005 08:02 PM Jeremy Hilton said the following:
>>I want to use mod_throttle to limit the rate of requests that can be
>>made from each client IP.
>>
>>However, I also need to allow the internal crawler, ht://Dig, to
>>crawl the site without restrictions.
>>
>>My problem is that I could not find a way to specify a different
>>throttle policy for clients accessing from a given IP, or even using a
>>given user agent.
>>
>>Does anybody have any ideas on how to solve this problem?
> 
> http://www.snert.com/Software/mod_throttle/

If you reread my message, you may notice that I want to use mod_throttle, but
AFAIK there is no way to tell it not to throttle connections that come
from a given IP. That is the problem.

I was wondering if there is a way to configure mod_throttle in
combination with mod_setenvif, as I already do to split my logs, but I
could not find a way to do it.
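Manuel's idea is essentially to tag the crawler's requests the way mod_setenvif does for log splitting and have the throttle skip tagged requests. Nothing in this thread shows a mod_throttle directive that reads such a variable, so the following is only a Python model of the matching step. The IP (the placeholder address from Manuel's configuration), the `htdig` User-Agent substring, the sample User-Agent strings, and the helper names `tag_request` and `should_throttle` are assumptions for illustration.

```python
import re

# Each rule mimics a mod_setenvif line such as
#   SetEnvIf Remote_Addr "^123\.45\.67\.89$" crawler
# as (request attribute, regex, variable to set, case-insensitive?).
RULES = [
    ("Remote_Addr", r"^123\.45\.67\.89$", "crawler", False),
    ("User-Agent", r"htdig", "crawler", True),  # assumed UA substring
]


def tag_request(attrs, rules=RULES):
    """Return the names of all variables whose rule matches the request."""
    tags = set()
    for attribute, pattern, var, nocase in rules:
        flags = re.IGNORECASE if nocase else 0
        if re.search(pattern, attrs.get(attribute, ""), flags):
            tags.add(var)
    return tags


def should_throttle(attrs):
    """Throttle every request except those tagged as the crawler."""
    return "crawler" not in tag_request(attrs)


if __name__ == "__main__":
    print(should_throttle({"Remote_Addr": "192.0.2.10",
                           "User-Agent": "Mozilla/4.0"}))   # True
    print(should_throttle({"Remote_Addr": "192.0.2.10",
                           "User-Agent": "htdig/3.1.6"}))   # False
```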

-- 

Regards,
Manuel Lemos



Re: [users@httpd] mod_throttle relaxing restrictions for crawlers

Posted by Jeremy Hilton <je...@adtcs.com>.
On 1/27/05 4:45 PM, "Manuel Lemos" <ml...@acm.org> wrote:

> Hello,
> 
> I want to use mod_throttle to limit the rate of requests that can be
> made from each client IP.
> 
> However, I also need to allow the internal crawler, ht://Dig, to
> crawl the site without restrictions.
> 
> My problem is that I could not find a way to specify a different
> throttle policy for clients accessing from a given IP, or even using a
> given user agent.
> 
> Does anybody have any ideas on how to solve this problem?

http://www.snert.com/Software/mod_throttle/

