Posted to users@httpd.apache.org by Manuel Lemos <ml...@acm.org> on 2005/01/27 22:45:56 UTC
[users@httpd] mod_throttle relaxing restrictions for crawlers
Hello,
I want to use mod_throttle to limit the rate of requests that can be
made from each client IP.

However, I also need to allow the internal crawler, which is ht://Dig, to
crawl the site without restrictions.

My problem is that I could not find a way to specify a different
throttle policy for clients accessing from a given IP, or even for a
given user agent.

Does anybody have any ideas on how to solve this problem?
--
Regards,
Manuel Lemos
---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
" from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org
Re: [users@httpd] mod_throttle relaxing restrictions for crawlers
Posted by "Ivan Barrera A." <Br...@Ivn.cl>.
Oops, I'm sorry.
My module is meant for Apache 2. I didn't notice that you were using Apache 1.
Sorry about that.

I may implement those features in the future, but for Apache 2. If
I get some time, I'll try to make it work for Apache 1 too.

I don't know if this suits you, but I think I saw a module called mod_bwshare.
Manuel Lemos wrote:
> Hello,
>
> on 01/27/2005 07:51 PM Ivan Barrera A. said the following:
>
>> I'm not sure if this is what you're asking for, but my module bw_mod
>> v0.5rc1 can do bandwidth limiting on a host/IP and file size basis.
>> http://ivn.cl/apache
>>
>> Note that it is still a release candidate, although there are some
>> people using my module flawlessly on huge servers.
>>
>> It has been tested on Linux/x86.
>> It compiles and works (with very little testing) on Mac OS X, Solaris,
>> FreeBSD, on SPARC, PPC, and x86.
>>
>> It's limited to the prefork MPM for now. I'll be implementing worker MPM
>> support and a Windows port soon (and also a MaxConnection limit per IP).
>
>
> I have tried it, but it did not even compile. Are you sure this works
> with Apache 1.3.33 on Linux/x86? The errors follow below.
>
> Anyway, even though it may let you limit connections per IP, I wonder
> whether this module does what I want.
>
> First, I could not see a way to monitor the IPs that are being throttled,
> like you can with mod_throttle by accessing a special handler page.
>
> Another thing is that this module seems to restrict the number of
> simultaneous connections. What mod_throttle does is limit the number
> of requests per period of time. This is more like what I want. Does your
> module provide this?
>
> $ /path/to/httpd/bin/apxs -i -a -c bw_mod-0.5rc1.c
> gcc -DLINUX=22 -DHAVE_SET_DUMPABLE -DNO_DBM_REWRITEMAP -DUSE_EXPAT
> -I../lib/expat-lite -fpic -DSHARED_MODULE -I/path/to/httpd/include -c
> bw_mod-0.5rc1.c
> bw_mod-0.5rc1.c:39:25: apr_buckets.h: No such file or directory
> bw_mod-0.5rc1.c:40:25: apr_strings.h: No such file or directory
> bw_mod-0.5rc1.c:41:24: apr_atomic.h: No such file or directory
> bw_mod-0.5rc1.c:42:21: apr_lib.h: No such file or directory
> bw_mod-0.5rc1.c:43:21: apr_shm.h: No such file or directory
> bw_mod-0.5rc1.c:45:25: util_filter.h: No such file or directory
> bw_mod-0.5rc1.c:46:20: ap_mpm.h: No such file or directory
> apxs:Break: Command failed with rc=1
>
>
---------------------------------------------------------------------
Re: [users@httpd] mod_throttle relaxing restrictions for crawlers
Posted by Manuel Lemos <ml...@acm.org>.
Hello,
on 01/27/2005 07:51 PM Ivan Barrera A. said the following:
> I'm not sure if this is what you're asking for, but my module bw_mod v0.5rc1
> can do bandwidth limiting on a host/IP and file size basis.
> http://ivn.cl/apache
>
> Note that it is still a release candidate, although there are some people
> using my module flawlessly on huge servers.
>
> It has been tested on Linux/x86.
> It compiles and works (with very little testing) on Mac OS X, Solaris, FreeBSD,
> on SPARC, PPC, and x86.
>
> It's limited to the prefork MPM for now. I'll be implementing worker MPM
> support and a Windows port soon (and also a MaxConnection limit per IP).
I have tried it, but it did not even compile. Are you sure this works
with Apache 1.3.33 on Linux/x86? The errors follow below.

Anyway, even though it may let you limit connections per IP, I wonder
whether this module does what I want.

First, I could not see a way to monitor the IPs that are being throttled,
like you can with mod_throttle by accessing a special handler page.

Another thing is that this module seems to restrict the number of
simultaneous connections. What mod_throttle does is limit the number
of requests per period of time. This is more like what I want. Does your
module provide this?
$ /path/to/httpd/bin/apxs -i -a -c bw_mod-0.5rc1.c
gcc -DLINUX=22 -DHAVE_SET_DUMPABLE -DNO_DBM_REWRITEMAP -DUSE_EXPAT
-I../lib/expat-lite -fpic -DSHARED_MODULE -I/path/to/httpd/include -c
bw_mod-0.5rc1.c
bw_mod-0.5rc1.c:39:25: apr_buckets.h: No such file or directory
bw_mod-0.5rc1.c:40:25: apr_strings.h: No such file or directory
bw_mod-0.5rc1.c:41:24: apr_atomic.h: No such file or directory
bw_mod-0.5rc1.c:42:21: apr_lib.h: No such file or directory
bw_mod-0.5rc1.c:43:21: apr_shm.h: No such file or directory
bw_mod-0.5rc1.c:45:25: util_filter.h: No such file or directory
bw_mod-0.5rc1.c:46:20: ap_mpm.h: No such file or directory
apxs:Break: Command failed with rc=1
--
Regards,
Manuel Lemos
PHP Classes - Free ready to use OOP components written in PHP
http://www.phpclasses.org/
PHP Reviews - Reviews of PHP books and other products
http://www.phpclasses.org/reviews/
Metastorage - Data object relational mapping layer generator
http://www.meta-language.net/metastorage.html
---------------------------------------------------------------------
Re: [users@httpd] mod_throttle relaxing restrictions for crawlers
Posted by "Ivan Barrera A." <Br...@Ivn.cl>.
I'm not sure if this is what you're asking for, but my module bw_mod v0.5rc1
can do bandwidth limiting on a host/IP and file size basis.
http://ivn.cl/apache

Note that it is still a release candidate, although there are some people
using my module flawlessly on huge servers.

It has been tested on Linux/x86.
It compiles and works (with very little testing) on Mac OS X, Solaris, FreeBSD,
on SPARC, PPC, and x86.

It's limited to the prefork MPM for now. I'll be implementing worker MPM
support and a Windows port soon (and also a MaxConnection limit per IP).
Manuel Lemos wrote:
> Hello,
>
> I want to use mod_throttle to limit the rate of requests that can be
> made from each client IP.
>
> However, I also need to allow the internal crawler, which is ht://Dig, to
> crawl the site without restrictions.
>
> My problem is that I could not find a way to specify a different
> throttle policy for clients accessing from a given IP, or even for a
> given user agent.
>
> Does anybody have any ideas on how to solve this problem?
>
---------------------------------------------------------------------
Re: [users@httpd] mod_throttle relaxing restrictions for crawlers
Posted by Manuel Lemos <ml...@acm.org>.
Hello,
on 01/27/2005 09:25 PM Jeremy Hilton said the following:
>>>>I want to use mod_throttle to limit the rate of requests that can be
>>>>made from each client IP.
>>>>
>>>>However, I also need to allow the internal crawler, which is ht://Dig, to
>>>>crawl the site without restrictions.
>>>>
>>>>My problem is that I could not find a way to specify a different
>>>>throttle policy for clients accessing from a given IP, or even for a
>>>>given user agent.
>>>>
>>>>Does anybody have any ideas on how to solve this problem?
>>>http://www.snert.com/Software/mod_throttle/
>>If you read my message again, you may notice that I want to use mod_throttle, but
>>AFAIK there is no way to tell it not to throttle connections that come
>>from a given IP. That is the problem.
>>
>>I was wondering if there is a way to configure mod_throttle in
>>combination with mod_setenvif, like I already do to split my logs, but I
>>could not find a way to do it.
>
> Since the throttle rules can be declared in the context of <VirtualHost>,
> couldn't you set up a second virtual host that has no throttle rules, and
> instruct your crawler to use the "other" virtual host?
Good idea!
However, the way I am doing it does not seem to work: it still seems to
count the accesses from my server IP and blocks it when it exceeds the
limit, even on the virtual host that has the throttle policy set to None.

It seems that ThrottlePolicy None does not stop ThrottleClientIP from
being applied. I tried setting ThrottleClientIP inside just one
virtual host, but it crashes Apache when I run apachectl configtest.

Do you have any idea what the problem is? I have it like this:

LoadModule throttle_module libexec/mod_throttle.so

<IfModule mod_throttle.c>
    ThrottleClientIP 100 Document 100 180
    ThrottlePolicy None
    <Location /throttle-status>
        SetHandler throttle-status
    </Location>
</IfModule>

<VirtualHost 123.45.67.89>
    DocumentRoot /path/to/httpd/htdocs
    ServerName crawl.mydomain.com
    <IfModule mod_throttle.c>
        ThrottlePolicy None
    </IfModule>
</VirtualHost>

<VirtualHost 123.45.67.89>
    DocumentRoot /path/to/httpd/htdocs
    ServerName www.mydomain.com
    <IfModule mod_throttle.c>
        ThrottlePolicy Document
    </IfModule>
</VirtualHost>

--
Regards,
Manuel Lemos
---------------------------------------------------------------------
Re: [users@httpd] mod_throttle relaxing restrictions for crawlers
Posted by Jeremy Hilton <je...@adtcs.com>.
On 1/27/05 5:34 PM, "Manuel Lemos" <ml...@acm.org> wrote:
> Hello,
>
> on 01/27/2005 08:02 PM Jeremy Hilton said the following:
>>> I want to use mod_throttle to limit the rate of requests that can be
>>> made from each client IP.
>>>
>>> However, I also need to allow the internal crawler, which is ht://Dig, to
>>> crawl the site without restrictions.
>>>
>>> My problem is that I could not find a way to specify a different
>>> throttle policy for clients accessing from a given IP, or even for a
>>> given user agent.
>>>
>>> Does anybody have any ideas on how to solve this problem?
>>
>> http://www.snert.com/Software/mod_throttle/
>
> If you read my message again, you may notice that I want to use mod_throttle, but
> AFAIK there is no way to tell it not to throttle connections that come
> from a given IP. That is the problem.
>
> I was wondering if there is a way to configure mod_throttle in
> combination with mod_setenvif, like I already do to split my logs, but I
> could not find a way to do it.
Since the throttle rules can be declared in the context of <VirtualHost>,
couldn't you set up a second virtual host that has no throttle rules, and
instruct your crawler to use the "other" virtual host?
---------------------------------------------------------------------
Re: [users@httpd] mod_throttle relaxing restrictions for crawlers
Posted by Manuel Lemos <ml...@acm.org>.
Hello,
on 01/27/2005 08:02 PM Jeremy Hilton said the following:
>>I want to use mod_throttle to limit the rate of requests that can be
>>made from each client IP.
>>
>>However, I also need to allow the internal crawler, which is ht://Dig, to
>>crawl the site without restrictions.
>>
>>My problem is that I could not find a way to specify a different
>>throttle policy for clients accessing from a given IP, or even for a
>>given user agent.
>>
>>Does anybody have any ideas on how to solve this problem?
>
> http://www.snert.com/Software/mod_throttle/
If you read my message again, you may notice that I want to use mod_throttle, but
AFAIK there is no way to tell it not to throttle connections that come
from a given IP. That is the problem.

I was wondering if there is a way to configure mod_throttle in
combination with mod_setenvif, like I already do to split my logs, but I
could not find a way to do it.
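
For context, a log-splitting setup of the kind mentioned above might look
roughly like the following sketch. It is an illustration only, not the
poster's actual configuration: the variable name is_crawler, the log file
names, and the "htdig" User-Agent pattern are assumptions, and 123.45.67.89
is the placeholder address used elsewhere in this thread. SetEnvIf,
BrowserMatchNoCase, and the env= clause of CustomLog are standard
mod_setenvif and mod_log_config features. Nothing in this thread indicates
that mod_throttle can read such a variable, so this only tags and logs
crawler requests; it does not exempt them from throttling.

```apache
# Sketch only: tag requests from the internal crawler, then log them separately.
# The name "is_crawler" and the log file paths are hypothetical.

# Tag by client address (placeholder IP; dots are escaped because this is a regex).
SetEnvIf Remote_Addr "^123\.45\.67\.89$" is_crawler

# Alternatively, tag by user agent (assumes the crawler identifies itself as "htdig").
BrowserMatchNoCase "htdig" is_crawler

LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

# Crawler requests go to one log, everything else to another.
CustomLog logs/crawler_log combined env=is_crawler
CustomLog logs/access_log  combined env=!is_crawler
```

Matching on Remote_Addr is the stricter of the two tests, since a User-Agent
header can be set to anything by the client.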
--
Regards,
Manuel Lemos
---------------------------------------------------------------------
Re: [users@httpd] mod_throttle relaxing restrictions for crawlers
Posted by Jeremy Hilton <je...@adtcs.com>.
On 1/27/05 4:45 PM, "Manuel Lemos" <ml...@acm.org> wrote:
> Hello,
>
> I want to use mod_throttle to limit the rate of requests that can be
> made from each client IP.
>
> However, I also need to allow the internal crawler, which is ht://Dig, to
> crawl the site without restrictions.
>
> My problem is that I could not find a way to specify a different
> throttle policy for clients accessing from a given IP, or even for a
> given user agent.
>
> Does anybody have any ideas on how to solve this problem?
http://www.snert.com/Software/mod_throttle/