Posted to users@httpd.apache.org by Carlos S <ne...@gmail.com> on 2011/01/05 00:03:02 UTC
[users@httpd] disable wget-like user-agents
Is there any way to disable download/traffic from wget-like user
agents? Can this be done using user-agent string? Any documentation
link or example will be really helpful.
--
cs.
---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
To unsubscribe from the digest, e-mail: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org
Re: [users@httpd] disable wget-like user-agents
Posted by Igor Galić <i....@brainsware.org>.
----- "Mark Montague" <ma...@catseye.org> wrote:
> On January 4, 2011 22:32 , Carlos S <ne...@gmail.com> wrote:
> > Recently I was trying to download a package using wget, but the
> > website prevented access to it. I tried --user-agent option but it
> > didn't work either. So I was curious to know what strategy this web
> > admin must have implemented.
>
> Without an example URL, I can only speculate, but the ideas that come
> to
> mind first are denying the download unless a cookie is set (you could
i.galic@panic ~ % wget --help | grep cook
--no-cookies don’t use cookies.
--load-cookies=FILE load cookies from FILE before session.
--save-cookies=FILE save cookies to FILE after session.
--keep-session-cookies load and save session (non-permanent) cookies.
i.galic@panic ~ %
> get quite complex with this, such as setting the cookie via
> JavaScript,
Yup... that (JS) would kill off wget, but also many other (sensible) clients
> which wget won't execute), checking the referrer header, or other
i.galic@panic ~ % wget --help | grep -i referer
--referer=URL include ‘Referer: URL’ header in HTTP request.
i.galic@panic ~ %
> JavaScript based checks.
> --
> Mark Montague
> mark@catseye.org
--
Igor Galić
Tel: +43 (0) 664 886 22 883
Mail: i.galic@brainsware.org
URL: http://brainsware.org/
Re: [users@httpd] disable wget-like user-agents
Posted by Mark Montague <ma...@catseye.org>.
On January 4, 2011 22:32 , Carlos S <ne...@gmail.com> wrote:
> Recently I was trying to download a package using wget, but the
> website prevented access to it. I tried --user-agent option but it
> didn't work either. So I was curious to know what strategy this web
> admin must have implemented.
Without an example URL, I can only speculate, but the ideas that come to
mind first are denying the download unless a cookie is set (you could
get quite complex with this, such as setting the cookie via JavaScript,
which wget won't execute), checking the referrer header, or other
JavaScript based checks.
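For illustration, the cookie and referrer checks described above could be sketched with mod_rewrite along these lines. This is only a guess at what that site might be doing; the cookie name "seen_page", the /downloads/ path, and example.com are all made-up placeholders:

```apache
RewriteEngine On
# No "seen_page" cookie (normally set by a page the visitor must
# load first, possibly via JavaScript, which wget won't run) ...
RewriteCond %{HTTP_COOKIE} !seen_page=1 [NC,OR]
# ... or no Referer pointing back to our own site ...
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
# ... then refuse the download with 403 Forbidden.
RewriteRule ^/downloads/ - [F]
```

Both checks are easy to defeat (wget has --load-cookies and --referer, as the help output quoted above shows), so this only deters casual clients.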
--
Mark Montague
mark@catseye.org
Re: [users@httpd] disable wget-like user-agents
Posted by Carlos S <ne...@gmail.com>.
Thanks for the links, Mark and Doug. The webscrapers list looks interesting.
I had already looked at the mod_rewrite and User-Agent header solution.
Recently I was trying to download a package using wget, but the
website prevented access to it. I tried --user-agent option but it
didn't work either. So I was curious to know what strategy this web
admin must have implemented. Maybe I used an incorrect user-agent
string? I remember using AppleWebKit and Mozilla strings; I will try
again.
(Not giving out that particular URL out of courtesy).
-cs.
On Tue, Jan 4, 2011 at 5:33 PM, Doug McNutt <do...@macnauchtan.com> wrote:
> At 18:19 -0500 1/4/11, Mark Montague wrote:
>>Follow the example below, but use only the user agent condition, omit the IP condition, and suitably adjust the RewriteRule regular expression to match the URL(s) you wish to block:
>>
>>http://httpd.apache.org/docs/2.2/rewrite/rewrite_guide.html#blocking-of-robots
>>
>>Note that wget has a -U option that can be used to get around this block by using a user agent string that you are not blocking -- so the block will not prevent a determined downloader.
>
> *******
>
> You might want to have a look at this rather new mailing list. It's interested in doing exactly the opposite of what you want.
>
> List-Id: webscrapers talk <webscrapers.cool.haxx.se>
> List-Archive: <http://cool.haxx.se/pipermail/webscrapers>
> List-Post: <ma...@cool.haxx.se>
> List-Help: <mailto:webscrapers-request@cool.haxx.se?subject=help>
> List-Subscribe: <http://cool.haxx.se/cgi-bin/mailman/listinfo/webscrapers>, <mailto:webscrapers-request@cool.haxx.se?subject=subscribe>
>
>
>
> --
>
> --> From the U S of A, the only socialist country that refuses to admit it. <--
>
Re: [users@httpd] disable wget-like user-agents
Posted by Doug McNutt <do...@macnauchtan.com>.
At 18:19 -0500 1/4/11, Mark Montague wrote:
>Follow the example below, but use only the user agent condition, omit the IP condition, and suitably adjust the RewriteRule regular expression to match the URL(s) you wish to block:
>
>http://httpd.apache.org/docs/2.2/rewrite/rewrite_guide.html#blocking-of-robots
>
>Note that wget has a -U option that can be used to get around this block by using a user agent string that you are not blocking -- so the block will not prevent a determined downloader.
*******
You might want to have a look at this rather new mailing list. It's interested in doing exactly the opposite of what you want.
List-Id: webscrapers talk <webscrapers.cool.haxx.se>
List-Archive: <http://cool.haxx.se/pipermail/webscrapers>
List-Post: <ma...@cool.haxx.se>
List-Help: <mailto:webscrapers-request@cool.haxx.se?subject=help>
List-Subscribe: <http://cool.haxx.se/cgi-bin/mailman/listinfo/webscrapers>, <mailto:webscrapers-request@cool.haxx.se?subject=subscribe>
--
--> From the U S of A, the only socialist country that refuses to admit it. <--
Re: [users@httpd] disable wget-like user-agents
Posted by Mark Montague <ma...@catseye.org>.
On January 4, 2011 18:03 , Carlos S <ne...@gmail.com> wrote:
> Is there any way to disable download/traffic from wget-like user
> agents? Can this be done using user-agent string? Any documentation
> link or example will be really helpful.
Follow the example below, but use only the user agent condition, omit
the IP condition, and suitably adjust the RewriteRule regular expression
to match the URL(s) you wish to block:
http://httpd.apache.org/docs/2.2/rewrite/rewrite_guide.html#blocking-of-robots
Note that wget has a -U option that can be used to get around this block
by using a user agent string that you are not blocking -- so the block
will not prevent a determined downloader.
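Adapted from the blocking-of-robots example in the linked guide, a user-agent-only block might look like this (the agent patterns and the /downloads/ path are illustrative, not a recommendation):

```apache
RewriteEngine On
# Match common command-line downloaders by User-Agent substring,
# case-insensitively.
RewriteCond %{HTTP_USER_AGENT} (wget|curl|libwww-perl) [NC]
# Return 403 Forbidden for the URLs you want to protect.
RewriteRule ^/downloads/ - [F]
```

As noted above, wget -U "Mozilla/5.0" (or any string your pattern doesn't match) walks straight past this, so treat it as a nuisance filter, not access control.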
--
Mark Montague
mark@catseye.org