Posted to cvs@httpd.apache.org by rb...@apache.org on 2009/11/02 23:57:44 UTC
svn commit: r832175 - in /httpd/httpd/trunk/docs/manual/rewrite:
access.html.en access.xml rewrite_guide.html.en rewrite_guide.xml
Author: rbowen
Date: Mon Nov 2 22:57:44 2009
New Revision: 832175
URL: http://svn.apache.org/viewvc?rev=832175&view=rev
Log:
Removes the 'block evil robots' rule from rewrite_guide, moves it to
access, and makes it not suck.
Modified:
httpd/httpd/trunk/docs/manual/rewrite/access.html.en
httpd/httpd/trunk/docs/manual/rewrite/access.xml
httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.html.en
httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.xml
Modified: httpd/httpd/trunk/docs/manual/rewrite/access.html.en
URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/docs/manual/rewrite/access.html.en?rev=832175&r1=832174&r2=832175&view=diff
==============================================================================
--- httpd/httpd/trunk/docs/manual/rewrite/access.html.en (original)
+++ httpd/httpd/trunk/docs/manual/rewrite/access.html.en Mon Nov 2 22:57:44 2009
@@ -36,7 +36,78 @@
</div>
<div id="quickview"><h3>See also</h3><ul class="seealso"><li><a href="../mod/mod_rewrite.html">Module documentation</a></li><li><a href="intro.html">mod_rewrite introduction</a></li></ul></div>
-</div>
+<div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
+<div class="section">
+<h2><a name="blocking-of-robots" id="blocking-of-robots">Blocking of Robots</a></h2>
+
+
+
+ <dl>
+ <dt>Description:</dt>
+
+ <dd>
+ <p>
+ In this recipe, we discuss how to block persistent requests from
+ a particular robot, or user agent.</p>
+
+ <p>The standard for robot exclusion defines a file,
+ <code>/robots.txt</code>, that specifies those portions of your
+ website from which you wish to exclude robots. However, some
+ robots do not honor these files.
+ </p>
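+
+ <p>For example, a <code>/robots.txt</code> asking robots to stay
+ out of a particular area might look like the following (this only
+ helps with robots that honor the file):</p>
+
+<div class="example"><pre>
+User-agent: NameOfBadRobot
+Disallow: /secret/files/
+</pre></div>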
+
+ <p>Note that there are methods of accomplishing this which do
+ not use mod_rewrite. Note also that any technique that relies on
+ the client's <code>USER_AGENT</code> string can be circumvented
+ very easily, since that string can be changed.</p>
+ </dd>
+
+ <dt>Solution:</dt>
+
+ <dd>
+ <p>We use a ruleset that specifies the directory to be
+ protected, and the client <code>USER_AGENT</code> that
+ identifies the malicious or persistent robot.</p>
+
+ <p>In this example, we are blocking a robot called
+ <code>NameOfBadRobot</code> from the location
+ <code>/secret/files</code>. You may also specify an IP address
+ range, if you wish to block that user agent only when it comes
+ from a particular source.</p>
+
+<div class="example"><pre>
+RewriteCond %{HTTP_USER_AGENT} ^<strong>NameOfBadRobot</strong>
+RewriteCond %{REMOTE_ADDR} ^<strong>123\.45\.67\.[8-9]</strong>$
+RewriteRule ^<strong>/secret/files/</strong> - [<strong>F</strong>]
+</pre></div>
+ </dd>
+
+ <dt>Discussion:</dt>
+
+ <dd>
+ <p>
+ Rather than using mod_rewrite for this, you can accomplish the
+ same end using alternate means, as illustrated here:
+ </p>
+ <div class="example"><p><code>
+ SetEnvIfNoCase User-Agent ^NameOfBadRobot goaway<br />
+ <Location /secret/files><br />
+ Order allow,deny<br />
+ Allow from all<br />
+ Deny from env=goaway<br />
+ </Location>
+ </code></p></div>
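+ <p>
+ On versions of httpd that provide <code>mod_authz_core</code>
+ (2.4 and later), a sketch of an equivalent configuration, reusing
+ the same <code>goaway</code> environment variable, would be:
+ </p>
+ <div class="example"><p><code>
+ SetEnvIfNoCase User-Agent ^NameOfBadRobot goaway<br />
+ <Location /secret/files><br />
+ <RequireAll><br />
+ Require all granted<br />
+ Require not env goaway<br />
+ </RequireAll><br />
+ </Location>
+ </code></p></div>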
+ <p>
+ As noted above, this technique is trivial to circumvent simply
+ by modifying the <code>USER_AGENT</code> request header. If you
+ are experiencing a sustained attack, you should consider blocking
+ it at a higher level, such as at your firewall.
+ </p>
+
+ </dd>
+
+ </dl>
+
+ </div></div>
<div class="bottomlang">
<p><span>Available Languages: </span><a href="../en/rewrite/access.html" title="English"> en </a></p>
</div><div id="footer">
Modified: httpd/httpd/trunk/docs/manual/rewrite/access.xml
URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/docs/manual/rewrite/access.xml?rev=832175&r1=832174&r2=832175&view=diff
==============================================================================
--- httpd/httpd/trunk/docs/manual/rewrite/access.xml (original)
+++ httpd/httpd/trunk/docs/manual/rewrite/access.xml Mon Nov 2 22:57:44 2009
@@ -43,5 +43,75 @@
<seealso><a href="../mod/mod_rewrite.html">Module documentation</a></seealso>
<seealso><a href="intro.html">mod_rewrite introduction</a></seealso>
+ <section id="blocking-of-robots">
+
+ <title>Blocking of Robots</title>
+
+ <dl>
+ <dt>Description:</dt>
+
+ <dd>
+ <p>
+ In this recipe, we discuss how to block persistent requests from
+ a particular robot, or user agent.</p>
+
+ <p>The standard for robot exclusion defines a file,
+ <code>/robots.txt</code>, that specifies those portions of your
+ website from which you wish to exclude robots. However, some
+ robots do not honor these files.
+ </p>
+
+ <p>Note that there are methods of accomplishing this which do
+ not use mod_rewrite. Note also that any technique that relies on
+ the client's <code>USER_AGENT</code> string can be circumvented
+ very easily, since that string can be changed.</p>
+ </dd>
+
+ <dt>Solution:</dt>
+
+ <dd>
+ <p>We use a ruleset that specifies the directory to be
+ protected, and the client <code>USER_AGENT</code> that
+ identifies the malicious or persistent robot.</p>
+
+ <p>In this example, we are blocking a robot called
+ <code>NameOfBadRobot</code> from the location
+ <code>/secret/files</code>. You may also specify an IP address
+ range, if you wish to block that user agent only when it comes
+ from a particular source.</p>
+
+<example><pre>
+RewriteCond %{HTTP_USER_AGENT} ^<strong>NameOfBadRobot</strong>
+RewriteCond %{REMOTE_ADDR} ^<strong>123\.45\.67\.[8-9]</strong>$
+RewriteRule ^<strong>/secret/files/</strong> - [<strong>F</strong>]
+</pre></example>
+ </dd>
+
+ <dt>Discussion:</dt>
+
+ <dd>
+ <p>
+ Rather than using mod_rewrite for this, you can accomplish the
+ same end using alternate means, as illustrated here:
+ </p>
+ <example>
+ SetEnvIfNoCase User-Agent ^NameOfBadRobot goaway<br />
+ <Location /secret/files><br />
+ Order allow,deny<br />
+ Allow from all<br />
+ Deny from env=goaway<br />
+ </Location>
+ </example>
+ <p>
+ As noted above, this technique is trivial to circumvent simply
+ by modifying the <code>USER_AGENT</code> request header. If you
+ are experiencing a sustained attack, you should consider blocking
+ it at a higher level, such as at your firewall.
+ </p>
+
+ </dd>
+
+ </dl>
+
+ </section>
</manualpage>
Modified: httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.html.en
URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.html.en?rev=832175&r1=832174&r2=832175&view=diff
==============================================================================
--- httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.html.en (original)
+++ httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.html.en Mon Nov 2 22:57:44 2009
@@ -56,7 +56,6 @@
<li><img alt="" src="../images/down.gif" /> <a href="#old-to-new">From Old to New (intern)</a></li>
<li><img alt="" src="../images/down.gif" /> <a href="#old-to-new-extern">From Old to New (extern)</a></li>
<li><img alt="" src="../images/down.gif" /> <a href="#static-to-dynamic">From Static to Dynamic</a></li>
-<li><img alt="" src="../images/down.gif" /> <a href="#blocking-of-robots">Blocking of Robots</a></li>
<li><img alt="" src="../images/down.gif" /> <a href="#blocked-inline-images">Forbidding Image "Hotlinking"</a></li>
<li><img alt="" src="../images/down.gif" /> <a href="#proxy-deny">Proxy Deny</a></li>
<li><img alt="" src="../images/down.gif" /> <a href="#external-rewriting">External Rewriting Engine</a></li>
@@ -653,44 +652,6 @@
</div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
<div class="section">
-<h2><a name="blocking-of-robots" id="blocking-of-robots">Blocking of Robots</a></h2>
-
-
-
- <dl>
- <dt>Description:</dt>
-
- <dd>
- <p>How can we block a really annoying robot from
- retrieving pages of a specific webarea? A
- <code>/robots.txt</code> file containing entries of the
- "Robot Exclusion Protocol" is typically not enough to get
- rid of such a robot.</p>
- </dd>
-
- <dt>Solution:</dt>
-
- <dd>
- <p>We use a ruleset which forbids the URLs of the webarea
- <code>/~quux/foo/arc/</code> (perhaps a very deep
- directory indexed area where the robot traversal would
- create big server load). We have to make sure that we
- forbid access only to the particular robot, i.e. just
- forbidding the host where the robot runs is not enough.
- This would block users from this host, too. We accomplish
- this by also matching the User-Agent HTTP header
- information.</p>
-
-<div class="example"><pre>
-RewriteCond %{HTTP_USER_AGENT} ^<strong>NameOfBadRobot</strong>.*
-RewriteCond %{REMOTE_ADDR} ^<strong>123\.45\.67\.[8-9]</strong>$
-RewriteRule ^<strong>/~quux/foo/arc/</strong>.+ - [<strong>F</strong>]
-</pre></div>
- </dd>
- </dl>
-
- </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
-<div class="section">
<h2><a name="blocked-inline-images" id="blocked-inline-images">Forbidding Image "Hotlinking"</a></h2>
Modified: httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.xml
URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.xml?rev=832175&r1=832174&r2=832175&view=diff
==============================================================================
--- httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.xml (original)
+++ httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.xml Mon Nov 2 22:57:44 2009
@@ -627,44 +627,6 @@
</section>
- <section id="blocking-of-robots">
-
- <title>Blocking of Robots</title>
-
- <dl>
- <dt>Description:</dt>
-
- <dd>
- <p>How can we block a really annoying robot from
- retrieving pages of a specific webarea? A
- <code>/robots.txt</code> file containing entries of the
- "Robot Exclusion Protocol" is typically not enough to get
- rid of such a robot.</p>
- </dd>
-
- <dt>Solution:</dt>
-
- <dd>
- <p>We use a ruleset which forbids the URLs of the webarea
- <code>/~quux/foo/arc/</code> (perhaps a very deep
- directory indexed area where the robot traversal would
- create big server load). We have to make sure that we
- forbid access only to the particular robot, i.e. just
- forbidding the host where the robot runs is not enough.
- This would block users from this host, too. We accomplish
- this by also matching the User-Agent HTTP header
- information.</p>
-
-<example><pre>
-RewriteCond %{HTTP_USER_AGENT} ^<strong>NameOfBadRobot</strong>.*
-RewriteCond %{REMOTE_ADDR} ^<strong>123\.45\.67\.[8-9]</strong>$
-RewriteRule ^<strong>/~quux/foo/arc/</strong>.+ - [<strong>F</strong>]
-</pre></example>
- </dd>
- </dl>
-
- </section>
-
<section id="blocked-inline-images">
<title>Forbidding Image "Hotlinking"</title>