Posted to cvs@httpd.apache.org by rb...@apache.org on 2009/11/02 23:57:44 UTC

svn commit: r832175 - in /httpd/httpd/trunk/docs/manual/rewrite: access.html.en access.xml rewrite_guide.html.en rewrite_guide.xml

Author: rbowen
Date: Mon Nov  2 22:57:44 2009
New Revision: 832175

URL: http://svn.apache.org/viewvc?rev=832175&view=rev
Log:
Removes the 'block evil robots' rule from rewrite_guide, moves it to
access, and makes it not suck.

Modified:
    httpd/httpd/trunk/docs/manual/rewrite/access.html.en
    httpd/httpd/trunk/docs/manual/rewrite/access.xml
    httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.html.en
    httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.xml

Modified: httpd/httpd/trunk/docs/manual/rewrite/access.html.en
URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/docs/manual/rewrite/access.html.en?rev=832175&r1=832174&r2=832175&view=diff
==============================================================================
--- httpd/httpd/trunk/docs/manual/rewrite/access.html.en (original)
+++ httpd/httpd/trunk/docs/manual/rewrite/access.html.en Mon Nov  2 22:57:44 2009
@@ -36,7 +36,78 @@
 
 </div>
 <div id="quickview"><h3>See also</h3><ul class="seealso"><li><a href="../mod/mod_rewrite.html">Module documentation</a></li><li><a href="intro.html">mod_rewrite introduction</a></li></ul></div>
-</div>
+<div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
+<div class="section">
+<h2><a name="blocking-of-robots" id="blocking-of-robots">Blocking of Robots</a></h2>
+
+      
+
+      <dl>
+        <dt>Description:</dt>
+
+        <dd>
+        <p>
+        In this recipe, we discuss how to block persistent requests from
+        a particular robot or user agent.</p>
+
+        <p>The standard for robot exclusion defines a file,
+        <code>/robots.txt</code>, that specifies those portions of your
+        website from which you wish to exclude robots. However, some
+        robots do not honor these files.
+        </p>
+
+        <p>Note that there are methods of accomplishing this which do
+        not use mod_rewrite. Note also that any technique that relies on
+        the client's <code>USER_AGENT</code> string can be circumvented
+        very easily, since that string can be changed.</p>
+        </dd>
+
+        <dt>Solution:</dt>
+
+        <dd>
+        <p>We use a ruleset that specifies the directory to be
+        protected, and the client <code>USER_AGENT</code> that
+        identifies the malicious or persistent robot.</p>
+
+        <p>In this example, we are blocking a robot called
+        <code>NameOfBadRobot</code> from a location
+        <code>/secret/files</code>. You may also specify an IP address
+        range, if you wish to block that user agent only when it comes
+        from a particular source.</p>
+
+<div class="example"><pre>
+RewriteCond %{HTTP_USER_AGENT}   ^<strong>NameOfBadRobot</strong>
+RewriteCond %{REMOTE_ADDR}       ^<strong>123\.45\.67\.[8-9]</strong>$
+RewriteRule ^<strong>/secret/files/</strong>   -   [<strong>F</strong>]
+</pre></div>
+        </dd>
+
+      <dt>Discussion:</dt>
+
+      <dd>
+      <p>
+        Rather than using mod_rewrite for this, you can accomplish the
+        same end using alternate means, as illustrated here:
+      </p>
+      <div class="example"><p><code>
+      SetEnvIfNoCase User-Agent ^NameOfBadRobot goaway<br />
+      &lt;Location /secret/files&gt;<br />
+      Order allow,deny<br />
+      Allow from all<br />
+      Deny from env=goaway
+      </code></p></div>
+      <p>
+      As noted above, this technique is trivial to circumvent by simply
+      modifying the <code>User-Agent</code> request header. If you
+      are experiencing a sustained attack, you should consider blocking
+      it at a higher level, such as at your firewall.
+      </p>
+
+      </dd>
+
+      </dl>
+
+    </div></div>
 <div class="bottomlang">
 <p><span>Available Languages: </span><a href="../en/rewrite/access.html" title="English">&nbsp;en&nbsp;</a></p>
 </div><div id="footer">
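The new ruleset above assumes mod_rewrite is already enabled. A minimal self-contained sketch of the same rules in server or virtual-host context might look like this (the robot name and address range are placeholders, as in the recipe):

```apache
# Minimal sketch (hypothetical names): forbid /secret/files/ to a
# particular robot, but only when it comes from a given address range.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT}   ^NameOfBadRobot
RewriteCond %{REMOTE_ADDR}       ^123\.45\.67\.[8-9]$
RewriteRule ^/secret/files/      -   [F]
```

Note that both RewriteCond directives must match for the request to be forbidden; omit the REMOTE_ADDR condition to block the robot regardless of its source address.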

Modified: httpd/httpd/trunk/docs/manual/rewrite/access.xml
URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/docs/manual/rewrite/access.xml?rev=832175&r1=832174&r2=832175&view=diff
==============================================================================
--- httpd/httpd/trunk/docs/manual/rewrite/access.xml (original)
+++ httpd/httpd/trunk/docs/manual/rewrite/access.xml Mon Nov  2 22:57:44 2009
@@ -43,5 +43,75 @@
 <seealso><a href="../mod/mod_rewrite.html">Module documentation</a></seealso>
 <seealso><a href="intro.html">mod_rewrite introduction</a></seealso>
 
+    <section id="blocking-of-robots">
+
+      <title>Blocking of Robots</title>
+
+      <dl>
+        <dt>Description:</dt>
+
+        <dd>
+        <p>
+        In this recipe, we discuss how to block persistent requests from
+        a particular robot or user agent.</p>
+
+        <p>The standard for robot exclusion defines a file,
+        <code>/robots.txt</code>, that specifies those portions of your
+        website from which you wish to exclude robots. However, some
+        robots do not honor these files.
+        </p>
+
+        <p>Note that there are methods of accomplishing this which do
+        not use mod_rewrite. Note also that any technique that relies on
+        the client's <code>USER_AGENT</code> string can be circumvented
+        very easily, since that string can be changed.</p>
+        </dd>
+
+        <dt>Solution:</dt>
+
+        <dd>
+        <p>We use a ruleset that specifies the directory to be
+        protected, and the client <code>USER_AGENT</code> that
+        identifies the malicious or persistent robot.</p>
+
+        <p>In this example, we are blocking a robot called
+        <code>NameOfBadRobot</code> from a location
+        <code>/secret/files</code>. You may also specify an IP address
+        range, if you wish to block that user agent only when it comes
+        from a particular source.</p>
+
+<example><pre>
+RewriteCond %{HTTP_USER_AGENT}   ^<strong>NameOfBadRobot</strong>
+RewriteCond %{REMOTE_ADDR}       ^<strong>123\.45\.67\.[8-9]</strong>$
+RewriteRule ^<strong>/secret/files/</strong>   -   [<strong>F</strong>]
+</pre></example>
+        </dd>
+
+      <dt>Discussion:</dt>
+
+      <dd>
+      <p>
+        Rather than using mod_rewrite for this, you can accomplish the
+        same end using alternate means, as illustrated here:
+      </p>
+      <example>
+      SetEnvIfNoCase User-Agent ^NameOfBadRobot goaway<br />
+      &lt;Location /secret/files&gt;<br />
+      Order allow,deny<br />
+      Allow from all<br />
+      Deny from env=goaway
+      </example>
+      <p>
+      As noted above, this technique is trivial to circumvent by simply
+      modifying the <code>User-Agent</code> request header. If you
+      are experiencing a sustained attack, you should consider blocking
+      it at a higher level, such as at your firewall.
+      </p>
+
+      </dd>
+
+      </dl>
+
+    </section>
 
 </manualpage> 
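As an aside, the SetEnvIfNoCase variant shown in the Discussion uses the older Order/Allow/Deny access control. With trunk's updated authorization directives, the same block could be expressed with Require; a hedged sketch, with the same hypothetical robot name:

```apache
# Hypothetical sketch using Require instead of Order/Allow/Deny;
# NameOfBadRobot and /secret/files are placeholders from the recipe.
SetEnvIfNoCase User-Agent ^NameOfBadRobot goaway
<Location /secret/files>
    <RequireAll>
        # Allow everyone, except requests carrying the goaway marker
        Require all granted
        Require not env goaway
    </RequireAll>
</Location>
```

The `Require not` directive is only valid inside a `<RequireAll>` (or `<RequireAny>`) container, which is why the allow-everyone and deny-marked-requests lines are grouped together.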

Modified: httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.html.en
URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.html.en?rev=832175&r1=832174&r2=832175&view=diff
==============================================================================
--- httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.html.en (original)
+++ httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.html.en Mon Nov  2 22:57:44 2009
@@ -56,7 +56,6 @@
 <li><img alt="" src="../images/down.gif" /> <a href="#old-to-new">From Old to New (intern)</a></li>
 <li><img alt="" src="../images/down.gif" /> <a href="#old-to-new-extern">From Old to New (extern)</a></li>
 <li><img alt="" src="../images/down.gif" /> <a href="#static-to-dynamic">From Static to Dynamic</a></li>
-<li><img alt="" src="../images/down.gif" /> <a href="#blocking-of-robots">Blocking of Robots</a></li>
 <li><img alt="" src="../images/down.gif" /> <a href="#blocked-inline-images">Forbidding Image "Hotlinking"</a></li>
 <li><img alt="" src="../images/down.gif" /> <a href="#proxy-deny">Proxy Deny</a></li>
 <li><img alt="" src="../images/down.gif" /> <a href="#external-rewriting">External Rewriting Engine</a></li>
@@ -653,44 +652,6 @@
 
     </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
 <div class="section">
-<h2><a name="blocking-of-robots" id="blocking-of-robots">Blocking of Robots</a></h2>
-
-      
-
-      <dl>
-        <dt>Description:</dt>
-
-        <dd>
-          <p>How can we block a really annoying robot from
-          retrieving pages of a specific webarea? A
-          <code>/robots.txt</code> file containing entries of the
-          "Robot Exclusion Protocol" is typically not enough to get
-          rid of such a robot.</p>
-        </dd>
-
-        <dt>Solution:</dt>
-
-        <dd>
-          <p>We use a ruleset which forbids the URLs of the webarea
-          <code>/~quux/foo/arc/</code> (perhaps a very deep
-          directory indexed area where the robot traversal would
-          create big server load). We have to make sure that we
-          forbid access only to the particular robot, i.e. just
-          forbidding the host where the robot runs is not enough.
-          This would block users from this host, too. We accomplish
-          this by also matching the User-Agent HTTP header
-          information.</p>
-
-<div class="example"><pre>
-RewriteCond %{HTTP_USER_AGENT}   ^<strong>NameOfBadRobot</strong>.*
-RewriteCond %{REMOTE_ADDR}       ^<strong>123\.45\.67\.[8-9]</strong>$
-RewriteRule ^<strong>/~quux/foo/arc/</strong>.+   -   [<strong>F</strong>]
-</pre></div>
-        </dd>
-      </dl>
-
-    </div><div class="top"><a href="#page-header"><img alt="top" src="../images/up.gif" /></a></div>
-<div class="section">
 <h2><a name="blocked-inline-images" id="blocked-inline-images">Forbidding Image "Hotlinking"</a></h2>
 
       

Modified: httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.xml
URL: http://svn.apache.org/viewvc/httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.xml?rev=832175&r1=832174&r2=832175&view=diff
==============================================================================
--- httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.xml (original)
+++ httpd/httpd/trunk/docs/manual/rewrite/rewrite_guide.xml Mon Nov  2 22:57:44 2009
@@ -627,44 +627,6 @@
 
     </section>
 
-    <section id="blocking-of-robots">
-
-      <title>Blocking of Robots</title>
-
-      <dl>
-        <dt>Description:</dt>
-
-        <dd>
-          <p>How can we block a really annoying robot from
-          retrieving pages of a specific webarea? A
-          <code>/robots.txt</code> file containing entries of the
-          "Robot Exclusion Protocol" is typically not enough to get
-          rid of such a robot.</p>
-        </dd>
-
-        <dt>Solution:</dt>
-
-        <dd>
-          <p>We use a ruleset which forbids the URLs of the webarea
-          <code>/~quux/foo/arc/</code> (perhaps a very deep
-          directory indexed area where the robot traversal would
-          create big server load). We have to make sure that we
-          forbid access only to the particular robot, i.e. just
-          forbidding the host where the robot runs is not enough.
-          This would block users from this host, too. We accomplish
-          this by also matching the User-Agent HTTP header
-          information.</p>
-
-<example><pre>
-RewriteCond %{HTTP_USER_AGENT}   ^<strong>NameOfBadRobot</strong>.*
-RewriteCond %{REMOTE_ADDR}       ^<strong>123\.45\.67\.[8-9]</strong>$
-RewriteRule ^<strong>/~quux/foo/arc/</strong>.+   -   [<strong>F</strong>]
-</pre></example>
-        </dd>
-      </dl>
-
-    </section>
-
     <section id="blocked-inline-images">
 
       <title>Forbidding Image &quot;Hotlinking&quot;</title>