Posted to commits@nutch.apache.org by bu...@apache.org on 2014/06/09 15:35:50 UTC

svn commit: r911967 - in /websites/staging/nutch/trunk/content: ./ bot.html

Author: buildbot
Date: Mon Jun  9 13:35:50 2014
New Revision: 911967

Log:
Staging update by buildbot for nutch

Modified:
    websites/staging/nutch/trunk/content/   (props changed)
    websites/staging/nutch/trunk/content/bot.html

Propchange: websites/staging/nutch/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Mon Jun  9 13:35:50 2014
@@ -1 +1 @@
-1601369
+1601376

Modified: websites/staging/nutch/trunk/content/bot.html
==============================================================================
--- websites/staging/nutch/trunk/content/bot.html (original)
+++ websites/staging/nutch/trunk/content/bot.html Mon Jun  9 13:35:50 2014
@@ -167,72 +167,66 @@ specific language governing permissions 
 under the License. 
 -->
 
-<div class="codehilite"><pre><span class="o">&lt;</span>!<span class="o">--</span> <span class="n">Subhead</span>
-</pre></div>
+<!-- Subhead
+================================================== -->
 
-
-<p>================================================== --&gt;
-    <header class="jumbotron subhead" id="overview">
-        <div class="container">
-            <h1>Nutch Robot</h1>
-            <p class="lead">A page for SysAdmins/WebMasters and other angry
-                people... ;)</p>
-        </div>
-    </header></p>
-<div class="codehilite"><pre><span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">&quot;container&quot;</span><span class="nt">&gt;</span>
-    <span class="c">&lt;!-- Typography ================================================== --&gt;</span>
-    <span class="nt">&lt;section</span> <span class="na">id=</span><span class="s">&quot;application&quot;</span><span class="nt">&gt;</span>
-        <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">&quot;page-header&quot;</span><span class="nt">&gt;</span>
-            <span class="nt">&lt;h1&gt;</span>Introduction<span class="nt">&lt;/h1&gt;</span>
-            <span class="nt">&lt;p&gt;</span>If you&#39;re reading this, chances are you&#39;ve seen a Nutch-based
-                robot visiting your site while looking through your server logs.
-                Our software obeys robots.txt files and robot META tags in HTML.
-                These are the standard mechanisms for webmasters to tell web robots
-                which portions of a site a robot is welcome to access.<span class="nt">&lt;/p&gt;</span>
-            <span class="nt">&lt;h1&gt;</span>Sysadmins/robots.txt<span class="nt">&lt;/h1&gt;</span>
-            <span class="nt">&lt;p&gt;</span>
-                We&#39;re a software project, not a service, so please understand that
-                a misbehaving crawler appearing with our Agent string is not run by
-                us. Our software may be run by anyone. However, we&#39;d still like to
-                hear about any bad behavior. If possible, please include the name
-                of the domain and some representative log entries. We can be
-                reached at
-                <span class="nt">&lt;code&gt;</span>dev [at] nutch [dot] apache [dot] org<span class="nt">&lt;/code&gt;</span>
-                .
-            <span class="nt">&lt;/p&gt;</span>
-            <span class="nt">&lt;p&gt;</span>
-                Our software obeys the robots.txt exclusion standard, described at
-                <span class="nt">&lt;a</span> <span class="na">href=</span><span class="s">&quot;http://www.robotstxt.org/wc/exclusion.html#robotstxt&quot;</span><span class="nt">&gt;</span>
-                    http://www.robotstxt.org/wc/exclusion.html#robotstxt<span class="nt">&lt;/a&gt;</span>. Different
-                installations of the Nutch software may specify different agent
-                names, but all should respond to the agent name &quot;Nutch&quot;. Thus to
-                ban all Nutch-based crawlers from your site, place the following in
-                your robots.txt file:
-            <span class="nt">&lt;/p&gt;</span>
-            <span class="nt">&lt;pre&gt;</span>User-agent: Nutch<span class="nt">&lt;br&gt;</span>Disallow: /<span class="nt">&lt;/pre&gt;</span>
-        <span class="nt">&lt;/div&gt;</span>
-        <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">&quot;page-header&quot;</span><span class="nt">&gt;</span>
-            <span class="nt">&lt;h1&gt;</span>Webmasters/Robots META<span class="nt">&lt;/h1&gt;</span>
-            <span class="nt">&lt;p&gt;</span>
-                If you do not have permission to edit the /robots.txt file on your
-                server, you can still tell robots not to index your pages or follow
-                your links. The standard mechanism for this is the robots META tag,
-                as described at<span class="nt">&lt;a</span> <span class="na">href=</span><span class="s">&quot;http://www.robotstxt.org/wc/meta-user.html&quot;</span><span class="nt">&gt;</span>
-                    http://www.robotstxt.org/wc/meta-user.html<span class="nt">&lt;/a&gt;</span>.
-            <span class="nt">&lt;/p&gt;</span>
-        <span class="nt">&lt;/div&gt;</span>
-        <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">&quot;page-header&quot;</span><span class="nt">&gt;</span>
-            <span class="nt">&lt;h1&gt;</span>Contact us<span class="nt">&lt;/h1&gt;</span>
-            <span class="nt">&lt;p&gt;</span>
-                If your site has problems or questions about the Nutch crawler,
-                please send an email to the
-                <span class="nt">&lt;code&gt;</span>agent [at] nutch [dot] apache [dot] org<span class="nt">&lt;/code&gt;</span>
-                - Nutch agent mailing list.
-            <span class="nt">&lt;/p&gt;</span>
-        <span class="nt">&lt;/div&gt;</span>
-    <span class="nt">&lt;/section&gt;</span>
-<span class="nt">&lt;/div&gt;</span>
-</pre></div></div>
+<header class="jumbotron subhead" id="overview">
+  <div class="container">
+    <h1>Nutch Robot</h1>
+    <p class="lead">A page for SysAdmins/WebMasters and other angry people... ;)</p>
+  </div>
+</header>
+
+<div class="container">
+  <!-- Typography ================================================== -->
+  <section id="application">
+    <div class="page-header">
+      <h1>Introduction</h1>
+      <p>If you're reading this, chances are you've seen a Nutch-based
+      robot visiting your site while looking through your server logs.
+      Our software obeys robots.txt files and robot META tags in HTML.
+      These are the standard mechanisms for webmasters to tell web robots
+      which portions of a site a robot is welcome to access.</p>
+      <h1>Sysadmins/robots.txt</h1>
+      <p>
+      We're a software project, not a service, so please understand that
+      a misbehaving crawler appearing with our Agent string is not run by
+      us. Our software may be run by anyone. However, we'd still like to
+      hear about any bad behavior. If possible, please include the name
+      of the domain and some representative log entries. We can be
+      reached at <code>dev [at] nutch [dot] apache [dot] org</code>.
+      </p>
+      <p>
+      Our software obeys the robots.txt exclusion standard, described at
+      <a href="http://www.robotstxt.org/wc/exclusion.html#robotstxt">
+      http://www.robotstxt.org/wc/exclusion.html#robotstxt</a>. Different
+      installations of the Nutch software may specify different agent
+      names, but all should respond to the agent name "Nutch". Thus to
+      ban all Nutch-based crawlers from your site, place the following in
+      your robots.txt file:</p>
+      <pre>User-agent: Nutch<br>Disallow: /</pre>
+    </div>
+    <div class="page-header">
+      <h1>Webmasters/Robots META</h1>
+      <p>
+      If you do not have permission to edit the /robots.txt file on your
+      server, you can still tell robots not to index your pages or follow
+      your links. The standard mechanism for this is the robots META tag,
+      as described at<a href="http://www.robotstxt.org/wc/meta-user.html">
+      http://www.robotstxt.org/wc/meta-user.html</a>.
+      </p>
+    </div>
+    <div class="page-header">
+      <h1>Contact us</h1>
+      <p>
+      If your site has problems or questions about the Nutch crawler,
+      please send an email to the
+      <code>agent [at] nutch [dot] apache [dot] org</code>
+      - Nutch agent mailing list.
+      </p>
+    </div>
+  </section>
+</div></div>
 	<!-- /container (main block) -->
 
 	<hr>