Posted to commits@nutch.apache.org by le...@apache.org on 2014/06/09 15:35:38 UTC

svn commit: r1601376 - /nutch/cms_site/trunk/content/bot.md

Author: lewismc
Date: Mon Jun  9 13:35:38 2014
New Revision: 1601376

URL: http://svn.apache.org/r1601376
Log:
Test formatting on bot.html

Modified:
    nutch/cms_site/trunk/content/bot.md

Modified: nutch/cms_site/trunk/content/bot.md
URL: http://svn.apache.org/viewvc/nutch/cms_site/trunk/content/bot.md?rev=1601376&r1=1601375&r2=1601376&view=diff
==============================================================================
--- nutch/cms_site/trunk/content/bot.md (original)
+++ nutch/cms_site/trunk/content/bot.md Mon Jun  9 13:35:38 2014
@@ -18,66 +18,62 @@ specific language governing permissions 
 under the License. 
 -->
 
-	<!-- Subhead
+<!-- Subhead
 ================================================== -->
-	<header class="jumbotron subhead" id="overview">
-		<div class="container">
-			<h1>Nutch Robot</h1>
-			<p class="lead">A page for SysAdmins/WebMasters and other angry
-				people... ;)</p>
-		</div>
-	</header>
+<header class="jumbotron subhead" id="overview">
+  <div class="container">
+    <h1>Nutch Robot</h1>
+    <p class="lead">A page for SysAdmins/WebMasters and other angry people... ;)</p>
+  </div>
+</header>
 
-	<div class="container">
-		<!-- Typography ================================================== -->
-		<section id="application">
-			<div class="page-header">
-				<h1>Introduction</h1>
-				<p>If you're reading this, chances are you've seen a Nutch-based
-					robot visiting your site while looking through your server logs.
-					Our software obeys robots.txt files and robot META tags in HTML.
-					These are the standard mechanisms for webmasters to tell web robots
-					which portions of a site a robot is welcome to access.</p>
-				<h1>Sysadmins/robots.txt</h1>
-				<p>
-					We're a software project, not a service, so please understand that
-					a misbehaving crawler appearing with our Agent string is not run by
-					us. Our software may be run by anyone. However, we'd still like to
-					hear about any bad behavior. If possible, please include the name
-					of the domain and some representative log entries. We can be
-					reached at
-					<code>dev [at] nutch [dot] apache [dot] org</code>
-					.
-				</p>
-				<p>
-					Our software obeys the robots.txt exclusion standard, described at
-					<a href="http://www.robotstxt.org/wc/exclusion.html#robotstxt">
-						http://www.robotstxt.org/wc/exclusion.html#robotstxt</a>. Different
-					installations of the Nutch software may specify different agent
-					names, but all should respond to the agent name "Nutch". Thus to
-					ban all Nutch-based crawlers from your site, place the following in
-					your robots.txt file:
-				</p>
-				<pre>User-agent: Nutch<br>Disallow: /</pre>
-			</div>
-			<div class="page-header">
-				<h1>Webmasters/Robots META</h1>
-				<p>
-					If you do not have permission to edit the /robots.txt file on your
-					server, you can still tell robots not to index your pages or follow
-					your links. The standard mechanism for this is the robots META tag,
-					as described at<a href="http://www.robotstxt.org/wc/meta-user.html">
-						http://www.robotstxt.org/wc/meta-user.html</a>.
-				</p>
-			</div>
-			<div class="page-header">
-				<h1>Contact us</h1>
-				<p>
-					If your site has problems or questions about the Nutch crawler,
-					please send an email to the
-					<code>agent [at] nutch [dot] apache [dot] org</code>
-					- Nutch agent mailing list.
-				</p>
-			</div>
-		</section>
-	</div>
+<div class="container">
+  <!-- Typography ================================================== -->
+  <section id="application">
+    <div class="page-header">
+      <h1>Introduction</h1>
+      <p>If you're reading this, chances are you've seen a Nutch-based
+      robot visiting your site while looking through your server logs.
+      Our software obeys robots.txt files and robot META tags in HTML.
+      These are the standard mechanisms for webmasters to tell web robots
+      which portions of a site a robot is welcome to access.</p>
+      <h1>Sysadmins/robots.txt</h1>
+      <p>
+      We're a software project, not a service, so please understand that
+      a misbehaving crawler appearing with our Agent string is not run by
+      us. Our software may be run by anyone. However, we'd still like to
+      hear about any bad behavior. If possible, please include the name
+      of the domain and some representative log entries. We can be
+      reached at <code>dev [at] nutch [dot] apache [dot] org</code>.
+      </p>
+      <p>
+      Our software obeys the robots.txt exclusion standard, described at
+      <a href="http://www.robotstxt.org/wc/exclusion.html#robotstxt">
+      http://www.robotstxt.org/wc/exclusion.html#robotstxt</a>. Different
+      installations of the Nutch software may specify different agent
+      names, but all should respond to the agent name "Nutch". Thus to
+      ban all Nutch-based crawlers from your site, place the following in
+      your robots.txt file:</p>
+      <pre>User-agent: Nutch<br>Disallow: /</pre>
+    </div>
+    <div class="page-header">
+      <h1>Webmasters/Robots META</h1>
+      <p>
+      If you do not have permission to edit the /robots.txt file on your
+      server, you can still tell robots not to index your pages or follow
+      your links. The standard mechanism for this is the robots META tag,
+      as described at<a href="http://www.robotstxt.org/wc/meta-user.html">
+      http://www.robotstxt.org/wc/meta-user.html</a>.
+      </p>
+    </div>
+    <div class="page-header">
+      <h1>Contact us</h1>
+      <p>
+      If your site has problems or questions about the Nutch crawler,
+      please send an email to the
+      <code>agent [at] nutch [dot] apache [dot] org</code>
+      - Nutch agent mailing list.
+      </p>
+    </div>
+  </section>
+</div>
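The page describes two mechanisms: a robots.txt rule that names the Nutch agent, and the robots META tag. The Python sketch below shows how a crawler might check both. It uses only the standard library's `urllib.robotparser` and `html.parser`, not Nutch's own robots handling, so its agent-matching behaviour is an assumption and may differ from what a Nutch installation does. The helpers `robots_txt_allows` and `robots_meta_directives`, and the agent names `MyCrawler-Nutch/1.0` and `SomeOtherBot`, are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# The robots.txt rule quoted on the page: ban every Nutch-based crawler.
ROBOTS_TXT = """\
User-agent: Nutch
Disallow: /
"""


def robots_txt_allows(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as freshly loaded
    return parser.can_fetch(agent, url)


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # content is a comma-separated list such as "noindex, nofollow".
        for directive in (attributes.get("content") or "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().lower())


def robots_meta_directives(html: str) -> set[str]:
    """Return the lower-cased robots META directives found in `html` (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    url = "http://www.example.com/some/page.html"
    print(robots_txt_allows(ROBOTS_TXT, "Nutch", url))                # False
    print(robots_txt_allows(ROBOTS_TXT, "MyCrawler-Nutch/1.0", url))  # False
    print(robots_txt_allows(ROBOTS_TXT, "SomeOtherBot", url))         # True
    page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(sorted(robots_meta_directives(page)))                       # ['nofollow', 'noindex']
```

Python's `RobotFileParser` treats a `User-agent` line as matching when its value appears, case-insensitively, inside the part of the crawler's agent string before the first `/`. That is why the hypothetical `MyCrawler-Nutch/1.0` is banned by the same `User-agent: Nutch` rule, while an unrelated agent is not.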