Posted to commits@labs.apache.org by th...@apache.org on 2008/09/04 13:25:49 UTC

svn commit: r691971 - /labs/droids/trunk/src/documentation/content/xdocs/index.xml

Author: thorsten
Date: Thu Sep  4 04:25:48 2008
New Revision: 691971

URL: http://svn.apache.org/viewvc?rev=691971&view=rev
Log:
white noise - formating changes

Modified:
    labs/droids/trunk/src/documentation/content/xdocs/index.xml

Modified: labs/droids/trunk/src/documentation/content/xdocs/index.xml
URL: http://svn.apache.org/viewvc/labs/droids/trunk/src/documentation/content/xdocs/index.xml?rev=691971&r1=691970&r2=691971&view=diff
==============================================================================
--- labs/droids/trunk/src/documentation/content/xdocs/index.xml (original)
+++ labs/droids/trunk/src/documentation/content/xdocs/index.xml Thu Sep  4 04:25:48 2008
@@ -1,159 +1,145 @@
 <?xml version="1.0" encoding="UTF-8"?>
-<!--
-  Licensed to the Apache Software Foundation (ASF) under one or more
-  contributor license agreements.  See the NOTICE file distributed with
-  this work for additional information regarding copyright ownership.
-  The ASF licenses this file to You under the Apache License, Version 2.0
-  (the "License"); you may not use this file except in compliance with
-  the License.  You may obtain a copy of the License at
-
-      http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License.
--->
+  <!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements. See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version
+    2.0 (the "License"); you may not use this file except in compliance
+    with the License. You may obtain a copy of the License at
+    http://www.apache.org/licenses/LICENSE-2.0 Unless required by
+    applicable law or agreed to in writing, software distributed under
+    the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
+    OR CONDITIONS OF ANY KIND, either express or implied. See the
+    License for the specific language governing permissions and
+    limitations under the License.
+  -->
 <!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN"
 "http://forrest.apache.org/dtd/document-v20.dtd">
 <document>
   <header>
     <title>Welcome to Apache Droids</title>
   </header>
-
   <body>
-    <section
-     id="what">
+    <section id="what">
       <title>What is this?</title>
-
-      <p>Droids aims to be an intelligent standalone robot framework that
-      allows to create and extend existing droids (robots). In the future 
-      it will offer an administration application to manage and controll 
-      the different droids.</p>
-
-      <p>Droids makes it very easy to extend existing robots or write a new one
-      from scratch, which can automatically seek out relevant online information 
-       based on the user's specifications.</p>
+      <p>Droids aims to be an intelligent standalone robot
+        framework that allows to create and extend existing droids
+        (robots). In the future it will offer an administration
+        application to manage and controll the different droids.</p>
+      <p>Droids makes it very easy to extend existing robots or
+        write a new one from scratch, which can automatically seek out
+        relevant online information based on the user's specifications.</p>
     </section>
-
     <section>
       <title>Why was it created?</title>
-
-      <p>Mainly because of personal curiosity: The background of this work is
-      that Cocoon trunk does not provide a crawler anymore and Forrest is based
-      on it, meaning we cannot update anymore till we found a crawler
-      replacement. Getting more involved in Solr and Nutch I see request for a
-      generic standalone crawler.</p>
-      
-      <p>For the first core I took nutch, ripped out and modified the plugin/extension
-        framework. However the second version were not based on it anymore but was using
-        Spring instead. The main reason is that Spring has become a standard and helps to make
+      <p>Mainly because of personal curiosity: The background of
+        this work is that Cocoon trunk does not provide a crawler
+        anymore and Forrest is based on it, meaning we cannot update
+        anymore till we found a crawler replacement. Getting more
+        involved in Solr and Nutch I see request for a generic
+        standalone crawler.</p>
+      <p>For the first core I took nutch, ripped out and modified the
+        plugin/extension framework. However the second version were not
+        based on it anymore but was using Spring instead. The main
+        reason is that Spring has become a standard and helps to make
         Droids as extensible as possible.</p>
     </section>
-    
     <section>
-  <title>Feature list</title>
-  <ul>
-    <li>
-      <strong>Customizable.</strong>
-      Completely controlled by its default.properties which can be
-      easily be overridden by creating a file build.properties and
-      overriding the default properties that are needed.
-    </li>
-    <li>
-      <strong>Spring based.</strong>
-      The properties mentioned above get picked up by the build process
-      which inject them in the spring configuration.
-    </li>
-    <li>
-      <strong>Extensible.</strong>
-      The spring configuration makes usage of the cocoon-configurator and
-      its dynamic registry support (making extending droids a pleasure).
-    </li>
-    <li>
-      <strong>Multi-threaded.</strong>
-      The architecture is that a robot (e.g. DefaultDroid) controls
-      various worker (threads) that are doing the actual work.
-    </li>
-    <li>
-      <strong>Honor robots.txt.</strong>
-      By default droids honors the robot.txt. However you can turn on the
-hostile mode of a droid (droids.protocol.http.force=true).
-    </li>
-    <li><strong>Crawl throttling.</strong>
-    You can configure the amount of concurrent threads that a droid can
-distribute to their workers (droids.maxThreads=5) and the delay time
-between the requests (droids.delay.request=500). 
-You can use one of the different delay components:<ul>
-<li>SimpleDelayTimer</li>
-<li>RandomDelayTimer</li>
-<li>GaussianRandomDelayTime</li>
-</ul>
-</li>
-  </ul>
-</section>
-
-<section>
-<title>Architecture</title>
-<p>The following graph shows the basic architecture of droids with the help of the first implementation (defaultCrawler).</p>
-<figure src="images/droidsOverview.png" alt="Overview" width="400" />
-</section>
-
+      <title>Feature list</title>
+      <ul>
+        <li>
+          <strong>Customizable.</strong>
+          Completely controlled by its default.properties which can be
+          easily be overridden by creating a file build.properties and
+          overriding the default properties that are needed.
+        </li>
+        <li>
+          <strong>Spring based.</strong>
+          The properties mentioned above get picked up by the build
+          process which inject them in the spring configuration.
+        </li>
+        <li>
+          <strong>Extensible.</strong>
+          The spring configuration makes usage of the
+          cocoon-configurator and its dynamic registry support (making
+          extending droids a pleasure).
+        </li>
+        <li>
+          <strong>Multi-threaded.</strong>
+          The architecture is that a robot (e.g. DefaultDroid) controls
+          various worker (threads) that are doing the actual work.
+        </li>
+        <li>
+          <strong>Honor robots.txt.</strong>
+          By default droids honors the robot.txt. However you can turn
+          on the hostile mode of a droid
+          (droids.protocol.http.force=true).
+        </li>
+        <li>
+          <strong>Crawl throttling.</strong>
+          You can configure the amount of concurrent threads that a
+          droid can distribute to their workers (droids.maxThreads=5)
+          and the delay time between the requests
+          (droids.delay.request=500). You can use one of the different
+          delay components:
+          <ul>
+            <li>SimpleDelayTimer</li>
+            <li>RandomDelayTimer</li>
+            <li>GaussianRandomDelayTime</li>
+          </ul>
+        </li>
+      </ul>
+    </section>
+    <section>
+      <title>Architecture</title>
+      <p>The following graph shows the basic architecture of droids
+        with the help of the first implementation (defaultCrawler).</p>
+      <figure src="images/droidsOverview.png" alt="Overview"
+        width="400" />
+    </section>
     <section>
       <title>Requirements</title>
-      <warning label="Ant Optional Tasks">Important is that you have as well the optional
-        Ant tasks installed! Otherwise you will not be able to build!</warning>
+      <warning label="Ant Optional Tasks">Important is that you have as well
+        the optional Ant tasks installed! Otherwise you will not be able
+        to build!</warning>
       <ul>
         <li>Apache Ant version 1.7.0 or higher</li>
         <li>JDK 1.5 or higher</li>
       </ul>
     </section>
-
-    <warning
-     label="HEADSUP">!!! Please ONLY crawl localhost NEVER a internet site when
-    you test the first time!!! You will need to adjust the urlfilters to limit
-    loops.</warning>
-
+    <warning label="HEADSUP">!!! Please ONLY crawl localhost NEVER a
+      internet site when you test the first time!!! You will need to
+      adjust the urlfilters to limit loops.</warning>
     <section>
       <title>Links / related projects</title>
-
       <ul>
         <li>
-          <a
-           href="http://lucene.apache.org/nutch/">Nutch web-search software</a>
+          <a href="http://lucene.apache.org/nutch/">Nutch web-search software</a>
         </li>
-
         <li>
-          <a
-           href="http://www.robotstxt.org/wc/robots.html">The Web Robots
-          Pages</a>
+          <a href="http://www.robotstxt.org/wc/robots.html">The Web Robots Pages</a>
         </li>
-
         <li>
           <a
-           href="http://www.andreas-hess.info/programming/webcrawler/index.html">
-          Programming webcrawler</a>
+            href="http://www.andreas-hess.info/programming/webcrawler/index.html"> Programming webcrawler</a>
         </li>
-
         <li>
           <a
-           href="http://www.andreas-hess.info/programming/webcrawler/index.html">
-          Writing a Web Crawler in the Java Programming Language</a>
+            href="http://www.andreas-hess.info/programming/webcrawler/index.html"> Writing a Web Crawler in the Java Programming Language</a>
         </li>
-
         <li>
           <a
-           href="http://svn.apache.org/repos/asf/httpcomponents/norobots-rfc/trunk/src/java/org/apache/http/norobots/">
-          Norbert</a>
+            href="http://svn.apache.org/repos/asf/httpcomponents/norobots-rfc/trunk/src/java/org/apache/http/norobots/"> Norbert</a>
         </li>
         <li>
-          <a href="http://www.ajaxprojects.com/ajax/newsdetails.php?itemid=178">Crawling AJAX</a>
+          <a
+            href="http://www.ajaxprojects.com/ajax/newsdetails.php?itemid=178">Crawling AJAX</a>
         </li>
-        <li><a href="http://simile.mit.edu/wiki/Crowbar">Crowbar is a web scraping environment based on the use of a server-side headless mozilla-based browser.</a>
+        <li>
+          <a href="http://simile.mit.edu/wiki/Crowbar">Crowbar is a web scraping environment based
+            on the use of a server-side headless mozilla-based browser.</a>
         </li>
       </ul>
     </section>
   </body>
-</document>
-
+</document>
\ No newline at end of file
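
The "Customizable" item in the feature list says default.properties can be overridden by creating a build.properties file. In Droids this layering is done by the build process. The sketch below only illustrates the same precedence rule in plain Java, assuming both files use the standard java.util.Properties format. The class name LayeredProperties and its load method are hypothetical and not part of Droids.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper, not Droids code: mirrors "default.properties overridden by build.properties".
public class LayeredProperties {

    /**
     * Loads the defaults first, then the optional overrides.
     * Keys missing from the overrides fall back to the defaults through
     * the Properties defaults chain consulted by getProperty().
     */
    static Properties load(Reader defaults, Reader overrides) throws IOException {
        Properties base = new Properties();
        base.load(defaults);
        Properties effective = new Properties(base); // base acts as the fallback table
        if (overrides != null) {
            effective.load(overrides);
        }
        return effective;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for default.properties and build.properties.
        String defaultProps = "droids.maxThreads=5\ndroids.delay.request=500\n";
        String buildProps = "droids.maxThreads=2\n";
        Properties p = load(new StringReader(defaultProps), new StringReader(buildProps));
        System.out.println(p.getProperty("droids.maxThreads"));    // prints 2 (overridden)
        System.out.println(p.getProperty("droids.delay.request")); // prints 500 (default kept)
    }
}
```

Only keys that should change need to appear in the override file. Every other key keeps its default, which matches the "overriding the default properties that are needed" wording of the feature list.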

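The "Crawl throttling" item names three delay components (SimpleDelayTimer, RandomDelayTimer, GaussianRandomDelayTime) but does not describe how they behave. The sketch below assumes they correspond to three pause strategies between requests:

- a fixed pause;
- a uniformly random pause;
- a normally distributed pause.

The interface DelayStrategy, the classes FixedDelay, UniformRandomDelay and GaussianRandomDelay, and all constructor parameters are hypothetical. They are not the Droids API.

```java
import java.util.Random;

// Hypothetical sketch of crawl-throttling strategies; not the Droids API.
public class DelayTimers {

    /** Hypothetical strategy: how long a worker waits before its next request. */
    interface DelayStrategy {
        long nextDelayMillis();
    }

    /** Constant pause, comparable to droids.delay.request=500. */
    static final class FixedDelay implements DelayStrategy {
        private final long delayMillis;

        FixedDelay(long delayMillis) {
            if (delayMillis < 0) throw new IllegalArgumentException("delay must be >= 0");
            this.delayMillis = delayMillis;
        }

        public long nextDelayMillis() { return delayMillis; }
    }

    /** Uniformly distributed pause in [minMillis, minMillis + rangeMillis). */
    static final class UniformRandomDelay implements DelayStrategy {
        private final long minMillis;
        private final int rangeMillis;
        private final Random random;

        UniformRandomDelay(long minMillis, int rangeMillis, Random random) {
            if (minMillis < 0 || rangeMillis <= 0) {
                throw new IllegalArgumentException("need min >= 0 and range > 0");
            }
            this.minMillis = minMillis;
            this.rangeMillis = rangeMillis;
            this.random = random;
        }

        public long nextDelayMillis() { return minMillis + random.nextInt(rangeMillis); }
    }

    /** Normally distributed pause around meanMillis, clamped so it is never negative. */
    static final class GaussianRandomDelay implements DelayStrategy {
        private final double meanMillis;
        private final double stdDevMillis;
        private final Random random;

        GaussianRandomDelay(double meanMillis, double stdDevMillis, Random random) {
            if (stdDevMillis < 0) throw new IllegalArgumentException("stdDev must be >= 0");
            this.meanMillis = meanMillis;
            this.stdDevMillis = stdDevMillis;
            this.random = random;
        }

        public long nextDelayMillis() {
            double sample = meanMillis + stdDevMillis * random.nextGaussian();
            return Math.max(0L, Math.round(sample));
        }
    }

    /** Worker loop sketch: "fetch" each URL, then pause for the strategy's delay. */
    static void crawl(Iterable<String> urls, DelayStrategy delay) throws InterruptedException {
        for (String url : urls) {
            System.out.println("fetching " + url); // placeholder for a real HTTP fetch
            Thread.sleep(delay.nextDelayMillis());
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        DelayStrategy[] strategies = {
            new FixedDelay(500),
            new UniformRandomDelay(250, 500, rnd),
            new GaussianRandomDelay(500, 100, rnd)
        };
        for (DelayStrategy s : strategies) {
            System.out.println(s.getClass().getSimpleName() + ": " + s.nextDelayMillis() + " ms");
        }
    }
}
```

Randomized pauses make the request pattern less regular than a fixed delay while keeping a similar average load. Each strategy is used per worker, so the total request rate still depends on the number of concurrent threads (droids.maxThreads).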