Posted to commits@labs.apache.org by th...@apache.org on 2008/09/05 00:34:45 UTC

svn commit: r692285 - in /labs/droids/trunk/docs: changes.dispatcher.css changes.pdf default.pdf develope.dispatcher.css develope.pdf images/droidsOverview.png index.html index.pdf install.pdf linkmap.pdf todo.dispatcher.css todo.pdf

Author: thorsten
Date: Thu Sep  4 15:34:44 2008
New Revision: 692285

URL: http://svn.apache.org/viewvc?rev=692285&view=rev
Log:
adding more documentation

Added:
    labs/droids/trunk/docs/images/droidsOverview.png   (with props)
Modified:
    labs/droids/trunk/docs/changes.dispatcher.css
    labs/droids/trunk/docs/changes.pdf
    labs/droids/trunk/docs/default.pdf
    labs/droids/trunk/docs/develope.dispatcher.css
    labs/droids/trunk/docs/develope.pdf
    labs/droids/trunk/docs/index.html
    labs/droids/trunk/docs/index.pdf
    labs/droids/trunk/docs/install.pdf
    labs/droids/trunk/docs/linkmap.pdf
    labs/droids/trunk/docs/todo.dispatcher.css
    labs/droids/trunk/docs/todo.pdf

Modified: labs/droids/trunk/docs/changes.dispatcher.css
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/changes.dispatcher.css?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
--- labs/droids/trunk/docs/changes.dispatcher.css (original)
+++ labs/droids/trunk/docs/changes.dispatcher.css Thu Sep  4 15:34:44 2008
@@ -3,6 +3,7 @@
   
   
   
+
   
 /* branding-theme-profiler-theme: Pelt */ 
 #header .round-top-left-small {

Modified: labs/droids/trunk/docs/changes.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/changes.pdf?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
Binary files - no diff available.

Modified: labs/droids/trunk/docs/default.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/default.pdf?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
Binary files - no diff available.

Modified: labs/droids/trunk/docs/develope.dispatcher.css
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/develope.dispatcher.css?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
--- labs/droids/trunk/docs/develope.dispatcher.css (original)
+++ labs/droids/trunk/docs/develope.dispatcher.css Thu Sep  4 15:34:44 2008
@@ -3,6 +3,7 @@
   
   
   
+
   
 /* branding-theme-profiler-theme: Pelt */ 
 #header .round-top-left-small {

Modified: labs/droids/trunk/docs/develope.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/develope.pdf?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
Binary files - no diff available.

Added: labs/droids/trunk/docs/images/droidsOverview.png
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/images/droidsOverview.png?rev=692285&view=auto
==============================================================================
Binary file - no diff available.

Propchange: labs/droids/trunk/docs/images/droidsOverview.png
------------------------------------------------------------------------------
    svn:mime-type = image/png

Modified: labs/droids/trunk/docs/index.html
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/index.html?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
--- labs/droids/trunk/docs/index.html (original)
+++ labs/droids/trunk/docs/index.html Thu Sep  4 15:34:44 2008
@@ -178,6 +178,12 @@
 <a href="#Why+was+it+created%3F">Why was it created?</a>
 </li>
 <li>
+<a href="#Feature+list">Feature list</a>
+</li>
+<li>
+<a href="#Architecture">Architecture</a>
+</li>
+<li>
 <a href="#Requirements">Requirements</a>
 </li>
 <li>
@@ -190,34 +196,93 @@
 <a name="what" title="What is this?"> </a>
 <h2 class="underlined_10">What is this?</h2>
 <div class="section">
-<p>Droids aims to be an intelligent standalone robot framework that
-      allows to create and extend existing droids (robots). In the future 
-      it will offer an administration application to manage and controll 
-      the different droids.</p>
-<p>Droids makes it very easy to extend existing robots or write a new one
-      from scratch, which can automatically seek out relevant online information 
-       based on the user's specifications.</p>
+<p>Droids aims to be an intelligent standalone robot
+        framework that allows to create and extend existing droids
+        (robots). In the future it will offer an administration
+        application to manage and controll the different droids.</p>
+<p>Droids makes it very easy to extend existing robots or
+        write a new one from scratch, which can automatically seek out
+        relevant online information based on the user's specifications.</p>
 </div>
 <a name="Why+was+it+created%3F" title="Why was it created?"> </a>
 <h2 class="underlined_10">Why was it created?</h2>
 <div class="section">
-<p>Mainly because of personal curiosity: The background of this work is
-      that Cocoon trunk does not provide a crawler anymore and Forrest is based
-      on it, meaning we cannot update anymore till we found a crawler
-      replacement. Getting more involved in Solr and Nutch I see request for a
-      generic standalone crawler.</p>
-<p>For the first core I took nutch, ripped out and modified the plugin/extension
-        framework. However the second version were not based on it anymore but was using
-        Spring instead. The main reason is that Spring has become a standard and helps to make
+<p>Mainly because of personal curiosity: The background of
+        this work is that Cocoon trunk does not provide a crawler
+        anymore and Forrest is based on it, meaning we cannot update
+        anymore till we found a crawler replacement. Getting more
+        involved in Solr and Nutch I see request for a generic
+        standalone crawler.</p>
+<p>For the first core I took nutch, ripped out and modified the
+        plugin/extension framework. However the second version were not
+        based on it anymore but was using Spring instead. The main
+        reason is that Spring has become a standard and helps to make
         Droids as extensible as possible.</p>
 </div>
+<a name="Feature+list" title="Feature list"> </a>
+<h2 class="underlined_10">Feature list</h2>
+<div class="section">
+<ul>
+        <li>
+          <strong>Customizable.</strong>
+          Completely controlled by its default.properties, which can
+          easily be overridden by creating a build.properties file and
+          overriding only the properties that need to change.
+        </li>
+        <li>
+          <strong>Spring based.</strong>
+          The properties mentioned above get picked up by the build
+          process, which injects them into the Spring configuration.
+        </li>
+        <li>
+          <strong>Extensible.</strong>
+          The Spring configuration makes use of the
+          cocoon-configurator and its dynamic registry support (making
+          extending droids a pleasure).
+        </li>
+        <li>
+          <strong>Multi-threaded.</strong>
+          A robot (e.g. DefaultDroid) controls various workers
+          (threads) that do the actual work.
+        </li>
+        <li>
+          <strong>Honor robots.txt.</strong>
+          By default droids honors robots.txt. However, you can turn
+          on the hostile mode of a droid
+          (droids.protocol.http.force=true).
+        </li>
+        <li>
+          <strong>Crawl throttling.</strong>
+          You can configure the number of concurrent threads that a
+          droid can distribute to its workers (droids.maxThreads=5)
+          and the delay between requests
+          (droids.delay.request=500). You can use one of the following
+          delay components:
+          <ul>
+            <li>SimpleDelayTimer</li>
+            <li>RandomDelayTimer</li>
+            <li>GaussianRandomDelayTime</li>
+          </ul>
+        </li>
+      </ul>
+</div>
+<a name="Architecture" title="Architecture"> </a>
+<h2 class="underlined_10">Architecture</h2>
+<div class="section">
+<p>The following diagram shows the basic architecture of droids,
+        using the first implementation (defaultCrawler) as an example.</p>
+<div style="text-align: center;">
+<img alt="Overview" class="figure" src="images/droidsOverview.png" width="400" />
+</div>
+</div>
 <a name="Requirements" title="Requirements"> </a>
 <h2 class="underlined_10">Requirements</h2>
 <div class="section">
 <div class="warning">
 <div class="label">Ant Optional Tasks</div>
-<div class="content">Important is that you have as well the optional
-        Ant tasks installed! Otherwise you will not be able to build!</div>
+<div class="content">Important is that you have as well
+        the optional Ant tasks installed! Otherwise you will not be able
+        to build!</div>
 </div>
 <ul>
         <li>Apache Ant version 1.7.0 or higher</li>
@@ -226,9 +291,9 @@
 </div>
 <div class="warning">
 <div class="label">HEADSUP</div>
-<div class="content">!!! Please ONLY crawl localhost NEVER a internet site when
-    you test the first time!!! You will need to adjust the urlfilters to limit
-    loops.</div>
+<div class="content">!!! Please ONLY crawl localhost NEVER a
+      internet site when you test the first time!!! You will need to
+      adjust the urlfilters to limit loops.</div>
 </div>
 <a name="Links+%2F+related+projects" title="Links / related projects"> </a>
 <h2 class="underlined_10">Links / related projects</h2>
@@ -237,31 +302,24 @@
         <li>
           <a href="http://lucene.apache.org/nutch/">Nutch web-search software</a>
         </li>
-
         <li>
-          <a href="http://www.robotstxt.org/wc/robots.html">The Web Robots
-          Pages</a>
+          <a href="http://www.robotstxt.org/wc/robots.html">The Web Robots Pages</a>
         </li>
-
         <li>
-          <a href="http://www.andreas-hess.info/programming/webcrawler/index.html">
-          Programming webcrawler</a>
+          <a href="http://www.andreas-hess.info/programming/webcrawler/index.html"> Programming webcrawler</a>
         </li>
-
         <li>
-          <a href="http://www.andreas-hess.info/programming/webcrawler/index.html">
-          Writing a Web Crawler in the Java Programming Language</a>
+          <a href="http://www.andreas-hess.info/programming/webcrawler/index.html"> Writing a Web Crawler in the Java Programming Language</a>
         </li>
-
         <li>
-          <a href="http://svn.apache.org/repos/asf/httpcomponents/norobots-rfc/trunk/src/java/org/apache/http/norobots/">
-          Norbert</a>
+          <a href="http://svn.apache.org/repos/asf/httpcomponents/norobots-rfc/trunk/src/java/org/apache/http/norobots/"> Norbert</a>
         </li>
         <li>
           <a href="http://www.ajaxprojects.com/ajax/newsdetails.php?itemid=178">Crawling AJAX</a>
         </li>
         <li>
-<a href="http://simile.mit.edu/wiki/Crowbar">Crowbar is a web scraping environment based on the use of a server-side headless mozilla-based browser.</a>
+          <a href="http://simile.mit.edu/wiki/Crowbar">Crowbar is a web scraping environment based
+            on the use of a server-side headless mozilla-based browser.</a>
         </li>
       </ul>
 </div>

Modified: labs/droids/trunk/docs/index.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/index.pdf?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
Binary files - no diff available.

Modified: labs/droids/trunk/docs/install.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/install.pdf?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
Binary files - no diff available.

Modified: labs/droids/trunk/docs/linkmap.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/linkmap.pdf?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
Binary files - no diff available.

Modified: labs/droids/trunk/docs/todo.dispatcher.css
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/todo.dispatcher.css?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
--- labs/droids/trunk/docs/todo.dispatcher.css (original)
+++ labs/droids/trunk/docs/todo.dispatcher.css Thu Sep  4 15:34:44 2008
@@ -3,6 +3,7 @@
   
   
   
+
   
 /* branding-theme-profiler-theme: Pelt */ 
 #header .round-top-left-small {

Modified: labs/droids/trunk/docs/todo.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/todo.pdf?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
Binary files - no diff available.
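
The "Crawl throttling" entry added to index.html in this commit names three delay components and the property droids.delay.request=500. As a rough illustration of how a fixed, a uniformly random, and a Gaussian delay might differ, here is a minimal Java sketch. It is not the Droids implementation: the interface and class names (Delay, FixedDelay, UniformRandomDelay, GaussianDelay) are hypothetical, the unit of the property is assumed to be milliseconds, and the standard deviation of a quarter of the base delay is an arbitrary choice.

```java
import java.util.Properties;
import java.util.Random;

public class DelayTimerSketch {

    /** Hypothetical interface for illustration; not the Droids API. */
    public interface Delay {
        long nextDelayMillis();
    }

    /** Always returns the configured delay. */
    public static final class FixedDelay implements Delay {
        private final long millis;

        public FixedDelay(long millis) {
            this.millis = millis;
        }

        public long nextDelayMillis() {
            return millis;
        }
    }

    /** Returns a uniformly distributed delay in [0, maxMillis]. */
    public static final class UniformRandomDelay implements Delay {
        private final long maxMillis;
        private final Random random;

        public UniformRandomDelay(long maxMillis, Random random) {
            this.maxMillis = maxMillis;
            this.random = random;
        }

        public long nextDelayMillis() {
            // nextDouble() is in [0, 1), so the result is in [0, maxMillis].
            return (long) (random.nextDouble() * (maxMillis + 1));
        }
    }

    /** Returns a normally distributed delay around meanMillis, never negative. */
    public static final class GaussianDelay implements Delay {
        private final long meanMillis;
        private final double stdDevMillis;
        private final Random random;

        public GaussianDelay(long meanMillis, double stdDevMillis, Random random) {
            this.meanMillis = meanMillis;
            this.stdDevMillis = stdDevMillis;
            this.random = random;
        }

        public long nextDelayMillis() {
            double sample = meanMillis + random.nextGaussian() * stdDevMillis;
            return Math.max(0L, Math.round(sample));
        }
    }

    /** Reads the base delay; the unit (milliseconds) is an assumption. */
    public static long configuredDelay(Properties props) {
        return Long.parseLong(props.getProperty("droids.delay.request", "500"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("droids.delay.request", "500");
        long base = configuredDelay(props);

        Random random = new Random();
        Delay[] delays = {
            new FixedDelay(base),
            new UniformRandomDelay(base, random),
            new GaussianDelay(base, base / 4.0, random)
        };
        for (Delay delay : delays) {
            System.out.println(delay.getClass().getSimpleName()
                    + " -> " + delay.nextDelayMillis() + " ms");
        }
    }
}
```

A worker would presumably sleep for the returned time between requests. Randomised delays spread the load on the crawled host less predictably than a fixed one.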


