Posted to commits@labs.apache.org by th...@apache.org on 2008/09/05 00:34:45 UTC
svn commit: r692285 - in /labs/droids/trunk/docs: changes.dispatcher.css
changes.pdf default.pdf develope.dispatcher.css develope.pdf
images/droidsOverview.png index.html index.pdf install.pdf linkmap.pdf
todo.dispatcher.css todo.pdf
Author: thorsten
Date: Thu Sep 4 15:34:44 2008
New Revision: 692285
URL: http://svn.apache.org/viewvc?rev=692285&view=rev
Log:
adding more documentation
Added:
labs/droids/trunk/docs/images/droidsOverview.png (with props)
Modified:
labs/droids/trunk/docs/changes.dispatcher.css
labs/droids/trunk/docs/changes.pdf
labs/droids/trunk/docs/default.pdf
labs/droids/trunk/docs/develope.dispatcher.css
labs/droids/trunk/docs/develope.pdf
labs/droids/trunk/docs/index.html
labs/droids/trunk/docs/index.pdf
labs/droids/trunk/docs/install.pdf
labs/droids/trunk/docs/linkmap.pdf
labs/droids/trunk/docs/todo.dispatcher.css
labs/droids/trunk/docs/todo.pdf
Modified: labs/droids/trunk/docs/changes.dispatcher.css
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/changes.dispatcher.css?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
--- labs/droids/trunk/docs/changes.dispatcher.css (original)
+++ labs/droids/trunk/docs/changes.dispatcher.css Thu Sep 4 15:34:44 2008
@@ -3,6 +3,7 @@
+
/* branding-theme-profiler-theme: Pelt */
#header .round-top-left-small {
Modified: labs/droids/trunk/docs/changes.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/changes.pdf?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
Binary files - no diff available.
Modified: labs/droids/trunk/docs/default.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/default.pdf?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
Binary files - no diff available.
Modified: labs/droids/trunk/docs/develope.dispatcher.css
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/develope.dispatcher.css?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
--- labs/droids/trunk/docs/develope.dispatcher.css (original)
+++ labs/droids/trunk/docs/develope.dispatcher.css Thu Sep 4 15:34:44 2008
@@ -3,6 +3,7 @@
+
/* branding-theme-profiler-theme: Pelt */
#header .round-top-left-small {
Modified: labs/droids/trunk/docs/develope.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/develope.pdf?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
Binary files - no diff available.
Added: labs/droids/trunk/docs/images/droidsOverview.png
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/images/droidsOverview.png?rev=692285&view=auto
==============================================================================
Binary file - no diff available.
Propchange: labs/droids/trunk/docs/images/droidsOverview.png
------------------------------------------------------------------------------
svn:mime-type = image/png
Modified: labs/droids/trunk/docs/index.html
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/index.html?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
--- labs/droids/trunk/docs/index.html (original)
+++ labs/droids/trunk/docs/index.html Thu Sep 4 15:34:44 2008
@@ -178,6 +178,12 @@
<a href="#Why+was+it+created%3F">Why was it created?</a>
</li>
<li>
+<a href="#Feature+list">Feature list</a>
+</li>
+<li>
+<a href="#Architecture">Architecture</a>
+</li>
+<li>
<a href="#Requirements">Requirements</a>
</li>
<li>
@@ -190,34 +196,93 @@
<a name="what" title="What is this?"> </a>
<h2 class="underlined_10">What is this?</h2>
<div class="section">
-<p>Droids aims to be an intelligent standalone robot framework that
- allows to create and extend existing droids (robots). In the future
- it will offer an administration application to manage and controll
- the different droids.</p>
-<p>Droids makes it very easy to extend existing robots or write a new one
- from scratch, which can automatically seek out relevant online information
- based on the user's specifications.</p>
+<p>Droids aims to be an intelligent standalone robot
+ framework that lets you create new droids (robots) and extend
+ existing ones. In the future it will offer an administration
+ application to manage and control the different droids.</p>
+<p>Droids makes it very easy to extend existing robots or
+ write a new one from scratch, which can automatically seek out
+ relevant online information based on the user's specifications.</p>
</div>
<a name="Why+was+it+created%3F" title="Why was it created?"> </a>
<h2 class="underlined_10">Why was it created?</h2>
<div class="section">
-<p>Mainly because of personal curiosity: The background of this work is
- that Cocoon trunk does not provide a crawler anymore and Forrest is based
- on it, meaning we cannot update anymore till we found a crawler
- replacement. Getting more involved in Solr and Nutch I see request for a
- generic standalone crawler.</p>
-<p>For the first core I took nutch, ripped out and modified the plugin/extension
- framework. However the second version were not based on it anymore but was using
- Spring instead. The main reason is that Spring has become a standard and helps to make
+<p>Mainly because of personal curiosity: The background of
+ this work is that Cocoon trunk does not provide a crawler
+ anymore and Forrest is based on it, meaning we cannot update
+ until we find a crawler replacement. Having become more
+ involved in Solr and Nutch, I also see requests for a generic
+ standalone crawler.</p>
+<p>For the first core I took Nutch and ripped out and modified its
+ plugin/extension framework. The second version, however, is no
+ longer based on it and uses Spring instead. The main
+ reason is that Spring has become a standard and helps to make
Droids as extensible as possible.</p>
</div>
+<a name="Feature+list" title="Feature list"> </a>
+<h2 class="underlined_10">Feature list</h2>
+<div class="section">
+<ul>
+ <li>
+ <strong>Customizable.</strong>
+ Completely controlled by its default.properties, which can
+ easily be overridden by creating a build.properties file that
+ redefines only the properties you need to change.
+ </li>
+ <li>
+ <strong>Spring based.</strong>
+ The properties mentioned above get picked up by the build
+ process, which injects them into the Spring configuration.
+ </li>
+ <li>
+ <strong>Extensible.</strong>
+ The Spring configuration makes use of the
+ cocoon-configurator and its dynamic registry support (making
+ extending droids a pleasure).
+ </li>
+ <li>
+ <strong>Multi-threaded.</strong>
+ A robot (e.g. DefaultDroid) controls various workers
+ (threads) that do the actual work.
+ </li>
+ <li>
+ <strong>Honor robots.txt.</strong>
+ By default Droids honors robots.txt. However, you can turn
+ on the hostile mode of a droid
+ (droids.protocol.http.force=true).
+ </li>
+ <li>
+ <strong>Crawl throttling.</strong>
+ You can configure the number of concurrent threads that a
+ droid can distribute to its workers (droids.maxThreads=5)
+ and the delay time between the requests
+ (droids.delay.request=500). You can use one of the different
+ delay components:
+ <ul>
+ <li>SimpleDelayTimer</li>
+ <li>RandomDelayTimer</li>
+ <li>GaussianRandomDelayTimer</li>
+ </ul>
+ </li>
+ </ul>
+</div>
+<a name="Architecture" title="Architecture"> </a>
+<h2 class="underlined_10">Architecture</h2>
+<div class="section">
+<p>The following diagram shows the basic architecture of Droids,
+ using the first implementation (defaultCrawler) as an example.</p>
+<div style="text-align: center;">
+<img alt="Overview" class="figure" src="images/droidsOverview.png" width="400" />
+</div>
+</div>
<a name="Requirements" title="Requirements"> </a>
<h2 class="underlined_10">Requirements</h2>
<div class="section">
<div class="warning">
<div class="label">Ant Optional Tasks</div>
-<div class="content">Important is that you have as well the optional
- Ant tasks installed! Otherwise you will not be able to build!</div>
+<div class="content">It is important that you also have
+ the optional Ant tasks installed! Otherwise you will not be able
+ to build!</div>
</div>
<ul>
<li>Apache Ant version 1.7.0 or higher</li>
@@ -226,9 +291,9 @@
</div>
<div class="warning">
<div class="label">HEADSUP</div>
-<div class="content">!!! Please ONLY crawl localhost NEVER a internet site when
- you test the first time!!! You will need to adjust the urlfilters to limit
- loops.</div>
+<div class="content">!!! Please ONLY crawl localhost, NEVER an
+ internet site, when you test for the first time!!! You will need to
+ adjust the urlfilters to limit loops.</div>
</div>
<a name="Links+%2F+related+projects" title="Links / related projects"> </a>
<h2 class="underlined_10">Links / related projects</h2>
@@ -237,31 +302,24 @@
<li>
<a href="http://lucene.apache.org/nutch/">Nutch web-search software</a>
</li>
-
<li>
- <a href="http://www.robotstxt.org/wc/robots.html">The Web Robots
- Pages</a>
+ <a href="http://www.robotstxt.org/wc/robots.html">The Web Robots Pages</a>
</li>
-
<li>
- <a href="http://www.andreas-hess.info/programming/webcrawler/index.html">
- Programming webcrawler</a>
+ <a href="http://www.andreas-hess.info/programming/webcrawler/index.html"> Programming webcrawler</a>
</li>
-
<li>
- <a href="http://www.andreas-hess.info/programming/webcrawler/index.html">
- Writing a Web Crawler in the Java Programming Language</a>
+ <a href="http://www.andreas-hess.info/programming/webcrawler/index.html"> Writing a Web Crawler in the Java Programming Language</a>
</li>
-
<li>
- <a href="http://svn.apache.org/repos/asf/httpcomponents/norobots-rfc/trunk/src/java/org/apache/http/norobots/">
- Norbert</a>
+ <a href="http://svn.apache.org/repos/asf/httpcomponents/norobots-rfc/trunk/src/java/org/apache/http/norobots/"> Norbert</a>
</li>
<li>
<a href="http://www.ajaxprojects.com/ajax/newsdetails.php?itemid=178">Crawling AJAX</a>
</li>
<li>
-<a href="http://simile.mit.edu/wiki/Crowbar">Crowbar is a web scraping environment based on the use of a server-side headless mozilla-based browser.</a>
+ <a href="http://simile.mit.edu/wiki/Crowbar">Crowbar is a web scraping environment based
+ on the use of a server-side headless mozilla-based browser.</a>
</li>
</ul>
</div>
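For illustration, the property override and crawl-throttling settings described in the feature list added to index.html above could be modeled roughly as follows. This is a minimal Java sketch under stated assumptions, not the Droids API: the class DroidsConfigSketch, the DelayTimer interface, its nextDelayMillis method and the three factory methods are hypothetical, and the timer semantics (fixed, uniform random, Gaussian) and the default value false for droids.protocol.http.force are guesses. Only the property names and the values 5 and 500 come from the documentation.

```java
import java.util.Properties;
import java.util.Random;

/** Hypothetical sketch of the documented settings; not the Droids API. */
class DroidsConfigSketch {

    /** Layers build.properties-style overrides on top of the defaults. */
    static Properties effective(Properties defaults, Properties overrides) {
        // Keys missing from the merged set fall back to the defaults.
        Properties merged = new Properties(defaults);
        merged.putAll(overrides);
        return merged;
    }

    /** Assumed abstraction for the delay between two requests. */
    interface DelayTimer {
        long nextDelayMillis();
    }

    /** Assumption: always waits the configured delay. */
    static DelayTimer simple(long delayMillis) {
        return () -> delayMillis;
    }

    /** Assumption: waits a uniformly random time in [0, maxMillis). */
    static DelayTimer random(long maxMillis, Random rng) {
        return () -> (long) (rng.nextDouble() * maxMillis);
    }

    /** Assumption: waits a normally distributed time, clamped at zero. */
    static DelayTimer gaussian(long meanMillis, long stdDevMillis, Random rng) {
        return () -> Math.max(0L,
                Math.round(meanMillis + rng.nextGaussian() * stdDevMillis));
    }

    public static void main(String[] args) {
        // Values as given in the feature list (default.properties).
        Properties defaults = new Properties();
        defaults.setProperty("droids.maxThreads", "5");
        defaults.setProperty("droids.delay.request", "500");
        // Assumed default: robots.txt is honored unless force is true.
        defaults.setProperty("droids.protocol.http.force", "false");

        // A build.properties that overrides only what is needed.
        Properties overrides = new Properties();
        overrides.setProperty("droids.delay.request", "1000");

        Properties p = effective(defaults, overrides);
        int maxThreads = Integer.parseInt(p.getProperty("droids.maxThreads"));
        long delay = Long.parseLong(p.getProperty("droids.delay.request"));
        System.out.println("maxThreads=" + maxThreads + ", delay=" + delay);

        Random rng = new Random();
        DelayTimer[] timers = {
            simple(delay), random(delay, rng), gaussian(delay, delay / 4, rng)
        };
        for (DelayTimer timer : timers) {
            System.out.println("next delay (ms): " + timer.nextDelayMillis());
        }
    }
}
```

The sketch resolves the properties in memory, whereas the documentation says the build process injects them into the Spring configuration. The fallback idea is the same in both cases: a key that is absent from build.properties keeps its value from default.properties.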
Modified: labs/droids/trunk/docs/index.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/index.pdf?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
Binary files - no diff available.
Modified: labs/droids/trunk/docs/install.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/install.pdf?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
Binary files - no diff available.
Modified: labs/droids/trunk/docs/linkmap.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/linkmap.pdf?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
Binary files - no diff available.
Modified: labs/droids/trunk/docs/todo.dispatcher.css
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/todo.dispatcher.css?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
--- labs/droids/trunk/docs/todo.dispatcher.css (original)
+++ labs/droids/trunk/docs/todo.dispatcher.css Thu Sep 4 15:34:44 2008
@@ -3,6 +3,7 @@
+
/* branding-theme-profiler-theme: Pelt */
#header .round-top-left-small {
Modified: labs/droids/trunk/docs/todo.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/todo.pdf?rev=692285&r1=692284&r2=692285&view=diff
==============================================================================
Binary files - no diff available.