Posted to commits@labs.apache.org by th...@apache.org on 2008/09/04 13:25:49 UTC
svn commit: r691971 - /labs/droids/trunk/src/documentation/content/xdocs/index.xml
Author: thorsten
Date: Thu Sep 4 04:25:48 2008
New Revision: 691971
URL: http://svn.apache.org/viewvc?rev=691971&view=rev
Log:
white noise - formatting changes
Modified:
labs/droids/trunk/src/documentation/content/xdocs/index.xml
Modified: labs/droids/trunk/src/documentation/content/xdocs/index.xml
URL: http://svn.apache.org/viewvc/labs/droids/trunk/src/documentation/content/xdocs/index.xml?rev=691971&r1=691970&r2=691971&view=diff
==============================================================================
--- labs/droids/trunk/src/documentation/content/xdocs/index.xml (original)
+++ labs/droids/trunk/src/documentation/content/xdocs/index.xml Thu Sep 4 04:25:48 2008
@@ -1,159 +1,145 @@
<?xml version="1.0" encoding="UTF-8"?>
-<!--
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
--->
+ <!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version
+ 2.0 (the "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+ http://www.apache.org/licenses/LICENSE-2.0 Unless required by
+ applicable law or agreed to in writing, software distributed under
+ the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
+ OR CONDITIONS OF ANY KIND, either express or implied. See the
+ License for the specific language governing permissions and
+ limitations under the License.
+ -->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN"
"http://forrest.apache.org/dtd/document-v20.dtd">
<document>
<header>
<title>Welcome to Apache Droids</title>
</header>
-
<body>
- <section
- id="what">
+ <section id="what">
<title>What is this?</title>
-
- <p>Droids aims to be an intelligent standalone robot framework that
- allows you to create new droids (robots) and extend existing ones. In
- the future it will offer an administration application to manage and
- control the different droids.</p>
-
- <p>Droids makes it very easy to extend existing robots or write a new one
- from scratch, which can automatically seek out relevant online information
- based on the user's specifications.</p>
+ <p>Droids aims to be an intelligent standalone robot
+ framework that allows you to create new droids (robots) and
+ extend existing ones. In the future it will offer an
+ administration application to manage and control the different droids.</p>
+ <p>Droids makes it very easy to extend existing robots or
+ write a new one from scratch, which can automatically seek out
+ relevant online information based on the user's specifications.</p>
</section>
-
<section>
<title>Why was it created?</title>
-
- <p>Mainly because of personal curiosity: the background of this work is
- that Cocoon trunk no longer provides a crawler, and Forrest is based
- on it, meaning we cannot update until we find a crawler
- replacement. Getting more involved in Solr and Nutch, I see requests for a
- generic standalone crawler.</p>
-
- <p>For the first core I took Nutch, ripped out its plugin/extension framework
- and modified it. However, the second version was no longer based on it but used
- Spring instead. The main reason is that Spring has become a standard and helps to make
+ <p>Mainly because of personal curiosity: the background of
+ this work is that Cocoon trunk no longer provides a crawler,
+ and Forrest is based on it, meaning we cannot update until
+ we find a crawler replacement. Getting more involved in Solr
+ and Nutch, I see requests for a generic standalone
+ crawler.</p>
+ <p>For the first core I took Nutch, ripped out its
+ plugin/extension framework and modified it. However, the second
+ version was no longer based on it but used Spring instead. The main
+ reason is that Spring has become a standard and helps to make
Droids as extensible as possible.</p>
</section>
-
<section>
- <title>Feature list</title>
- <ul>
- <li>
- <strong>Customizable.</strong>
- Completely controlled by its default.properties, whose values can
- easily be overridden by creating a build.properties file that
- redefines only the properties you need to change.
- </li>
- <li>
- <strong>Spring based.</strong>
- The properties mentioned above get picked up by the build process,
- which injects them into the Spring configuration.
- </li>
- <li>
- <strong>Extensible.</strong>
- The Spring configuration makes use of the cocoon-configurator and
- its dynamic registry support (making extending droids a pleasure).
- </li>
- <li>
- <strong>Multi-threaded.</strong>
- A robot (e.g. DefaultDroid) controls various workers (threads)
- that do the actual work.
- </li>
- <li>
- <strong>Honor robots.txt.</strong>
- By default Droids honors robots.txt. However, you can turn on the
-hostile mode of a droid (droids.protocol.http.force=true).
- </li>
- <li><strong>Crawl throttling.</strong>
- You can configure the number of concurrent threads that a droid can
-distribute to its workers (droids.maxThreads=5) and the delay time
-between requests (droids.delay.request=500).
-You can use one of the following delay components:<ul>
-<li>SimpleDelayTimer</li>
-<li>RandomDelayTimer</li>
-<li>GaussianRandomDelayTime</li>
-</ul>
-</li>
- </ul>
-</section>
-
-<section>
-<title>Architecture</title>
-<p>The following diagram shows the basic architecture of Droids, using the first implementation (defaultCrawler) as an example.</p>
-<figure src="images/droidsOverview.png" alt="Overview" width="400" />
-</section>
-
+ <title>Feature list</title>
+ <ul>
+ <li>
+ <strong>Customizable.</strong>
+ Completely controlled by its default.properties, whose values
+ can easily be overridden by creating a build.properties file
+ that redefines only the properties you need to change.
+ </li>
+ <li>
+ <strong>Spring based.</strong>
+ The properties mentioned above get picked up by the build
+ process, which injects them into the Spring configuration.
+ </li>
+ <li>
+ <strong>Extensible.</strong>
+ The Spring configuration makes use of the
+ cocoon-configurator and its dynamic registry support (making
+ extending droids a pleasure).
+ </li>
+ <li>
+ <strong>Multi-threaded.</strong>
+ A robot (e.g. DefaultDroid) controls various workers
+ (threads) that do the actual work.
+ </li>
+ <li>
+ <strong>Honor robots.txt.</strong>
+ By default Droids honors robots.txt. However, you can turn
+ on the hostile mode of a droid
+ (droids.protocol.http.force=true).
+ </li>
+ <li>
+ <strong>Crawl throttling.</strong>
+ You can configure the number of concurrent threads that a
+ droid can distribute to its workers (droids.maxThreads=5)
+ and the delay time between requests
+ (droids.delay.request=500). You can use one of the following
+ delay components:
+ <ul>
+ <li>SimpleDelayTimer</li>
+ <li>RandomDelayTimer</li>
+ <li>GaussianRandomDelayTime</li>
+ </ul>
+ </li>
+ </ul>
+ </section>
+ <section>
+ <title>Architecture</title>
+ <p>The following diagram shows the basic architecture of
+ Droids, using the first implementation (defaultCrawler) as an example.</p>
+ <figure src="images/droidsOverview.png" alt="Overview"
+ width="400" />
+ </section>
<section>
<title>Requirements</title>
- <warning label="Ant Optional Tasks">It is important that you also have the optional
- Ant tasks installed; otherwise you will not be able to build!</warning>
+ <warning label="Ant Optional Tasks">It is important that you also have
+ the optional Ant tasks installed; otherwise you will not be able
+ to build!</warning>
<ul>
<li>Apache Ant version 1.7.0 or higher</li>
<li>JDK 1.5 or higher</li>
</ul>
</section>
-
- <warning
- label="HEADSUP">!!! Please ONLY crawl localhost, NEVER an internet site, when
- you test for the first time!!! You will need to adjust the urlfilters to limit
- loops.</warning>
-
+ <warning label="HEADSUP">!!! Please ONLY crawl localhost, NEVER an
+ internet site, when you test for the first time!!! You will need to
+ adjust the urlfilters to limit loops.</warning>
<section>
<title>Links / related projects</title>
-
<ul>
<li>
- <a
- href="http://lucene.apache.org/nutch/">Nutch web-search software</a>
+ <a href="http://lucene.apache.org/nutch/">Nutch web-search software</a>
</li>
-
<li>
- <a
- href="http://www.robotstxt.org/wc/robots.html">The Web Robots
- Pages</a>
+ <a href="http://www.robotstxt.org/wc/robots.html">The Web Robots Pages</a>
</li>
-
<li>
<a
- href="http://www.andreas-hess.info/programming/webcrawler/index.html">
- Programming webcrawler</a>
+ href="http://www.andreas-hess.info/programming/webcrawler/index.html"> Programming webcrawler</a>
</li>
-
<li>
<a
- href="http://www.andreas-hess.info/programming/webcrawler/index.html">
- Writing a Web Crawler in the Java Programming Language</a>
+ href="http://www.andreas-hess.info/programming/webcrawler/index.html"> Writing a Web Crawler in the Java Programming Language</a>
</li>
-
<li>
<a
- href="http://svn.apache.org/repos/asf/httpcomponents/norobots-rfc/trunk/src/java/org/apache/http/norobots/">
- Norbert</a>
+ href="http://svn.apache.org/repos/asf/httpcomponents/norobots-rfc/trunk/src/java/org/apache/http/norobots/"> Norbert</a>
</li>
<li>
- <a href="http://www.ajaxprojects.com/ajax/newsdetails.php?itemid=178">Crawling AJAX</a>
+ <a
+ href="http://www.ajaxprojects.com/ajax/newsdetails.php?itemid=178">Crawling AJAX</a>
</li>
- <li><a href="http://simile.mit.edu/wiki/Crowbar">Crowbar is a web scraping environment based on the use of a server-side headless Mozilla-based browser.</a>
+ <li>
+ <a href="http://simile.mit.edu/wiki/Crowbar">Crowbar is a web scraping environment based
+ on the use of a server-side headless Mozilla-based browser.</a>
</li>
</ul>
</section>
</body>
-</document>
-
+</document>
\ No newline at end of file
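The "Customizable" and "Crawl throttling" items in the document above describe default.properties values (such as droids.maxThreads and droids.delay.request) being overridden by a build.properties file. Assuming plain java.util.Properties semantics, that layering can be sketched in Java as follows. The class LayeredProperties and its load method are hypothetical, the sample values are only illustrative (the actual default.properties contents are not shown in this commit), and the real Droids build injects the values into its Spring configuration, which this sketch does not attempt.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper illustrating the "Customizable" feature: keys found in
// build.properties replace those in default.properties; every other key
// falls back to its default. This is a sketch, not the Droids build code.
class LayeredProperties {

    static Properties load(String defaultProps, String buildProps) {
        try {
            Properties defaults = new Properties();
            defaults.load(new StringReader(defaultProps));
            // getProperty() on 'merged' consults 'defaults' for missing keys.
            Properties merged = new Properties(defaults);
            merged.load(new StringReader(buildProps));
            return merged;
        } catch (IOException e) {
            // StringReader does no real I/O, but load() declares IOException.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative defaults; the real default.properties may differ.
        String defaultProps = "droids.maxThreads=5\n"
                + "droids.delay.request=500\n"
                + "droids.protocol.http.force=false\n";
        // build.properties only needs the keys it wants to change.
        String buildProps = "droids.maxThreads=2\n";

        Properties p = load(defaultProps, buildProps);
        int maxThreads = Integer.parseInt(p.getProperty("droids.maxThreads"));
        long delay = Long.parseLong(p.getProperty("droids.delay.request"));
        boolean force = Boolean.parseBoolean(p.getProperty("droids.protocol.http.force"));
        System.out.println("maxThreads=" + maxThreads + " delay=" + delay + " force=" + force);
    }
}
```

Running main prints maxThreads=2 delay=500 force=false: only the overridden key changes. Note that the fallback works through getProperty(); the inherited Map get() method does not consult the defaults.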
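The "Crawl throttling" item lists three delay components (SimpleDelayTimer, RandomDelayTimer, GaussianRandomDelayTime) without describing their behavior. A minimal sketch of what fixed, uniform-random and Gaussian-random delays between requests could look like, assuming each worker asks a strategy object how long to wait before its next request. The interface RequestDelay and the classes FixedDelay, UniformRandomDelay, GaussianDelay and DelayDemo are hypothetical and are not the Droids API.

```java
import java.util.Random;

// Hypothetical strategy interface for the "Crawl throttling" feature. The
// names below are illustrative only; they are not the Droids classes
// SimpleDelayTimer, RandomDelayTimer or GaussianRandomDelayTime.
interface RequestDelay {
    /** Milliseconds a worker should wait before its next request. */
    long nextDelayMillis();
}

// Constant delay, e.g. the value of droids.delay.request.
class FixedDelay implements RequestDelay {
    private final long millis;

    FixedDelay(long millis) { this.millis = millis; }

    public long nextDelayMillis() { return millis; }
}

// Uniformly random delay in [min, min + range).
class UniformRandomDelay implements RequestDelay {
    private final long min;
    private final int range;
    private final Random random;

    UniformRandomDelay(long min, int range, Random random) {
        if (range <= 0) throw new IllegalArgumentException("range must be > 0");
        this.min = min;
        this.range = range;
        this.random = random;
    }

    public long nextDelayMillis() { return min + random.nextInt(range); }
}

// Normally distributed delay around a mean, clamped so it is never negative.
class GaussianDelay implements RequestDelay {
    private final double mean;
    private final double stdDev;
    private final Random random;

    GaussianDelay(double mean, double stdDev, Random random) {
        this.mean = mean;
        this.stdDev = stdDev;
        this.random = random;
    }

    public long nextDelayMillis() {
        long d = Math.round(mean + stdDev * random.nextGaussian());
        return Math.max(0L, d);
    }
}

class DelayDemo {
    public static void main(String[] args) throws InterruptedException {
        // Seeded so runs are repeatable; a crawler would normally not seed.
        RequestDelay delay = new GaussianDelay(500, 100, new Random(42));
        for (int i = 0; i < 3; i++) {
            long wait = delay.nextDelayMillis();
            System.out.println("request " + i + " done, waiting " + wait + " ms");
            Thread.sleep(wait);
        }
    }
}
```

Randomizing the delay makes request timing less regular than a fixed interval. The clamp in GaussianDelay matters because a normal sample can fall below zero, and Thread.sleep rejects negative values.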