Posted to commits@labs.apache.org by th...@apache.org on 2008/08/24 01:44:36 UTC

svn commit: r688432 [12/12] - in /labs/droids/trunk/docs: ./ api/ api/org/apache/droids/ api/org/apache/droids/api/ api/org/apache/droids/api/class-use/ api/org/apache/droids/class-use/ api/org/apache/droids/delay/ api/org/apache/droids/delay/class-use...

Modified: labs/droids/trunk/docs/changes.html
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/changes.html?rev=688432&r1=688431&r2=688432&view=diff
==============================================================================
--- labs/droids/trunk/docs/changes.html (original)
+++ labs/droids/trunk/docs/changes.html Sat Aug 23 16:44:31 2008
@@ -205,7 +205,7 @@
 <a name="introduction" title="Introduction and explanation of symbols"> </a>
 <h2 class="underlined_10">Introduction and explanation of symbols</h2>
 <div class="section">
-<p>Changes are sorted by "type" and then chronologically with the most recent at the top. These symbols denote the various action types:<img alt="add" class="icon" src="images/add.jpg" />=add, <img alt="update" class="icon" src="images/update.jpg" />=update</p>
+<p>Changes are sorted by "type" and then chronologically with the most recent at the top. These symbols denote the various action types:<img alt="add" class="icon" src="images/add.jpg" />=add, <img alt="fix" class="icon" src="images/fix.jpg" />=fix, <img alt="update" class="icon" src="images/update.jpg" />=update</p>
 </div>
 <a name="version_0.0.1" title="Version 0.0.1 (unreleased)"> </a>
 <h2 class="underlined_10">Version 0.0.1 (unreleased)</h2>
@@ -215,6 +215,11 @@
 <div class="section">
 <ul>
 <li>
+<img alt="fix" class="icon" src="images/fix.jpg" />
+        <a href="http://cocoon.apache.org/subprojects/configuration/1.0/spring-configurator/1.0/1400_1_1.html">
+          Dynamic Registry</a> Support is crucial in a multi plugin environment since a
+        plugin can add new components to the registry. Committed by thorsten. See Issue <a href="http://issues.apache.org/jira/browse/LABS-117">LABS-117</a>.</li>
+<li>
 <img alt="update" class="icon" src="images/update.jpg" /> Cleaning up
         old nutch based code with a clean new spring implementation.
        Committed by thorsten.</li>

Modified: labs/droids/trunk/docs/changes.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/changes.pdf?rev=688432&r1=688431&r2=688432&view=diff
==============================================================================
Binary files - no diff available.

Modified: labs/droids/trunk/docs/changes.rss
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/changes.rss?rev=688432&r1=688431&r2=688432&view=diff
==============================================================================
--- labs/droids/trunk/docs/changes.rss (original)
+++ labs/droids/trunk/docs/changes.rss Sat Aug 23 16:44:31 2008
@@ -19,6 +19,18 @@
   
 
     
+      <item><title>MyProject code fix
+          (bug LABS-117)
+        </title><link>http://myproj.mygroup.org//changes.html</link><description>code fix
+        by
+        thorsten
+          (fixes bug LABS-117)
+        
+        :
+        
+        
+          Dynamic Registry Support is crucial in a multi plugin environment since a
+        plugin can add new components to the registry.</description></item>
       <item><title>MyProject code update</title><link>http://myproj.mygroup.org//changes.html</link><description>code update
         by
         thorsten
@@ -43,7 +55,18 @@
     
   
   
-<item><title>MyProject code update</title><link>http://myproj.mygroup.org//changes.html</link><description>code update
+<item><title>MyProject code fix
+          (bug LABS-117)
+        </title><link>http://myproj.mygroup.org//changes.html</link><description>code fix
+        by
+        thorsten
+          (fixes bug LABS-117)
+        
+        :
+        
+        
+          Dynamic Registry Support is crucial in a multi plugin environment since a
+        plugin can add new components to the registry.</description></item><item><title>MyProject code update</title><link>http://myproj.mygroup.org//changes.html</link><description>code update
         by
         thorsten
         :

Modified: labs/droids/trunk/docs/default.html
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/default.html?rev=688432&r1=688431&r2=688432&view=diff
==============================================================================
--- labs/droids/trunk/docs/default.html (original)
+++ labs/droids/trunk/docs/default.html Sat Aug 23 16:44:31 2008
@@ -203,6 +203,25 @@
 <p>Remember the build.properties from the build process? This file controls the
         behaviour of the default crawler. These properties can be overridden via custom calls
         of the different methods in your custom java code.</p>
+<p>The DefaultCrawler offers you a wget-style robot. We will now describe the basic
+        functionality. First it opens the initial web page (with the
+        <span class="codefrag">protocol plugin</span> corresponding to the URI) and tries to extract the
+        outlinks (with the
+        <span class="codefrag">parse plugin</span> for the corresponding content type). Then it tests
+        the links found on the page against the regular expression defined in the regex file (via
+        the
+        <span class="codefrag">filter plugin</span>). New links that are accepted by the filter are
+        then merged into the queue. The last step is to pass the input stream to the stack of
+        <span class="codefrag">handler plugins</span>.
+</p>
+<p>As you can see, the default droid crawler is built from several different types of
+        plugins. At the moment we support the following types: </p>
+<ul>
+        <li>protocol plugins</li>
+        <li>parse plugins</li>
+        <li>filter plugins</li>
+        <li>handler plugins</li>
+      </ul>
 <a name="Crawling" title="Crawling"> </a>
 <h3 class="underlined_5">Crawling</h3>
 <div class="section">
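The crawl cycle described in the default.html text above (protocol, parse, filter, queue, handlers) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Droids implementation: the class CrawlSketch, the Handler interface and the in-memory page map that stands in for a protocol plugin are all hypothetical names invented for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the crawl loop; none of these names are the Droids API. */
public class CrawlSketch {

    /** Stand-in for a handler plugin: receives the content of every fetched page. */
    public interface Handler {
        void handle(String uri, String content);
    }

    // Very naive link extraction, good enough for the illustration only.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    /** Stand-in for a parse plugin: extracts outlinks from HTML-like content. */
    public static List<String> parseOutlinks(String content) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(content);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /**
     * Crawls from start. The pages map plays the role of the protocol plugin
     * (URI to content) and accept plays the role of the regex filter plugin.
     * Returns the URIs in the order they were fetched.
     */
    public static List<String> crawl(String start, Map<String, String> pages,
                                     Pattern accept, List<Handler> handlers) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new LinkedHashSet<>();
        List<String> fetched = new ArrayList<>();
        queue.add(start);
        seen.add(start);
        while (!queue.isEmpty()) {
            String uri = queue.poll();
            String content = pages.get(uri);      // "protocol plugin": open the URI
            if (content == null) {
                continue;                         // nothing available for this URI
            }
            fetched.add(uri);
            for (String link : parseOutlinks(content)) {   // "parse plugin"
                // "filter plugin": keep only accepted links that were not seen before
                if (accept.matcher(link).matches() && seen.add(link)) {
                    queue.add(link);              // merge new links into the queue
                }
            }
            for (Handler h : handlers) {          // "handler plugins"
                h.handle(uri, content);
            }
        }
        return fetched;
    }

    public static void main(String[] args) {
        Map<String, String> pages = Map.of(
            "http://example.org/",
            "<a href=\"http://example.org/a\">a</a> <a href=\"http://other.net/x\">x</a>",
            "http://example.org/a",
            "<a href=\"http://example.org/\">home</a>");
        Pattern accept = Pattern.compile("http://example\\.org/.*");
        Handler print = (uri, content) -> System.out.println("handled " + uri);
        System.out.println(crawl("http://example.org/", pages, accept, List.of(print)));
    }
}
```

The seen set is an addition of this sketch so that the loop terminates on pages that link back to each other; the text above does not say how Droids avoids revisiting URIs.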

Modified: labs/droids/trunk/docs/default.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/default.pdf?rev=688432&r1=688431&r2=688432&view=diff
==============================================================================
Binary files - no diff available.

Modified: labs/droids/trunk/docs/develope.html
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/develope.html?rev=688432&r1=688431&r2=688432&view=diff
==============================================================================
--- labs/droids/trunk/docs/develope.html (original)
+++ labs/droids/trunk/docs/develope.html Sat Aug 23 16:44:31 2008
@@ -177,7 +177,7 @@
 <a href="#how">How can I use Apache Droids in my application?</a>
 </li>
 <li>
-<a href="#extend">Extend the default configuration.</a>
+<a href="#extend">Extend the default spring configuration.</a>
 </li>
 </ul>
 </div>
@@ -194,9 +194,13 @@
         <strong>SIMPLE</strong> robot that will crawl a web page and send the data to an
         Apache Solr server (poor man's approach of Apache Nutch crawling) and will call it
         <span class="codefrag">"indexer"</span>. </p>
+<p>The resulting classes can be found in
+        <span class="codefrag">$DROIDS_HOME/src/example/java</span> and can be compiled with
+        <span class="codefrag">ant droids.compile-example</span>. This target compiles the examples,
+        creates a jar and copies the resulting jar into the lib directory.</p>
 </div>
-<a name="extend" title="Extend the default configuration."> </a>
-<h2 class="underlined_10">Extend the default configuration.</h2>
+<a name="extend" title="Extend the default spring configuration."> </a>
+<h2 class="underlined_10">Extend the default spring configuration.</h2>
 <div class="section">
 <p>
         In your principal spring configuration you need to import the default spring configuration
@@ -223,6 +227,32 @@
     &lt;/map&gt;
   &lt;/property&gt;
 &lt;/bean&gt;</pre>
+<p>Then we need to configure our <span class="codefrag">indexer</span>.</p>
+<pre class="code">&lt;!-- Indexer --&gt;
+&lt;bean id="indexer" class="org.apache.droids.examples.IndexerCrawler"&gt;
+  &lt;property name="core" ref="org.apache.droids.Core"/&gt;
+  &lt;property name="queue" ref="org.apache.droids.queue.Simple"/&gt;
+  &lt;property name="maxThreads" value="@droids.maxThreads@"/&gt;
+  &lt;property name="url" value="@droids.initial.url@"/&gt;
+  &lt;property name="updateUrl" value="http://localhost:8983/solr/update"/&gt;
+&lt;/bean&gt;</pre>
+<p>The variables enclosed in @ signs are replaced while building the jar with ant. The actual
+        value is defined either by the properties in your build.properties or the
+        default.properties.</p>
+<p>Now we need to wrap up our droid by defining our new handler.</p>
+<pre class="code">&lt;bean id="org.apache.droids.helper.factories.HandlerFactory"
+  class="org.apache.droids.helper.factories.HandlerFactory"&gt;
+  &lt;property name="map"&gt;
+    &lt;map&gt;
+      &lt;entry key="solr" value-ref="org.apache.droids.handle.Solr"/&gt;
+    &lt;/map&gt;
+  &lt;/property&gt;
+&lt;/bean&gt;
+
+&lt;!-- Handler --&gt;
+&lt;bean id="org.apache.droids.handle.Solr" class="org.apache.droids.handle.Solr"&gt;
+  &lt;property name="updateUrl" value="http://localhost:8983/solr/update"/&gt;
+&lt;/bean&gt;</pre>
 </div>
 <!--+ |end content-main +-->
 </div>
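The @...@ substitution mentioned in the develope.html text above can be illustrated with a small sketch. It only mimics the idea of build-time token replacement; it is not how ant or the Droids build implements it. The class TokenFilterSketch and its expand method are hypothetical, and the precedence used here (an overriding map, standing in for build.properties, wins over a defaults map, standing in for default.properties) is an assumption that the text does not confirm. Tokens without a known value are left unchanged.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of @token@ replacement; not the ant implementation. */
public class TokenFilterSketch {

    // A token is a property-like name enclosed in @ signs, e.g. @droids.maxThreads@.
    private static final Pattern TOKEN = Pattern.compile("@([A-Za-z0-9_.]+)@");

    /**
     * Replaces every @key@ in text. Values in overrides take precedence over
     * values in defaults (assumed precedence); unknown tokens stay as they are.
     */
    public static String expand(String text, Map<String, String> defaults,
                                Map<String, String> overrides) {
        Matcher m = TOKEN.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            String value = overrides.containsKey(key) ? overrides.get(key) : defaults.get(key);
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in values from being interpreted
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "<property name=\"maxThreads\" value=\"@droids.maxThreads@\"/>";
        // The value "5" is made up for the example.
        System.out.println(expand(line, Map.of("droids.maxThreads", "5"), Map.of()));
    }
}
```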

Modified: labs/droids/trunk/docs/develope.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/develope.pdf?rev=688432&r1=688431&r2=688432&view=diff
==============================================================================
Binary files - no diff available.

Modified: labs/droids/trunk/docs/index.html
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/index.html?rev=688432&r1=688431&r2=688432&view=diff
==============================================================================
--- labs/droids/trunk/docs/index.html (original)
+++ labs/droids/trunk/docs/index.html Sat Aug 23 16:44:31 2008
@@ -257,6 +257,12 @@
           <a href="http://svn.apache.org/repos/asf/httpcomponents/norobots-rfc/trunk/src/java/org/apache/http/norobots/">
           Norbert</a>
         </li>
+        <li>
+          <a href="http://www.ajaxprojects.com/ajax/newsdetails.php?itemid=178">Crawling AJAX</a>
+        </li>
+        <li>
+<a href="http://simile.mit.edu/wiki/Crowbar">Crowbar is a web scraping environment based on the use of a server-side headless mozilla-based browser.</a>
+        </li>
       </ul>
 </div>
 <!--+ |end content-main +-->

Modified: labs/droids/trunk/docs/index.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/index.pdf?rev=688432&r1=688431&r2=688432&view=diff
==============================================================================
Binary files - no diff available.

Modified: labs/droids/trunk/docs/install.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/install.pdf?rev=688432&r1=688431&r2=688432&view=diff
==============================================================================
Binary files - no diff available.

Modified: labs/droids/trunk/docs/linkmap.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/linkmap.pdf?rev=688432&r1=688431&r2=688432&view=diff
==============================================================================
Binary files - no diff available.

Modified: labs/droids/trunk/docs/todo.pdf
URL: http://svn.apache.org/viewvc/labs/droids/trunk/docs/todo.pdf?rev=688432&r1=688431&r2=688432&view=diff
==============================================================================
Binary files - no diff available.


