Posted to commits@uima.apache.org by jo...@apache.org on 2009/09/18 10:27:35 UTC

svn commit: r816529 - /incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.camel.driver.xml

Author: joern
Date: Fri Sep 18 08:27:34 2009
New Revision: 816529

URL: http://svn.apache.org/viewvc?rev=816529&view=rev
Log:
UIMA-1587 The uimaj-as-camel documentation should be extended

Modified:
    incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.camel.driver.xml

Modified: incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.camel.driver.xml
URL: http://svn.apache.org/viewvc/incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.camel.driver.xml?rev=816529&r1=816528&r2=816529&view=diff
==============================================================================
--- incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.camel.driver.xml (original)
+++ incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.camel.driver.xml Fri Sep 18 08:27:34 2009
@@ -24,24 +24,133 @@
 -->
 <chapter id="ugr.async.camel.driver">
   <title>Asynchronous Scaleout Camel Driver</title>
-  <section id="ugr.async.camel.driver.component">
+  <section id="ugr.async.camel.driver.component.overview">
     <title>Overview</title>
     <para>
-    The Asynchronous Scaleout Camel Driver is an Apache Camel component which sends the
-    camel message body to a specified UIMA AS processing pipeline. There are basically two usage scenarios.
-    The Camel Driver can be used to drive the processing of a UIMA AS cluster in which each server instance
-    runs a cas multiplier to fetch the actual document from a database. In this scenario the camel route only
-    sends an ID of the document to the cas multiplier which does the actual fetching of the document.
-    In the second usage scenario the Camel Driver is used to send a document in a one way fashion to a
-    UIMA AS processing pipeline which then takes care of processing it.
+    Apache Camel is an integration framework based on the Enterprise Integration Patterns.
+    It uses routes for rule-based message routing and mediation. The Camel project
+    provides a large number of components which give access to a wide variety of
+    technologies and are the building blocks of routes. The Asynchronous Scaleout Camel Driver
+    is a component that integrates UIMA AS into Camel.
+    </para>
+    </section> 
+    <section id="ugr.async.camel.driver.component">
+    <title>How does it work?</title>
+    <para>
+    The Asynchronous Scaleout Camel Driver sends the Camel message body (without headers) to
+    a specified UIMA AS processing pipeline. The analysis results which
+    are written into the CAS cannot be accessed from a Camel route.
+    There are two main usage scenarios. In the first, the Camel Driver drives the processing
+    of a UIMA AS cluster in which each server instance runs a CAS multiplier to fetch the actual
+    document from a database. In this scenario the Camel route only sends the ID of the document to
+    the CAS multiplier, which does the actual fetching of the document. In the second scenario
+    the Camel Driver sends a document in a one-way fashion to a UIMA AS processing
+    pipeline, which then takes care of processing it. If an error occurs inside the processing
+    pipeline, the exception is forwarded to Camel and set on the message as the response. Error handling
+    is described in the
+    <ulink url="http://camel.apache.org/error-handling-in-camel.html">Error handling in Camel</ulink>
+    documentation.
+    </para>
+    <para>
+    The Camel driver expects a string message body. If the body is not a string,
+    Camel type converters may convert it automatically. The string message
+    body is set as the document text of the CAS. An Analysis Engine which receives such a
+    CAS should call CAS.getDocumentText() to retrieve the string.
     </para>
   </section>
   <section id="ugr.async.camel.driver.uri.format">
     <title>URI Format</title>
-    <para>The Camel Driver has the following URI format
-     <programlisting>uimadriver:authority?queue="nameofqueue" </programlisting>
+    <para>The Asynchronous Scaleout Camel Driver is configured with a configuration string. The
+    configuration string must contain the broker location and the name of the JMS queue used to
+    communicate with UIMA AS. It has the following format
+    <programlisting>uimadriver:brokerURL?queue="nameofqueue" </programlisting>
     which could for example be specified as
-     <programlisting>uimadriver:tcp://localhost:61616?queue=TextAnalysisQueue</programlisting>.
+    <programlisting>uimadriver:tcp://localhost:61616?queue=TextAnalysisQueue</programlisting>.
+    </para>
+  </section>
+  <section id="ugr.async.camel.driver.sample">
+    <title>Sample</title>
+    <para>Camel enables a developer to quickly create all kinds of applications out of
+    existing and custom components. The sample demonstrates how UIMA AS can be integrated
+    with other technologies. Readers who are not familiar with Camel should read
+    the <ulink url="http://camel.apache.org/book-getting-started.html">Getting Started</ulink>
+    chapter in the Camel documentation.</para>
+    <para> 
+    First, a simple sample. A user wants to test a UIMA AS processing pipeline by
+    processing a set of test documents. The plain text test documents
+    are located in a folder "/test-data". A Camel route for this, defined with the
+    <ulink url="http://camel.apache.org/dsl.html">Java DSL</ulink>, could look like this:
+    <programlisting>
+     from("file://test-data?noop=true").
+     to("uimadriver:tcp://localhost:61616?queue=TextAnalysisQueue");
+    </programlisting>
+    In the route above the file component sends a message for every file to the
+    uimadriver component. The message contains a reference to the file but not the
+    content of the file itself. The uimadriver component expects a message with a string body as input.
+    An internal Camel type converter reads the bytes of the file, decodes them
+    into characters with the default platform encoding and then creates a string object
+    which is passed to the uimadriver component. The uimadriver then puts the
+    string into a CAS and sends it via the UIMA AS Client API to a processing pipeline.
+    Results from the returned CAS cannot be retrieved in a Camel route.
     </para>
+    <para>
+    A more complex sample. A web site has an area where people can upload pictures.
+    The pictures must be checked for appropriate content. The pictures are pushed to
+    the site via HTTP, stored in a database and assigned to human controllers who
+    classify them as either appropriate or non-appropriate. That is achieved with an
+    existing Camel route and a servlet which receives the images and sends them to the Camel route.
+    <programlisting>
+    from("direct:start").
+    to("imagewriter").
+    to("jms:queue:HumanPictureAnalysisQueue");
+    </programlisting>
+    The message containing the image is received by the direct:start endpoint.
+    The "imagewriter" component writes the image to a database and replaces it with a string
+    identifier. In the last step the Camel JMS component posts
+    the identifier on a JMS queue to announce the arrival of a new image. The notification
+    is received by a client tool which the human controllers use to classify
+    an image.
+    </para>
+    <para>
+    To relieve the human controllers, a system should automatically classify the pictures and
+    only assign questionable cases to human controllers. The automatic classification is
+    done by a UIMA Analysis Engine. The AE can mark an image with one of three classes:
+    appropriate, non-appropriate and unknown. In the case of unknown the AE is not confident
+    enough which of the first two classes is correct. To be scalable the processing pipeline is hosted
+    by UIMA AS and contains three AEs: one to fetch the image from
+    the database, a classification AE and an AE to write the class of the image back to the database.
+    The first AE is typically a CAS multiplier and receives a CAS which only contains the string identifier
+    but not the actual image. The CAS multiplier uses the identifier to fetch the image from
+    the database and outputs a new CAS with the actual image. The Camel route blocks until the CAS is processed
+    by the following two AEs. Depending on the class in the database, the picture is then assigned to a human
+    controller or not.
+    <programlisting>
+    from("direct:start").
+    to("imagewriter").
+    to("uimadriver:tcp://localhost:61616?queue=UimaPictureAnalysisQueue").
+    to("class-retriever").
+    // passes on only messages with class unknown, dropping appropriate and non-appropriate
+    filter(header("picture-class").isEqualTo("unknown")).
+    to("jms:queue:HumanPictureAnalysisQueue");
+    </programlisting>
+    The first part is identical. After the imagewriter the string identifier is
+    sent to the UIMA AS processing pipeline, which writes the image class back to the database.
+    The class is retrieved with the custom class-retriever component and written to a message header
+    field. Only if the class is unknown is the image assigned for human classification.
+    </para>
+  </section>
+  <section id="ugr.async.camel.driver.implementation">
+  <title>Implementation</title>
+  <para>The Asynchronous Scaleout Camel Driver is a typical Camel component. The Camel
+  documentation <ulink url="">Writing Components</ulink> describes how Camel components are
+  written. The source code can be found in the uimaj-as-camel project. The implementation
+  defines an asynchronous producer endpoint, which is implemented in the
+  <code>org.apache.uima.camel.UimaAsProducer</code> class. The <code>UimaAsProducer.process</code> method gets
+  the string body of the message, wraps it in a CAS object and sends it to UIMA AS.
+  Since the producer is asynchronous, the Camel message is registered with the reference id of
+  the sent CAS in an intermediate map. When the CAS comes back from UIMA AS, the Camel message is looked up
+  with the reference id of the CAS and the processing of the Camel message is completed.
+  For further details please read the <code>UimaAsProducer</code> implementation code.
+  </para>
   </section>
 </chapter>
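
Note on the Implementation section added in this revision: the correlation between a sent CAS and its waiting Camel message could be sketched roughly as below. This is a minimal, JDK-only illustration and not the actual UimaAsProducer code. The class name PendingMessages, its methods, and the use of a plain callback in place of a Camel message are all hypothetical stand-ins chosen for the sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/**
 * Hypothetical stand-in for the "intermediate map" described in the
 * Implementation section. Not taken from the uimaj-as-camel sources.
 */
class PendingMessages {

    // Reference id of the sent CAS -> callback that completes the waiting message.
    private final Map<String, Consumer<String>> pending = new ConcurrentHashMap<>();

    /** Called when a CAS is sent: remember which message is waiting for it. */
    void register(String casReferenceId, Consumer<String> onDone) {
        pending.put(casReferenceId, onDone);
    }

    /**
     * Called when a CAS comes back: look up the waiting message by reference id,
     * complete it, and forget it. Returns false if the id is not registered.
     */
    boolean complete(String casReferenceId, String status) {
        Consumer<String> onDone = pending.remove(casReferenceId);
        if (onDone == null) {
            return false;
        }
        onDone.accept(status);
        return true;
    }

    /** Number of messages still waiting for their CAS to return. */
    int size() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingMessages pending = new PendingMessages();
        StringBuilder log = new StringBuilder();

        // "Send" a CAS with an assumed reference id and register the waiting message.
        pending.register("cas-1", status -> log.append("cas-1:").append(status));

        // Later the CAS returns and the waiting message is completed.
        pending.complete("cas-1", "done");
        System.out.println(log);
    }
}
```

A concurrent map is used in the sketch because sending and receiving would normally happen on different threads; whether the real producer does the same is not stated in the documentation above.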