Posted to commits@uima.apache.org by sc...@apache.org on 2010/05/06 16:06:04 UTC

svn commit: r941744 [6/7] - in /uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides: ./ src/ src/docbook/ src/docbook/images/ src/docbook/images/tutorials_and_users_guides/ src/docbook/images/tutorials_and_users_guides/tug.aae/ src/d...

Added: uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.cpe.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.cpe.xml?rev=941744&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.cpe.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.cpe.xml Thu May  6 14:06:02 2010
@@ -0,0 +1,1333 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+<!ENTITY imgroot "images/tutorials_and_users_guides/tug.cpe/">
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">  
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.tug.cpe">
+  <title>Collection Processing Engine Developer&apos;s Guide</title>
+  <titleabbrev>CPE Developer&apos;s Guide</titleabbrev>
+  
+  <para>The UIMA Analysis Engine interface provides support for developing and integrating
+    algorithms that analyze unstructured data. Analysis Engines are designed to operate on a
+    per-document basis. Their interface handles one CAS at a time. UIMA provides additional
+    support for applying analysis engines to collections of unstructured data with its
+    <emphasis>Collection Processing Architecture</emphasis>. The Collection
+    Processing Architecture defines additional components for reading raw data formats
+    from data collections, preparing the data for processing by Analysis Engines, executing
+    the analysis, extracting analysis results, and deploying the overall flow in a variety of
+    local and distributed configurations.</para>
+  
+  <para>The functionality defined in the Collection Processing Architecture is
+    implemented by a <emphasis>Collection Processing Engine</emphasis> (CPE). A CPE
+    includes an Analysis Engine and adds a <emphasis>Collection Reader</emphasis>, a
+    <emphasis>CAS Initializer</emphasis> (deprecated as of version 2), and <emphasis>CAS
+    Consumers</emphasis>. The part of the UIMA Framework that supports the execution of
+    CPEs is called the Collection Processing Manager, or CPM.</para>
+  
+  <para>A Collection Reader provides the interface to the raw input data and knows how to
+    iterate over the data collection. Collection Readers are discussed in <xref
+      linkend="ugr.tug.cpe.collection_reader.developing"/>. The CAS Initializer
+    <footnote><para>CAS Initializers are deprecated in favor of a more general mechanism,
+    multiple subjects of analysis.</para></footnote> prepares an individual data item for
+    analysis and loads it into the CAS. CAS Initializers are discussed in <xref
+      linkend="ugr.tug.cpe.cas_initializer.developing"/>. A CAS Consumer extracts
+    analysis results from the CAS and may also perform <emphasis>collection level
+    processing</emphasis>, or analysis over a collection of CASes. CAS Consumers are
+    discussed in <xref linkend="ugr.tug.cpe.cas_consumer.developing"/>.</para>
+  
+  <para>Analysis Engines and CAS Consumers are both instances of <emphasis>CAS
+    Processors</emphasis>. A Collection Processing Engine (CPE) may contain multiple CAS
+    Processors. An Analysis Engine contained in a CPE may itself be a Primitive or an Aggregate
+    (composed of other Analysis Engines). Aggregates may contain CAS Consumers. While
+    Collection Readers and CAS Initializers always run in the same JVM as the CPM, a CAS
+    Processor may be deployed in a variety of local and distributed modes, providing a number
+    of options for scalability and robustness. The different deployment options are covered
+    in detail in <xref linkend="ugr.tug.cpe.deployment_alternatives"/>.</para>
+  
+  <para>Each of the components in a CPE has an interface specified by the UIMA Collection
+    Processing Architecture and is described by a declarative XML descriptor file.
+    Similarly, the CPE itself has a well defined component interface and is described by a
+    declarative XML descriptor file.</para>
+  
+  <para>A user creates a CPE by assembling the components mentioned above. The UIMA SDK
+    provides a graphical tool, called the CPE Configurator, for assisting in the assembly of
+    CPEs. Use of this tool is summarized in <xref
+      linkend="ugr.tug.cpe.cpe_configurator"/>, and more details can be found in <olink
+      targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cpe"/>.
+    Alternatively, a CPE can be assembled by writing an XML CPE descriptor. Details on the CPE
+    descriptor, including its syntax and content, can be found in the <olink
+      targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>. The individual
+    components have associated XML descriptors, each of which can be created and / or edited
+    using the <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cde">
+    Component Description Editor</olink>.</para>
+  
+  <para>A CPE is executed by a UIMA infrastructure component called the
+    <emphasis>Collection Processing Manager</emphasis> (CPM). The CPM provides a number
+    of services and deployment options that cover instantiation and execution of CPEs, error
+    recovery, and local and distributed deployment of the CPE components.</para>
+  
+  <section id="ugr.tug.cpe.concepts">
+    <title>CPE Concepts</title>
+    
+    <para> <xref linkend="ugr.tug.cpe.fig.cpe_components"/> illustrates the data flow
+      that occurs between the different types of components that make up a CPE.</para>
+    
+    <figure id="ugr.tug.cpe.fig.cpe_components">
+      <title>CPE Components</title>
+      <mediaobject>
+        <imageobject>
+          <imagedata width="5.7in" format="PNG"
+            fileref="&imgroot;image002.png"/>
+        </imageobject>
+        <textobject><phrase>CPE Components and flow between them</phrase>
+        </textobject>
+      </mediaobject>
+    </figure>
+    
+    <para>The components of a CPE are:</para>
+    
+    <itemizedlist><listitem><para><emphasis>Collection Reader &ndash;</emphasis>
+      interfaces to a collection of data items (e.g., documents) to be analyzed. Collection
+      Readers return CASes that contain the documents to analyze, possibly along with
+      additional metadata.</para></listitem>
+      
+      <listitem><para><emphasis>Analysis Engine &ndash;</emphasis> takes a CAS,
+        analyzes its contents, and produces an enriched CAS. Analysis Engines can be
+        recursively composed of other Analysis Engines (called an
+        <emphasis>Aggregate</emphasis> Analysis Engine). Aggregates may also contain
+        CAS Consumers.</para></listitem>
+      
+      <listitem><para><emphasis>CAS Consumer &ndash;</emphasis> consumes the enriched
+        CAS that was produced by the sequence of Analysis Engines before it, and produces an
+        application-specific data structure, such as a search engine index or database.
+        </para></listitem></itemizedlist>
+    
+    <para>A fourth type of component, the <emphasis>CAS Initializer,</emphasis> may be
+      used by a Collection Reader to populate a CAS from a document. However, as of UIMA
+      version 2, CAS Initializers are deprecated in favor of a more general mechanism,
+      multiple Subjects of Analysis.</para>
+    
+    <para>The Collection Processing Manager orchestrates the data flow
+      within a CPE, monitors status, optionally manages the life-cycle of internal
+      components and collects statistics.</para>
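The data flow described above can be sketched in plain Java. This is a conceptual illustration only: the `MiniCpe` class and its `run` method are hypothetical and are not part of the UIMA API. An `Iterator` stands in for the Collection Reader, a `StringBuilder` stands in for the CAS, and `Consumer` instances stand in for the CAS Processors (Analysis Engines followed by CAS Consumers).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Conceptual sketch only; none of these names belong to the UIMA API.
class MiniCpe {
    // Pulls each item from the reader, passes it through every processor in
    // order (analysis engines first, then consumers), and counts the items.
    static int run(Iterator<String> reader,
                   List<Consumer<StringBuilder>> processors) {
        int processed = 0;
        while (reader.hasNext()) {
            // The StringBuilder is a stand-in for a CAS holding one document.
            StringBuilder cas = new StringBuilder(reader.next());
            for (Consumer<StringBuilder> processor : processors) {
                processor.accept(cas);
            }
            processed++;
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>();
        List<Consumer<StringBuilder>> processors = new ArrayList<>();
        processors.add(cas -> cas.append(" [analyzed]"));  // plays the Analysis Engine
        processors.add(cas -> index.add(cas.toString()));  // plays the CAS Consumer
        int count = run(Arrays.asList("doc1", "doc2").iterator(), processors);
        System.out.println(count + " documents, index = " + index);
    }
}
```

In a real CPE the Collection Processing Manager owns this loop and adds the status monitoring, error handling, and deployment options that the sketch leaves out.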
+    
+    <para>The framework does not save CASes persistently. If you want to save CASes,
+      you must save each CAS as it comes through, for example by using a CAS Consumer that you
+      write for this purpose, in whatever format you like. The UIMA SDK supplies an example CAS
+      Consumer to save CASes to XML files, either in the standard XMI format or in an older
+      format called XCAS.  It also supplies an example CAS Consumer to extract information from CASes and
+      store the results into a relational Database, using Java&apos;s JDBC APIs.</para>
+    
+  </section>
+  
+  <section id="ugr.tug.cpe.configurator_and_viewer">
+    <title>CPE Configurator and CAS viewer</title>
+    
+    <section id="ugr.tug.cpe.cpe_configurator">
+      <title>Using the CPE Configurator</title>
+      
+      <para>A CPE can be assembled by writing an XML CPE descriptor. Details on the CPE
+        descriptor, including its syntax and content, can be found in <olink
+          targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>. Rather than
+        edit raw XML, you may develop a CPE Descriptor using the CPE Configurator tool. The CPE
+        Configurator tool is described briefly in this section, and in more detail in <olink
+          targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cpe"/>.</para>
+      
+      <para>The CPE Configurator tool can be run from Eclipse (see <xref
+          linkend="ugr.tug.cpe.running_cpe_configurator_from_eclipse"/>), or using
+        the <literal>cpeGui</literal> shell script (<literal>cpeGui.bat</literal> on
+        Windows, <literal>cpeGui.sh</literal> on Unix), which is located in the
+        <literal>bin</literal> directory of the UIMA SDK installation. Executing this
+        script will display the window shown here:
+        
+        
+        <screenshot>
+          <mediaobject>
+            <imageobject>
+              <imagedata width="5.7in" format="JPG" fileref="&imgroot;image004.jpg"/>
+            </imageobject>
+            <textobject><phrase>Screenshot of CPE GUI</phrase></textobject>
+          </mediaobject>
+        </screenshot>
+        </para>
+      
+      <para>The window is divided into three sections, one each for the Collection Reader, 
+        Analysis Engines, and CAS Consumers.<footnote><para>There is also a fourth pane,
+        for the CAS Initializer, but it is hidden by default.  To enable it click the
+        <literal>View &rarr; CAS Initializer Panel</literal> menu item.</para></footnote> 
+        In each section, you select the component(s) you want to include in the CPE by 
+        browsing to their XML descriptors. The configuration parameters present in the XML 
+        descriptors will then be displayed in the GUI; these can be modified to override
+        the values present in the descriptor. For example, the screen shot below shows the 
+        CPE Configurator after the following components have been chosen:
+        
+        
+        <programlisting>Collection Reader: 
+   %UIMA_HOME%/examples/descriptors/collection_reader/
+          FileSystemCollectionReader.xml
+
+Analysis Engine: 
+   %UIMA_HOME%/examples/descriptors/analysis_engine/
+          NamesAndPersonTitles_TAE.xml
+
+CAS Consumer: 
+    %UIMA_HOME%/examples/descriptors/cas_consumer/
+          XmiWriterCasConsumer.xml</programlisting></para>
+      
+      
+      <screenshot>
+     <mediaobject>
+      <imageobject>
+        <imagedata width="5.7in" format="JPG" fileref="&imgroot;image006.jpg"/>
+      </imageobject>
+      <textobject><phrase>Screenshot of CPE GUI after fields filled in</phrase></textobject>
+    </mediaobject>
+    </screenshot>
+      
+      <para>For the File System Collection Reader, ensure that the Input Directory is set to
+        <literal>%UIMA_HOME%\examples\data</literal><footnote><para>Replace
+        <literal>%UIMA_HOME%</literal> with the path to where you installed UIMA.</para>
+        </footnote>. The other parameters may be left blank. For the XMI Writer CAS
+        Consumer, ensure that the Output Directory is set to
+        <literal>%UIMA_HOME%\examples\data\processed</literal>.</para>
+      
+      <para>After selecting each of the components and providing configuration settings,
+        click the play (forward arrow) button at the bottom of the screen to begin processing.
+        A progress bar should be displayed in the lower left corner. (Note that the progress
+        bar will not begin to move until all components have completed their initialization,
+        which may take several seconds.) Once processing has begun, the pause and stop
+        buttons become enabled.</para>
+      
+      <para>If an error occurs, you will be informed by an error dialog. If processing
+        completes successfully, you will be presented with a performance report.</para>
+      
+      <para>Using the File menu, you can select <literal>Save CPE Descriptor</literal> to
+        create an .xml descriptor file that defines the CPE you have constructed. Later, you
+        can use <literal>Open CPE Descriptor</literal> to restore the CPE Configurator to
+        the saved state. Also, CPE descriptors can be used to run a CPE from a Java program
+        &ndash; see section <xref
+          linkend="ugr.tug.cpe.running_cpe_from_application"/>. CPE Descriptors
+        allow specifying operational parameters, such as error handling options, that are
+        not currently available for configuration through the CPE Configurator. For more
+        information on manually creating a CPE Descriptor, see the <olink
+          targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>.</para>
+            
+      <para>The CPE configured above runs a simple name and title annotator on the sample data
+        provided with the UIMA SDK and stores the results using the XMI Writer CAS Consumer. To
+        view the results, start the External CAS Annotation Viewer by running the
+        <literal>annotationViewer</literal> script
+        (<literal>annotationViewer.bat</literal> on Windows,
+        <literal>annotationViewer.sh</literal> on Unix), which is located in the
+        <literal>bin</literal> directory of the UIMA SDK installation. Executing this
+        script will display the window shown here:
+        
+        
+        <screenshot>
+    <mediaobject>
+      <imageobject>
+        <imagedata width="5.5in" format="JPG" fileref="&imgroot;image008.jpg"/>
+      </imageobject>
+      <textobject><phrase>Screenshot of Annotation Viewer results</phrase></textobject>
+    </mediaobject>
+  </screenshot>
+        </para>
+      
+      <para>Ensure that the Input Directory is the same as the Output Directory specified for
+        the XMI Writer CAS Consumer in the CPE configured above (e.g.,
+        <literal>%UIMA_HOME%\examples\data\processed</literal>) and that the TAE
+        Descriptor File is set to the Analysis Engine used in the CPE configured above (e.g.,
+        <literal>examples\descriptors\analysis_engine\NamesAndPersonTitles_TAE.xml</literal>).</para>
+      
+      <para>Click the View button to display the Analyzed Documents window:
+        
+        
+        <screenshot>
+    <mediaobject>
+      <imageobject>
+        <imagedata width="3.5in" format="JPG" fileref="&imgroot;image010.jpg"/>
+      </imageobject>
+      <textobject><phrase>Screenshot of CPE Configurator Analyzed Documents</phrase></textobject>
+    </mediaobject>
+  </screenshot>
+        </para>
+      
+      <para>Double click on any document in the list to view the analyzed document. Double
+        clicking the first document, IBM_LifeSciences.txt, will bring up the following
+        window:
+        
+        
+        <screenshot>
+    <mediaobject>
+      <imageobject>
+        <imagedata width="5.7in" format="JPG" fileref="&imgroot;image012.jpg"/>
+      </imageobject>
+      <textobject><phrase>Screenshot of Document and Annotation Viewer</phrase></textobject>
+    </mediaobject>
+  </screenshot>
+        </para>
+      
+      <para>This window shows the analysis results for the document. Clicking on any
+        highlighted annotation causes the details for that annotation to be displayed in the
+        right-hand pane. Here the annotation spanning <quote>John M. Thompson</quote> has
+        been clicked.</para>
+      
+      <para>Congratulations! You have successfully configured a CPE, saved its
+        descriptor, run the CPE, and viewed the analysis results.</para>
+    </section>
+    
+    <section id="ugr.tug.cpe.running_cpe_configurator_from_eclipse">
+      <title>Running the CPE Configurator from Eclipse</title>
+      
+      <para>If you have followed the instructions in <olink
+          targetdoc="&uima_docs_overview;"
+          targetptr="ugr.ovv.eclipse_setup"/> and imported the example Eclipse
+        project, then you should already have a Run configuration for the CPE Configurator
+        tool (called <literal>UIMA CPE GUI</literal>) configured to run in the example
+        project. Simply run that configuration to start the CPE Configurator.</para>
+      
+      <para>If you haven&apos;t followed the Eclipse setup instructions and wish to run the
+        CPE Configurator tool from Eclipse, you will need to do the following. As installed,
+        this Eclipse launch configuration is associated with the
+        <quote>uimaj-examples</quote> project. If you&apos;ve not already done so, you
+        may wish to import that project into your Eclipse workspace. It&apos;s located in
+        %UIMA_HOME%/docs/examples. Doing this will supply the Eclipse launcher with all
+        the class files it needs to run the CPE configurator. If you don&apos;t do this, please
+        manually add the JAR files for UIMA to the launch configuration.</para>
+      <para>Also, you need to add any projects or JAR files for any UIMA components you will be
+        running to the launch class path.</para> <note><para>A simpler alternative may be
+      to change the CPE launch configuration to be based on your project. If you do that, it will
+      pick up all the files in your project&apos;s class path, which you should set up to
+      include all the UIMA framework files. An easy way to do this is to add the
+      uimaj-examples project to the build path in your project&apos;s properties,
+      because the uimaj-examples project is already set up to include all the UIMA
+      framework classes in its classpath.</para></note>
+      
+      <para>Next, in the Eclipse menu select <literal>Run &rarr;
+        Run...</literal>, which brings up the Run configuration screen.</para>
+      
+      <para>In the Main tab, set the main class to
+        <literal>org.apache.uima.tools.cpm.CpmFrame</literal>.</para>
+      
+      <para>In the arguments tab, add the following to the VM arguments:
+        
+        
+        <programlisting>-Xms128M -Xmx256M 
+-Duima.home="C:\Program Files\Apache\uima"</programlisting>
+        (or wherever you installed the UIMA SDK)</para>
+      
+      <para>Click the Run button to launch the CPE Configurator, and use it as previously
+        described in this section.</para>
+      
+    </section>
+  </section>
+  
+  <section id="ugr.tug.cpe.running_cpe_from_application">
+    <title>Running a CPE from Your Own Java Application</title>
+    
+    <para>The simplest way to run a CPE from a Java application is to first create a CPE
+      descriptor as described in the previous section. Then the CPE can be instantiated and
+      run using the following code:
+      
+      
+      <programlisting>// parse the CPE descriptor in the file specified on the command line
+CpeDescription cpeDesc = UIMAFramework.getXMLParser().
+        parseCpeDescription(new XMLInputSource(args[0]));
+
+// instantiate the CPE (mCPE is a field of type CollectionProcessingEngine)
+mCPE = UIMAFramework.produceCollectionProcessingEngine(cpeDesc);
+
+// create and register a Status Callback Listener
+mCPE.addStatusCallbackListener(new StatusCallbackListenerImpl());
+
+// start processing
+mCPE.process();</programlisting></para>
+    
+    <para>This will start the CPE running in a separate thread.</para>
+    
+    <note><para>The <literal>process()</literal> method for a CPE can only be called once.  If you 
+    need to call it again, you have to instantiate a new CPE, and call that new CPE's process
+    method.</para></note>
+    
+    <section id="ugr.tug.cpe.using_listeners">
+      <title>Using Listeners</title>
+      
+      <para>Updates of the CPM&apos;s progress, including any errors that occur, are sent to
+        the callback handler that is registered by the call to
+        <literal>addStatusCallbackListener</literal>, above. The callback handler is a
+        class that implements the CPM&apos;s
+        <literal>StatusCallbackListener</literal> interface. In the example, the
+        handler responds to events by printing messages to the console. Its source code is
+        fairly straightforward and is not included in this chapter &ndash; see
+        <literal>org.apache.uima.examples.cpe.SimpleRunCPE.java</literal> in the
+        <literal>%UIMA_HOME%\examples\src</literal> directory for the complete
+        code.</para>
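Because processing runs in a separate thread, an application often needs to block until the completion callback arrives. The following is a minimal sketch of that pattern using only `java.util.concurrent`. The `DoneListener` interface and the `FakeEngine` class are hypothetical stand-ins for a status callback listener and a CPE; they are not UIMA classes, and the real listener interface defines more callbacks than the single one shown here.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Conceptual sketch only: DoneListener and FakeEngine are hypothetical
// stand-ins for a status callback listener and a CPE, not UIMA classes.
class WaitForCompletion {
    interface DoneListener {
        void collectionProcessComplete();
    }

    static class FakeEngine {
        private DoneListener listener;

        void addListener(DoneListener l) {
            listener = l;
        }

        // Returns immediately; the work runs on a separate thread,
        // which notifies the listener when it finishes.
        void process() {
            new Thread(() -> {
                // ... the collection would be processed here ...
                listener.collectionProcessComplete();
            }).start();
        }
    }

    // Starts the engine and blocks until the completion callback fires or the
    // timeout elapses. Returns true if processing completed in time.
    static boolean runAndWait(FakeEngine engine, long timeoutSeconds) {
        CountDownLatch done = new CountDownLatch(1);
        engine.addListener(done::countDown);
        engine.process();
        try {
            return done.await(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("completed: " + runAndWait(new FakeEngine(), 5));
    }
}
```

The latch keeps the waiting logic out of the listener itself, so the same listener can still log progress or errors as the example handler does.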
+      
+      <para>If you need more control over the information in the CPE descriptor, you can
+        manually configure it via its API. See the Javadocs for package
+        <literal>org.apache.uima.collection</literal> for more details.</para>
+      
+    </section>
+  </section>
+  
+  <section id="ugr.tug.cpe.developing_collection_processing_components">
+    <title>Developing Collection Processing Components</title>
+    
+    <para>This section is an introduction to the process of developing Collection Readers,
+      CAS Initializers, and CAS Consumers. The code snippets refer to the classes that can be
+      found in the <literal>%UIMA_HOME%\examples\src</literal> directory of the example project.</para>
+    
+    <para>In the following sections, classes you write to represent components need to be
+      public and have public, 0-argument constructors, so that they can be instantiated by
+      the framework. (Although Java classes in which you do not define any constructor will,
+      by default, have a 0-argument constructor that doesn&apos;t do anything, a class in
+      which you have defined at least one constructor does not get a default 0-argument
+      constructor.)</para>
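The constructor rule can be checked with plain Java reflection. The sketch below is illustrative only: the nested classes are hypothetical and extend no UIMA base class, and `canInstantiate` merely imitates how a framework commonly creates a component, by invoking a public 0-argument constructor.

```java
// Illustrates the constructor rule with plain Java reflection. The nested
// classes are hypothetical examples and do not extend any UIMA base class.
class ZeroArgDemo {
    // No constructor declared: the compiler supplies a public 0-argument one.
    public static class ImplicitConstructor {
    }

    // Declares only a 1-argument constructor, so no 0-argument one exists.
    public static class OnlyOneArg {
        public OnlyOneArg(String name) {
        }
    }

    // Declares both, which makes reflective instantiation possible again.
    public static class Both {
        public Both() {
        }

        public Both(String name) {
        }
    }

    // Returns true if the class can be created by reflectively invoking
    // a public 0-argument constructor.
    static boolean canInstantiate(Class<?> type) {
        try {
            type.getConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("ImplicitConstructor: " + canInstantiate(ImplicitConstructor.class));
        System.out.println("OnlyOneArg: " + canInstantiate(OnlyOneArg.class));
        System.out.println("Both: " + canInstantiate(Both.class));
    }
}
```

If a component class needs a constructor with arguments for other reasons, declaring an explicit public 0-argument constructor alongside it, as `Both` does, keeps the class instantiable.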
+    
+    <section id="ugr.tug.cpe.collection_reader.developing">
+      <title>Developing Collection Readers</title>
+      
+      <para>A Collection Reader is responsible for obtaining documents from the collection
+        and returning each document as a CAS. Like all UIMA components, a Collection Reader
+        consists of two parts &mdash; the code and an XML descriptor.</para>
+      
+      <para>A simple example of a Collection Reader is the <quote>File System Collection
+        Reader,</quote> which simply reads documents from files in a specified directory.
+        The Java code is in the class
+        <literal>org.apache.uima.examples.cpe.FileSystemCollectionReader</literal>
+        and the XML descriptor is
+        <literal>%UIMA_HOME%/examples/src/main/descriptors/collection_reader/
+          FileSystemCollectionReader.xml</literal>.</para>
+      
+      <section id="ugr.tug.cpe.collection_reader.java_class">
+        <title>Java Class for the Collection Reader</title>
+        
+        <para>The Java class for a Collection Reader must implement the
+          <literal>org.apache.uima.collection.CollectionReader</literal>
+          interface. You may build your Collection Reader from scratch and implement this
+          interface, or you may extend the convenience base class
+          <literal>org.apache.uima.collection.CollectionReader_ImplBase</literal>.</para>
+        
+        <para>The convenience base class provides default implementations for many of the
+          methods defined in the <literal>CollectionReader</literal> interface, and
+          provides abstract definitions for those methods that you are required to
+          implement in your new Collection Reader. Note that if you extend this base class,
+          you do not need to declare that your new Collection Reader implements the
+          <literal>CollectionReader</literal> interface.</para> <tip><para>Eclipse
+        tip &ndash; if you are using Eclipse, you can quickly create the boilerplate code and
+        stubs for all of the required methods by clicking <literal>File</literal>
+        &rarr; <literal>New</literal> &rarr; <literal>Class</literal> to bring up the <quote>New Java Class</quote>
+        dialog, specifying
+        <literal>org.apache.uima.collection.CollectionReader_ImplBase</literal>
+        as the Superclass, and checking <quote>Inherited abstract methods</quote> in the
+        section <quote>Which method stubs would you like to create?</quote>, as in the 
+        screenshot below:</para></tip>     
+        
+        <screenshot>
+    <mediaobject>
+      <imageobject>
+        <imagedata width="4.4in" format="JPG" fileref="&imgroot;image014.jpg"/>
+      </imageobject>
+      <textobject><phrase>Screenshot showing Eclipse new class wizard</phrase></textobject>
+    </mediaobject>
+  </screenshot>
+        
+        <para>For the rest of this section we will assume that your new Collection Reader
+          extends the <literal>CollectionReader_ImplBase</literal> class, and we will
+          show examples from the
+          <literal>org.apache.uima.examples.cpe.FileSystemCollectionReader</literal>.
+          If you must inherit from a different superclass, you must ensure that your
+          Collection Reader implements the <literal>CollectionReader</literal>
+          interface &ndash; see the Javadocs for <literal>CollectionReader</literal>
+          for more details.</para>
+      </section>
+      
+      <section id="ugr.tug.cpe.collection_reader.required_methods">
+        <title>Required Methods in the Collection Reader class</title>
+        
+        
+        <para>The following abstract methods must be implemented:</para>
+        
+        <section id="ugr.tug.cpe.collection_reader.required_methods.initialize">
+          <title>initialize()</title>
+          
+          <para>The <literal>initialize()</literal> method is called by the framework
+            when the Collection Reader is first created.
+            <literal>CollectionReader_ImplBase</literal> actually provides a default
+            implementation of this method (i.e., it is not abstract), so you are not strictly
+            required to implement this method. However, a typical Collection Reader will
+            implement this method to obtain parameter values and perform various
+            initialization steps.</para>
+          
+          <para>In this method, the Collection Reader class can access the values of its
+            configuration parameters and perform other initialization logic. The example
+            File System Collection Reader reads its configuration parameters and then
+            builds a list of files in the specified input directory, as follows:</para>
+          
+          
+          <programlisting>public void initialize() throws ResourceInitializationException {
+  File directory = new File(
+            (String)getConfigParameterValue(PARAM_INPUTDIR));
+  mEncoding = (String)getConfigParameterValue(PARAM_ENCODING);
+  mDocumentTextXmlTagName = (String)getConfigParameterValue(PARAM_XMLTAG);
+  mLanguage = (String)getConfigParameterValue(PARAM_LANGUAGE);
+  mCurrentIndex = 0; 
+  
+  //get list of files (not subdirectories) in the specified directory
+  mFiles = new ArrayList();
+  File[] files = directory.listFiles();
+  for (int i = 0; i &lt; files.length; i++) {
+    if (!files[i].isDirectory()) {
+      mFiles.add(files[i]);  
+    }
+  }
+}</programlisting>
+          <note><para>This is the zero-argument version of the initialize method. There is
+          also a method on the Collection Reader interface called
+          <literal>initialize(ResourceSpecifier, Map)</literal> but it is not
+          recommended that you override this method in your code. That method performs
+          internal initialization steps and then calls the zero-argument
+          <literal>initialize()</literal>. </para></note>
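One detail worth guarding against when building the file list: `File.listFiles()` returns `null` when the path is not a directory or cannot be read, and it guarantees no particular ordering. The following stand-alone sketch shows a defensive variant of the listing step. The `InputDirectoryListing` class and its methods are hypothetical helpers, not part of the example reader; inside a Collection Reader the failure would more naturally be reported as a `ResourceInitializationException`.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, shown outside of any UIMA class for simplicity.
class InputDirectoryListing {
    // Returns the plain files (not subdirectories) directly inside
    // 'directory', sorted by path name.
    static List<File> listPlainFiles(File directory) {
        File[] entries = directory.listFiles();
        if (entries == null) {
            // listFiles() signals "not a directory" or an I/O error with null
            throw new IllegalArgumentException("Cannot list directory: " + directory);
        }
        Arrays.sort(entries); // make the processing order reproducible
        List<File> files = new ArrayList<>();
        for (File entry : entries) {
            if (!entry.isDirectory()) {
                files.add(entry);
            }
        }
        return files;
    }

    // Builds a throwaway directory holding two files and one subdirectory,
    // then returns the names that listPlainFiles reports for it.
    static List<String> demo() {
        try {
            File dir = Files.createTempDirectory("reader-demo").toFile();
            new File(dir, "b.txt").createNewFile();
            new File(dir, "a.txt").createNewFile();
            new File(dir, "sub").mkdir();
            List<String> names = new ArrayList<>();
            for (File file : listPlainFiles(dir)) {
                names.add(file.getName());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```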
+          
+        </section>
+        
+        <section id="ugr.tug.cpe.collection_reader.hasnext">
+          <title>hasNext()</title>
+          
+          <para>The <literal>hasNext()</literal> method returns whether or not there are
+            any documents remaining to be read from the collection. The File System
+            Collection Reader&apos;s <literal>hasNext()</literal> method is very
+            simple. It just checks if there are any more files left to be read:
+            
+            
+            <programlisting>public boolean hasNext() {
+  return mCurrentIndex &lt; mFiles.size();
+}</programlisting>
+            </para>
+          
+        </section>
+        
+        <section id="ugr.tug.cpe.collection_reader.required_methods.getnext">
+          <title>getNext(CAS)</title>
+          
+          <para>The <literal>getNext()</literal> method reads the next document from the
+            collection and populates a CAS. In the simple case, this amounts to reading the
+            file and calling the CAS&apos;s <literal>setDocumentText</literal> method.
+            The example File System Collection Reader is slightly more complex. It first
+            checks for a CAS Initializer. If the CPE includes a CAS Initializer, the CAS
+            Initializer is used to read the document and to initialize
+            the CAS. If the CPE does not include a CAS
+            Initializer, the File System Collection Reader reads the document and sets the
+            document text in the CAS.</para>
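For the simple case, reading a file into a string with an optional encoding can be sketched with `java.nio.file.Files`. This is a stand-alone illustration under stated assumptions: the `DocumentTextLoader` class and its methods are hypothetical and use no UIMA types. In a Collection Reader, the returned string is what would be passed to the CAS's `setDocumentText` method.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, shown outside of any UIMA class for simplicity.
class DocumentTextLoader {
    // Reads the whole file and decodes it with the named encoding, or with
    // the platform default encoding when 'encoding' is null.
    static String readText(Path file, String encoding) {
        try {
            byte[] contents = Files.readAllBytes(file); // reads to end of file
            Charset charset = (encoding != null)
                    ? Charset.forName(encoding)
                    : Charset.defaultCharset();
            return new String(contents, charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes 'text' to a temporary file in the given encoding and reads it
    // back through readText, to show that the round trip preserves the text.
    static String roundTrip(String text, String encoding) {
        try {
            Path tmp = Files.createTempFile("doc", ".txt");
            try {
                Files.write(tmp, text.getBytes(Charset.forName(encoding)));
                return readText(tmp, encoding);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("na\u00efve caf\u00e9", "UTF-8"));
    }
}
```

Passing the encoding explicitly matters whenever the collection was written on a machine with a different default encoding than the one running the reader.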
+          
+          <para>The File System Collection Reader also stores additional metadata about
+            the document in the CAS. In particular, it sets the document&apos;s language in
+            the special built-in feature structure
+            <literal>uima.tcas.DocumentAnnotation</literal> (see <olink
+              targetdoc="&uima_docs_ref;"
+              targetptr="ugr.ref.cas.document_annotation"/> for details about this
+            built-in type) and creates an instance of
+            <literal>org.apache.uima.examples.SourceDocumentInformation</literal>,
+            which stores information about the document&apos;s source location. This
+            information may be useful to downstream components such as CAS Consumers. Note
+            that the type system descriptor for this type can be found in
+            <literal>org.apache.uima.examples.SourceDocumentInformation.xml</literal>,
+            which is located in the <literal>examples/src</literal> directory.</para>
+          
+          <para>The <literal>getNext()</literal> method for the File System Collection
+            Reader looks like this:</para>
+          
+          
+          <programlisting>  public void getNext(CAS aCAS) throws IOException, CollectionException {
+    JCas jcas;
+    try {
+      jcas = aCAS.getJCas();
+    } catch (CASException e) {
+      throw new CollectionException(e);
+    }
+
+    // open input stream to file
+    File file = (File) mFiles.get(mCurrentIndex++);
+    BufferedInputStream fis = 
+            new BufferedInputStream(new FileInputStream(file));
+    try {
+      byte[] contents = new byte[(int) file.length()];
+      fis.read(contents);
+      String text;
+      if (mEncoding != null) {
+        text = new String(contents, mEncoding);
+      } else {
+        text = new String(contents);
+      }
+      // put document in CAS
+      jcas.setDocumentText(text);
+    } finally {
+      if (fis != null)
+        fis.close();
+    }
+
+    // set language if it was explicitly specified
+    // as a configuration parameter
+    if (mLanguage != null) {
+      ((DocumentAnnotation) jcas.getDocumentAnnotationFs()).
+            setLanguage(mLanguage);
+    }
+
+    // Also store location of source document in CAS. 
+    // This information is critical if CAS Consumers will 
+    // need to know where the original document contents 
+    // are located.
+    // For example, the Semantic Search CAS Indexer 
+    // writes this information into the search index that 
+    // it creates, which allows applications that use the 
+    // search index to locate the documents that satisfy 
+    // their semantic queries.
+    SourceDocumentInformation srcDocInfo = 
+            new SourceDocumentInformation(jcas);
+    srcDocInfo.setUri(
+            file.getAbsoluteFile().toURL().toString());
+    srcDocInfo.setOffsetInSource(0);
+    srcDocInfo.setDocumentSize((int) file.length());
+    srcDocInfo.setLastSegment(
+            mCurrentIndex == mFiles.size());
+    srcDocInfo.addToIndexes();
+  }</programlisting>
+          
+          <para>The Collection Reader can create additional annotations in the CAS at this
+            point, in the same way that annotators create annotations.</para>
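+          
+          <para>The listing above sizes a buffer from <literal>file.length()</literal> and
+            decodes it with the configured encoding. As a minimal sketch of the same
+            read-and-decode step outside of UIMA, the following uses
+            <literal>java.nio.file.Files.readAllBytes</literal>, which reads until the end
+            of the file. The class name <literal>FileTextReader</literal> and the method
+            <literal>readText</literal> are hypothetical; they are not part of the UIMA
+            examples.</para>
+          
+          <programlisting><![CDATA[
```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of the UIMA examples. */
final class FileTextReader {
  private FileTextReader() {}

  /**
   * Reads the whole file and decodes it to a String.
   *
   * @param file     the document to read
   * @param encoding a charset name such as "UTF-8", or null to use the
   *                 platform default (mirroring the mEncoding check above)
   */
  static String readText(Path file, String encoding) throws IOException {
    // readAllBytes keeps reading until end of file is reached
    byte[] contents = Files.readAllBytes(file);
    Charset charset = (encoding != null)
        ? Charset.forName(encoding)
        : Charset.defaultCharset();
    return new String(contents, charset);
  }
}
```
+]]></programlisting>
+          
+          <para>A Collection Reader could pass the returned string to
+            <literal>setDocumentText</literal>; passing <literal>null</literal> as the
+            encoding falls back to the platform default, as in the listing above.</para>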
+        </section>
+        
+        <section id="ugr.tug.cpe.collection_reader.required_methods.getprogress">
+          <title>getProgress()</title>
+          <para>The Collection Reader is responsible for returning progress information;
+            that is, how much of the collection has been read thus far and how much remains to be
+            read. The framework defines progress very generally; the Collection Reader
+            simply returns an array of <literal>Progress</literal> objects, where each
+            object contains three fields &mdash; the amount already completed, the total
+            amount (if known), and a unit (e.g. entities (documents), bytes, or files). The
+            method returns an array so that the Collection Reader can report progress in
+            multiple different units, if that information is available. The File System
+            Collection Reader&apos;s <literal>getProgress()</literal> method looks
+            like this:
+            
+            
+            <programlisting>public Progress[] getProgress() {
+  return new Progress[] {
+      new ProgressImpl(mCurrentIndex, mFiles.size(), Progress.ENTITIES) };
+}</programlisting></para>
+          
+          <para>In this particular example, the total number of files in the collection is
+            known, but the total size of the collection is not known. As such, a
+            <literal>ProgressImpl</literal> object for
+            <literal>Progress.ENTITIES</literal> is returned, but a
+            <literal>ProgressImpl</literal> object for
+            <literal>Progress.BYTES</literal> is not.</para>
+          
+        </section>
+        
+        <section id="ugr.tug.cpe.collection_reader.required_methods.close">
+          <title>close()</title>
+          
+          <para>The <literal>close()</literal> method is called when the Collection Reader is
+            no longer needed. The Collection Reader should then release any resources it may be
+            holding. The <literal>FileSystemCollectionReader</literal> does not hold resources
+            and so has an empty implementation of this method:</para>
+          
+          
+          <programlisting>public void close() throws IOException { }</programlisting>
+          
+        </section>
+        
+        <section id="ugr.tug.cpe.collection_reader.optional_methods">
+          <title>Optional Methods</title>
+          
+          <para>The following methods may be implemented:</para>
+          
+          <section id="ugr.tug.cpe.collection_reader.optional_methods.reconfigure">
+            <title>reconfigure()</title>
+            <para>This method is called if the Collection Reader&apos;s configuration
+              parameters change.</para>
+          </section>
+          
+          <section id="ugr.tug.cpe.collection_reader.optional_methods.typesysteminit">
+            <title>typeSystemInit()</title>
+            
+            <para>If you are only setting the document text in the CAS, or if you are using the
+              JCas (recommended, as in the current example), you do not have to implement this
+              method. If you are directly using the CAS API, this method is used in the same way
+              as it is used for an annotator &ndash; see <olink
+                targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aae.contract_for_annotator_methods"/>
+              for more information.</para>
+          </section>
+        </section>
+        
+        <section id="ugr.tug.cpe.collection_reader.threading">
+          <title>Threading considerations</title>
+          
+          <para>Collection Readers do not have to be thread safe. Each instance is run on a
+            single thread, and each instance of the Collection Processing Manager (CPM)
+            creates only one Collection Reader instance.</para>
+          
+        </section>
+        
+        <section id="ugr.tug.cpe.collection_reader.descriptor">
+          <title>XML Descriptor for a Collection Reader</title>
+          
+          <para>You can use the Component Description Editor to create or edit the File
+            System Collection Reader&apos;s descriptor. Here is its descriptor
+            (abbreviated somewhat), which is very similar to an Analysis
+            Engine descriptor:</para>
+          
+          
+          <programlisting><?db-font-size 80% ?><![CDATA[<collectionReaderDescription 
+          xmlns="http://uima.apache.org/resourceSpecifier">
+  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
+  <implementationName>
+    org.apache.uima.examples.cpe.FileSystemCollectionReader
+  </implementationName>
+  <processingResourceMetaData>
+    <name>File System Collection Reader</name>
+    <description>Reads files from the filesystem.</description>
+    <version>1.0</version>
+    <vendor>The Apache Software Foundation</vendor>
+    <configurationParameters>
+      <configurationParameter>
+        <name>InputDirectory</name>
+        <description>Directory containing input files</description>
+        <type>String</type>
+        <multiValued>false</multiValued>
+        <mandatory>true</mandatory>
+      </configurationParameter>
+      <configurationParameter>
+        <name>Encoding</name>
+        <description>Character encoding for the documents.</description>
+        <type>String</type>
+        <multiValued>false</multiValued>
+        <mandatory>false</mandatory>
+      </configurationParameter>
+      <configurationParameter>
+        <name>Language</name>
+        <description>ISO language code for the documents</description>
+        <type>String</type>
+        <multiValued>false</multiValued>
+        <mandatory>false</mandatory>
+      </configurationParameter>
+    </configurationParameters>
+    <configurationParameterSettings>
+      <nameValuePair>
+        <name>InputDirectory</name>
+        <value>
+          <string>C:/Program Files/apache/uima/examples/data</string>
+        </value>
+      </nameValuePair>
+    </configurationParameterSettings>
+    
+    <!-- Type System of CASes returned by this Collection Reader -->
+    
+    <typeSystemDescription>
+      <imports>
+        <import name="org.apache.uima.examples.SourceDocumentInformation"/>
+      </imports>
+    </typeSystemDescription>
+    
+    <capabilities>
+      <capability>
+        <inputs/>
+        <outputs>
+          <type allAnnotatorFeatures="true">
+            org.apache.uima.examples.SourceDocumentInformation
+          </type>
+        </outputs>
+      </capability>
+    </capabilities>
+    <operationalProperties>
+      <modifiesCas>true</modifiesCas>
+      <multipleDeploymentAllowed>false</multipleDeploymentAllowed>
+      <outputsNewCASes>true</outputsNewCASes>
+    </operationalProperties>
+  </processingResourceMetaData>
+</collectionReaderDescription>]]></programlisting>
+          
+        </section>
+      </section>
+    </section>
+    
+    <section id="ugr.tug.cpe.cas_initializer.developing"><title>Developing CAS
+      Initializers</title> <note><para>CAS Initializers are deprecated as of
+      version 2.1. For complex initialization, create additional Subjects of
+      Analysis instead (see <olink
+        targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.mvs"/>).
+      </para></note>
+      
+      <para>In UIMA 1.x, the CAS Initializer component was intended as a plug-in to the
+        Collection Reader for cases where populating the CAS from a raw document is
+        complex and the logic might be reusable with other data collections.</para>
+          
+      <para>A CAS Initializer Java class must implement the interface
+        <literal>org.apache.uima.collection.CasInitializer</literal>, and will also
+        generally extend from the convenience base class
+        <literal>org.apache.uima.collection.CasInitializer_ImplBase</literal>. A
+        CAS Initializer also must have an XML descriptor, which has the exact same form as a
+        Collection Reader Descriptor except that the outer tag is
+        <literal>&lt;casInitializerDescription&gt;</literal>.</para>
+      
+      <para>CAS Initializers have optional <literal>initialize()</literal>,
+        <literal>reconfigure()</literal>, and <literal>typeSystemInit()</literal>
+        methods, which perform the same functions as they do for Collection Readers. The only
+        required method for a CAS Initializer is <literal>initializeCas(Object,
+        CAS)</literal>. This method takes the raw document (for example, an
+        <literal>InputStream</literal> object from which the document can be read) and a
+        CAS, and populates the CAS from the document.</para>      
+    </section>
+    
+    <section id="ugr.tug.cpe.cas_consumer.developing"><title>Developing CAS
+      Consumers</title> 
+      
+      <note><para>In version 2, there is no difference in capability
+      between CAS Consumers and ordinary Analysis Engines, except for the default setting of
+      the XML parameters for <literal>multipleDeploymentAllowed</literal> and
+      <literal>modifiesCas</literal>. For future work, we recommend that users implement
+      and use Analysis Engine components instead of CAS Consumers.</para>
+      <para>The rest of this section is written using the version 1 style of CAS Consumer;
+      the methods described are also available for Analysis Engines.  Note that the 
+      CAS Consumer <literal>processCas</literal> method is equivalent to the Analysis Engine
+      <literal>process</literal> method.</para></note>
+      
+      <para>A CAS Consumer receives each CAS after it has been analyzed by the Analysis
+        Engine. CAS Consumers typically do not update the CAS; they typically extract data
+        from the CAS and persist selected information to aggregate data structures such as
+        search engine indexes or databases.</para>
+      
+      <para>A CAS Consumer Java class must implement the interface
+        <literal>org.apache.uima.collection.CasConsumer</literal>, and will also
+        generally extend from the convenience base class
+        <literal>org.apache.uima.collection.CasConsumer_ImplBase</literal>. A CAS
+        Consumer also must have an XML descriptor, which has the exact same form as a
+        Collection Reader Descriptor except that the outer tag is
+        <literal>&lt;casConsumerDescription&gt;</literal>.</para>
+      
+      <para>CAS Consumers have optional <literal>initialize()</literal>,
+        <literal>reconfigure()</literal>, and <literal>typeSystemInit()</literal>
+        methods, which perform the same functions as they do for Collection Readers and CAS
+        Initializers. The only required method for a CAS Consumer is
+        <literal>processCas(CAS)</literal>, which is where the CAS Consumer does the bulk
+        of its work (i.e., consume the CAS).</para>
+      
+      <para>The <literal>CasConsumer</literal> interface (as well as the version 2
+        Analysis Engine interface) additionally defines batch
+        and collection level processing methods. The CAS Consumer or Analysis Engine
+        can implement the
+        <literal>batchProcessComplete()</literal> method to perform processing that
+        should occur at the end of each batch of CASes. Similarly, the CAS Consumer 
+        or Analysis Engine can
+        implement the <literal>collectionProcessComplete()</literal> method to
+        perform any collection level processing at the end of the collection.</para>
+      
+      <para>A very simple example of a CAS Consumer, which writes an XML representation of the
+        CAS to a file, is the XMI Writer CAS Consumer. The Java code is in the class
+        <literal>org.apache.uima.examples.cpe.XmiWriterCasConsumer</literal> and
+        the descriptor is in
+        <literal>%UIMA_HOME%/examples/descriptors/cas_consumer/XmiWriterCasConsumer.xml</literal>.</para>
+      
+      <section id="ugr.tug.cpe.cas_consumer.required_methods">
+        <title>Required Methods for a CAS Consumer</title>
+        
+        <para>When extending the convenience class
+          <literal>org.apache.uima.collection.CasConsumer_ImplBase</literal>, the
+          following abstract methods must be implemented:</para>
+        
+        <section id="ugr.tug.cpe.cas_consumer.required_methods.initialize">
+          <title>initialize()</title>
+          <para>The <literal>initialize()</literal> method is called by the framework
+            when the CAS Consumer is first created.
+            <literal>CasConsumer_ImplBase</literal> actually provides a default
+            implementation of this method (i.e., it is not abstract), so you are not strictly
+            required to implement this method. However, a typical CAS Consumer will
+            implement this method to obtain parameter values and perform various
+            initialization steps.</para>
+          
+          <para>In this method, the CAS Consumer can access the values of its configuration
+            parameters and perform other initialization logic. The example XMI Writer CAS
+            Consumer reads its configuration parameters and sets up the output directory:
+            
+            
+            <programlisting><?db-font-size 80% ?>public void initialize() throws ResourceInitializationException {
+  mDocNum = 0;
+  mOutputDir = new File((String) getConfigParameterValue(PARAM_OUTPUTDIR));
+  if (!mOutputDir.exists()) {
+    mOutputDir.mkdirs();
+  }
+}</programlisting></para>
+        </section>
+        
+        <section id="ugr.tug.cpe.cas_consumer.required_methods.processcas">
+          <title>processCas()</title>
+          
+          <para>The <literal>processCas()</literal> method is where the CAS Consumer
+            does most of its work. In our example, the XMI Writer CAS Consumer obtains an
+            iterator over the document metadata in the CAS (in the
+            SourceDocumentInformation feature structure, which is created by the File
+            System Collection Reader) and extracts the URI for the current document. From
+            this the output filename is constructed in the output directory and a subroutine
+            (<literal>writeXmi</literal>) is called to generate the output file. The
+            <literal>writeXmi</literal> subroutine uses the
+            <literal>XmiCasSerializer</literal> class provided with the UIMA SDK to
+            serialize the CAS to the output file (see the example source code for
+            details).</para>
+          
+          
+          <programlisting>public void processCas(CAS aCAS) throws ResourceProcessException {
+  String modelFileName = null;
+
+  JCas jcas;
+  try {
+    jcas = aCAS.getJCas();
+  } catch (CASException e) {
+    throw new ResourceProcessException(e);
+  }
+ 
+  // retrieve the filename of the input file from the CAS
+  FSIterator it = jcas
+            .getAnnotationIndex(SourceDocumentInformation.type)
+                  .iterator();
+  File outFile = null;
+  if (it.hasNext()) {
+    SourceDocumentInformation fileLoc = 
+            (SourceDocumentInformation) it.next();
+    File inFile;
+    try {
+      inFile = new File(new URL(fileLoc.getUri()).getPath());
+      String outFileName = inFile.getName();
+      if (fileLoc.getOffsetInSource() > 0) {
+        outFileName += ("_" + fileLoc.getOffsetInSource());
+      }
+      outFileName += ".xmi";
+      outFile = new File(mOutputDir, outFileName);
+      modelFileName = mOutputDir.getAbsolutePath() + 
+            "/" + inFile.getName() + ".ecore";
+    } catch (MalformedURLException e1) {
+      // invalid URL, use default processing below
+    }
+  }
+  if (outFile == null) {
+    outFile = new File(mOutputDir, "doc" + mDocNum++);
+  }
+  // serialize the CAS as XMI and write to output file
+  try {
+    writeXmi(jcas.getCas(), outFile, modelFileName);
+  } catch (IOException e) {
+    throw new ResourceProcessException(e);
+  } catch (SAXException e) {
+    throw new ResourceProcessException(e);
+  }
+}</programlisting>
+          
+        </section>
+        
+        <section id="ugr.tug.cpe.cas_consumer.optional_methods">
+          <title>Optional Methods</title>
+          <para>The following methods are optional in a CAS Consumer, though they are often
+            used.</para>
+          <section id="ugr.tug.cpe.cas_consumer.optional_methods.batchprocesscomplete">
+            <title>batchProcessComplete()</title>
+            
+            <para>The framework calls the batchProcessComplete() method at the end of each
+              batch of CASes. This gives the CAS Consumer or Analysis Engine 
+              an opportunity to perform any batch
+              level processing. Our simple XMI Writer CAS Consumer does not perform any
+              batch level processing, so this method is empty. Batch size is set in the
+              Collection Processing Engine descriptor.</para>
+          </section>
+          
+          <section id="ugr.tug.cpe.cas_consumer.optional_methods.collectionprocesscomplete">
+            <title>collectionProcessComplete()</title>
+            
+            <para>The framework calls the <literal>collectionProcessComplete()</literal>
+              method at the end of the collection (i.e., when all objects in the collection
+              have been processed). No CAS is passed in as a parameter. This gives
+              the CAS Consumer or Analysis Engine an opportunity to perform processing over the
+              entire set of objects in the collection. Our simple XMI Writer CAS Consumer
+              does not perform any collection level processing, so this method is
+              empty.</para>
+          </section>
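+          
+          <para>Because both callbacks are empty in the XMI Writer example, the following
+            sketch illustrates how they might divide the work. It is an assumption-laden
+            stand-in that does not use the UIMA API: the class
+            <literal>BatchingConsumerSketch</literal> is hypothetical, plain strings play
+            the role of CASes, the method signatures are simplified, and
+            <literal>collectionProcessComplete()</literal> returns a summary string only so
+            that the outcome can be inspected.</para>
+          
+          <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for a CAS Consumer. It does not use the UIMA API;
 * strings play the role of CASes and the signatures are simplified.
 */
final class BatchingConsumerSketch {
  // results extracted from CASes but not yet committed
  private final List<String> pending = new ArrayList<>();
  // stands in for an aggregate store such as a search index or database
  private final List<String> committed = new ArrayList<>();
  private int batchesFlushed = 0;

  /** Called once per document: extract and buffer, but do not commit yet. */
  void processCas(String documentUri) {
    pending.add(documentUri);
  }

  /** Called at the end of each batch: commit everything buffered so far. */
  void batchProcessComplete() {
    committed.addAll(pending);
    pending.clear();
    batchesFlushed++;
  }

  /** Called once at the end of the collection: commit leftovers, then summarize. */
  String collectionProcessComplete() {
    if (!pending.isEmpty()) {
      batchProcessComplete();
    }
    return committed.size() + " documents in " + batchesFlushed + " batches";
  }
}
```
+]]></programlisting>
+          
+          <para>In this sketch, per-document work stays cheap, the more expensive commit
+            happens once per batch, and the final callback handles whatever remains after
+            the last complete batch.</para>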
+          
+        </section>
+        
+      </section>
+    </section>
+  </section>
+  
+  <section id="ugr.tug.cpe.deploying_a_cpe">
+    <title>Deploying a CPE</title>
+    
+    <para>The CPM provides a number of service and deployment options that cover
+      instantiation and execution of CPEs, error recovery, and local and distributed
+      deployment of the CPE components. The behavior of the CPM (and correspondingly, the
+      CPE) is controlled by various options and parameters set in the CPE descriptor. The
+      current version of the CPE Configurator tool, however, supports only default error
+      handling and deployment options. To change these options, you must manually edit the
+      CPE descriptor.</para>
+    
+    <para>Eventually the CPE Configurator tool will support configuring these options and a
+      detailed tutorial for these settings will be provided. In the meantime, we provide only
+      a high-level, conceptual overview of these advanced features in the rest of this
+      chapter, and refer the advanced user to <olink targetdoc="&uima_docs_ref;"
+        targetptr="ugr.ref.xml.cpe_descriptor"/> for details on setting these options in the CPE
+      Descriptor.</para>
+    
+    <para> <xref linkend="ugr.tug.cpe.fig.cpe_instantiation"/> shows a logical view of
+      how an application uses the UIMA framework to instantiate a CPE from a CPE descriptor.
+      The CPE descriptor identifies the CPE components (referencing their corresponding
+      descriptors) and specifies the various options for configuring the CPM and deploying
+      the CPE components.</para>
+    
+    <figure id="ugr.tug.cpe.fig.cpe_instantiation">
+      <title>CPE Instantiation</title>
+      <mediaobject>
+        <imageobject>
+          <imagedata width="5.7in" format="PNG"
+            fileref="&imgroot;image018.png"/>
+        </imageobject>
+        <textobject><phrase>Picture of deployment of a CPE</phrase></textobject>
+      </mediaobject>
+    </figure>
+    
+    <para id="ugr.tug.cpe.deployment_alternatives">There are three deployment modes
+      for CAS Processors (Analysis Engines and CAS Consumers) in a CPE:</para>
+    
+    <orderedlist><listitem><para><emphasis role="bold">Integrated</emphasis> (runs
+      in the same Java instance as the CPM)</para></listitem>
+      
+      <listitem><para><emphasis role="bold">Managed</emphasis> (runs in a separate
+        process on the same machine), and</para></listitem>
+      
+      <listitem><para><emphasis role="bold">Non-managed</emphasis> (runs in a
+        separate process, perhaps on a different machine). </para></listitem>
+    </orderedlist>
+    
+    <para>An integrated CAS Processor runs in the same JVM as the CPE. A managed CAS Processor
+      runs in a separate process from the CPE, but still on the same computer. The CPE controls
+      startup, shutdown, and recovery of a managed CAS Processor. A non-managed CAS
+      Processor runs as a service and may be on the same computer as the CPE or on a remote
+      computer. A non-managed CAS Processor <emphasis role="bold-italic">
+      service</emphasis> is started and managed independently from the CPE.</para>
+    
+    <para>For both managed and non-managed CAS Processors, the CAS must be transmitted
+      between separate processes and possibly between separate computers. This is
+      accomplished using <emphasis>Vinci</emphasis>, a communication protocol that is
+      used by the CPM and provided as part of Apache UIMA. Vinci handles service naming and
+      location and data transport (see <olink targetdoc="&uima_docs_tutorial_guides;"
+        targetptr="ugr.tug.application.how_to_deploy_a_vinci_service"/> for more
+      information). Service naming and location are provided by a <emphasis>Vinci Naming
+      Service</emphasis>, or <emphasis>VNS</emphasis>. For managed CAS Processors, the
+      CPE uses its own internal VNS. For non-managed CAS Processors, a separate VNS must be
+      running.</para> <note><para>The UIMA SDK also supports using non-managed remote
+    services via the web-standard SOAP communications protocol (see <olink
+      targetdoc="&uima_docs_tutorial_guides;"
+      targetptr="ugr.tug.application.how_to_deploy_as_soap"/>). This approach is
+    based on a proxy implementation, where the proxy is essentially running in an integrated
+    mode. To use this approach with the CPM, use the Integrated mode, with the component being
+    an Aggregate which, in turn, connects to a remote service. </para></note>
+    
+    <para>The CPE Configurator tool currently supports only constructing CPEs that deploy
+      CAS Processors in integrated mode. To deploy CAS Processors in any other mode, the CPE
+      descriptor must be edited by hand (better tooling may be provided later). Details on the
+      CPE descriptor and the required settings for various CAS Processor deployment modes
+      can be found in <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>.
+      In the following sections we merely summarize the various CAS Processor deployment
+      options.</para>
+    
+    <section id="ugr.tug.cpe.managed_deployment">
+      <title>Deploying Managed CAS Processors</title>
+      
+      <para>Managed CAS Processor deployment is shown in <xref
+          linkend="ugr.tug.cpe.fig.managed_deployment"/>. A managed CAS Processor is
+        deployed by the CPE as a Vinci service. The CPE manages the lifecycle of the CAS
+        Processor including service launch, restart on failures, and service shutdown. A
+        managed CAS Processor runs on the same machine as the CPE, but in a separate process.
+        This provides the necessary fault isolation for the CPE to protect it from non-robust
+        CAS Processors. A fatal failure of a managed CAS Processor does not threaten the
+        stability of the CPE.</para>
+      
+      <figure id="ugr.tug.cpe.fig.managed_deployment">
+        <title>CPE with Managed CAS Processors</title>
+        <mediaobject>
+          <imageobject>
+            <imagedata width="3.6in" format="PNG"
+              fileref="&imgroot;image020.png"/>
+          </imageobject>
+          <textobject><phrase>Managed deployment showing separate JVMs and CASes
+            flowing between them</phrase></textobject>
+        </mediaobject>
+      </figure>
+      
+      <para>The CPE communicates with managed CAS Processors using the Vinci communication
+        protocol. A CAS Processor is launched as a Vinci service and its
+        <literal>process()</literal> method is invoked remotely via a Vinci command. The
+        CPE uses its own internal VNS to support managed CAS processors. The VNS, by default,
+        listens on port 9005. If this port is not available, the VNS will increment its listen
+        port until it finds one that is available. All managed CAS Processors are internally
+        configured to <quote>talk</quote> to the CPE-managed VNS. This internal VNS is
+        transparent to the end user launching the CPE.</para>
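+      
+      <para>The port search described above can be sketched as follows. This only
+        illustrates the technique of starting at a preferred port and incrementing
+        until a bind succeeds; it is not the CPM&apos;s implementation, and the class
+        <literal>PortProbeSketch</literal> and method <literal>bindFirstFree</literal>
+        are hypothetical names.</para>
+      
+      <programlisting><![CDATA[
```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

/** Hypothetical illustration of an incrementing port search; not CPM code. */
final class PortProbeSketch {
  private static final int MAX_PORT = 65535;

  private PortProbeSketch() {}

  /**
   * Binds a server socket to the first available port at or above startPort.
   * The caller owns the returned socket and is responsible for closing it.
   */
  static ServerSocket bindFirstFree(int startPort) throws IOException {
    for (int port = startPort; port <= MAX_PORT; port++) {
      try {
        return new ServerSocket(port);
      } catch (BindException portUnavailable) {
        // this port cannot be bound; try the next one
      }
    }
    throw new IOException("No free port at or above " + startPort);
  }
}
```
+]]></programlisting>
+      
+      <para>Calling <literal>bindFirstFree(9005)</literal> returns a socket on port 9005
+        when that port is free, and on the next free port otherwise; the port actually
+        chosen is available from <literal>getLocalPort()</literal>.</para>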
+      
+      <para>To deploy a managed CAS Processor, the CPE deployer must change the CPE
+        descriptor. The following is a section from the CPE descriptor that shows an example
+        configuration specifying a managed CAS Processor.</para>
+      
+      
+      <programlisting>&lt;casProcessor <emphasis role="bold-italic">deployment="local"</emphasis> name="Meeting Detector TAE"&gt;
+  &lt;descriptor&gt;
+    &lt;include href="deploy/vinci/Deploy_MeetingDetectorTAE.xml"/&gt;
+  &lt;/descriptor&gt;
+  &lt;runInSeparateProcess&gt;
+    &lt;exec dir="." executable="java"&gt;
+      &lt;env key="CLASSPATH" 
+         value="src;
+                C:/Program Files/apache/uima/lib/uima-core.jar;
+                C:/Program Files/apache/uima/lib/uima-cpe.jar;
+                C:/Program Files/apache/uima/lib/uima-examples.jar;
+                C:/Program Files/apache/uima/lib/uima-adapter-vinci.jar;
+                C:/Program Files/apache/uima/lib/jVinci.jar"/&gt;
+      &lt;arg&gt;-DLOG=C:/Temp/service.log&lt;/arg&gt;
+      &lt;arg&gt;org.apache.uima.reference_impl.collection.
+         service.vinci.VinciAnalysisEngineService_impl&lt;/arg&gt;
+      &lt;arg&gt;${descriptor}&lt;/arg&gt;
+    &lt;/exec&gt;
+  &lt;/runInSeparateProcess&gt;
+  &lt;deploymentParameters/&gt;
+  &lt;filter/&gt;
+  &lt;errorHandling&gt;
+    &lt;errorRateThreshold action="terminate" value="1/100"/&gt;
+    &lt;maxConsecutiveRestarts action="terminate" value="3"/&gt;
+    &lt;timeout max="100000"/&gt;
+  &lt;/errorHandling&gt;
+  &lt;checkpoint batch="10000"/&gt;
+&lt;/casProcessor&gt;</programlisting>
+      
+      <para>See <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/> for
+        details and required settings.</para>
+      
+    </section>
+    
+    <section id="ugr.tug.cpe.deploying_nonmanaged_cas_processors">
+      <title>Deploying Non-managed CAS Processors</title>
+      
+      <para>Non-managed CAS Processor deployment is shown in <xref
+          linkend="ugr.tug.cpe.fig.nonmanaged_cpe"/>. In non-managed mode, the CPE
+        supports connectivity to CAS Processors running on local or remote computers using
+        Vinci. Non-managed processors differ from managed processors in two
+        respects:
+        
+        <orderedlist><listitem><para>Non-managed processors are neither started nor
+          stopped by the CPE.</para></listitem>
+          
+          <listitem><para>Non-managed processors use an independent VNS, also neither
+            started nor stopped by the CPE. </para></listitem></orderedlist></para>
+      
+      <figure id="ugr.tug.cpe.fig.nonmanaged_cpe">
+        <title>CPE with non-managed CAS Processors</title>
+        <mediaobject>
+          <imageobject>
+            <imagedata width="4.8in" format="PNG"
+              fileref="&imgroot;image023.png"/>
+          </imageobject>
+          <textobject><phrase>Non-managed CPE deployment</phrase></textobject>
+        </mediaobject>
+      </figure>
+      
+      <para>While non-managed CAS Processors provide the same level of fault isolation and
+        robustness as managed CAS Processors, error recovery support for non-managed CAS
+        Processors is much more limited. In particular, the CPE cannot restart a non-managed
+        CAS Processor after an error.</para>
+      
+      <para>Non-managed CAS Processors also require a separate Vinci Naming Service
+        running on the network. This VNS must be manually started and monitored by the end user
+        or application. Instructions for running a VNS can be found in <olink
+          targetdoc="&uima_docs_tutorial_guides;"
+          targetptr="ugr.tug.application.vns.starting"/>.</para>
+      
+      <para>To deploy a non-managed CAS Processor, the CPE deployer must change the CPE
+        descriptor. The following is a section from the CPE descriptor that shows an example
+        configuration for a non-managed CAS Processor.</para>
+      
+      
+      <programlisting>&lt;casProcessor <emphasis role="bold-italic">deployment="remote"</emphasis> name="Meeting Detector TAE"&gt;
+  &lt;descriptor&gt;
+    &lt;include href=
+        "descriptors/vinciService/MeetingDetectorVinciService.xml"/&gt;
+  &lt;/descriptor&gt;
+  &lt;deploymentParameters/&gt;
+  &lt;filter/&gt;
+  &lt;errorHandling&gt;
+    &lt;errorRateThreshold action="terminate" value="1/100"/&gt;
+    &lt;maxConsecutiveRestarts action="terminate" value="3"/&gt;
+    &lt;timeout max="100000"/&gt;
+  &lt;/errorHandling&gt;
+  &lt;checkpoint batch="10000"/&gt;
+&lt;/casProcessor&gt;</programlisting>
+      
+      <para>See <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/> for
+        details and required settings.</para>
+      
+    </section>
+    
+    <section id="ugr.tug.cpe.integrated_deployment">
+      <title>Deploying Integrated CAS Processors</title>
+      
+      <para>Integrated CAS Processors are shown in <xref
+          linkend="ugr.tug.cpe.fig.integrated_deployment"/>. Here the CAS Processors
+        run in the same JVM as the CPE, just like the Collection Reader and CAS Initializer.
+        This deployment method results in minimal CAS communication and transport overhead
+        as the CAS is shared in the same process space of the JVM. However, a CPE running with all
+        integrated CAS Processors is limited in scalability by the capability of the single
+        computer on which the CPE is running. There is also a stability risk associated with
+        integrated processors because a poorly written CAS Processor can cause the JVM, and
+        hence the entire CPE, to abort.</para>
+      
+      <figure id="ugr.tug.cpe.fig.integrated_deployment">
+        <title>CPE with integrated CAS Processor</title>
+        <mediaobject>
+          <imageobject>
+            <imagedata width="3.2in" format="PNG"
+              fileref="&imgroot;image026.png"/>
+          </imageobject>
+          <textobject><phrase>CPE with integrated CAS Processor</phrase>
+          </textobject>
+        </mediaobject>
+      </figure>
+      
+      <para>The following is a section from a CPE descriptor that shows an example
+        configuration for the integrated CAS Processor.</para>
+      
+      
+      <programlisting>&lt;casProcessor <emphasis role="bold-italic">deployment="integrated"</emphasis> name="Meeting Detector TAE"&gt;
+  &lt;descriptor&gt;
+    &lt;include href="descriptors/tutorial/ex4/MeetingDetectorTAE.xml"/&gt;
+  &lt;/descriptor&gt;
+  &lt;deploymentParameters/&gt;
+  &lt;filter/&gt;
+  &lt;errorHandling&gt;
+    &lt;errorRateThreshold action="terminate" value="100/1000"/&gt;
+    &lt;maxConsecutiveRestarts action="terminate" value="30"/&gt;
+    &lt;timeout max="100000"/&gt;
+  &lt;/errorHandling&gt;
+  &lt;checkpoint batch="10000"/&gt;
+&lt;/casProcessor&gt;</programlisting>
+      
+      <para>See <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/> for
+        details and required settings.</para>
+      
+    </section>
+  </section>
+  
+  <section id="ugr.tug.cpe.collection_processing_examples">
+    <title>Collection Processing Examples</title>
+    
+    <para>The UIMA SDK includes a set of examples illustrating the three modes of deployment:
+      integrated, managed, and non-managed. These are in the
+      <literal>/examples/descriptors/collection_processing_engine</literal>
+      directory, which contains three CPE descriptors that run an example annotator (the Meeting
+      Finder) in these modes.</para>
+    
+    <para>To run either the integrated or the managed example, use the
+      <literal>runCPE</literal> script in the <literal>/bin</literal> directory of the UIMA
+      installation, passing the appropriate CPE descriptor as an argument. Alternatively,
+      if you're using Eclipse and have the <literal>uimaj-examples</literal> project in your
+      workspace, you can use the Eclipse menu Run &rarr; Run... and then pick the
+      launch configuration <quote>UIMA Run CPE</quote>.</para>
+    
+    <note><para>The <literal>runCPE</literal> script <emphasis role="bold-italic">must</emphasis>
+    be run from the <literal>%UIMA_HOME%\examples</literal> directory, because the example
+    CPE descriptors use relative path names that are resolved relative to this working directory. 
+    For instance,
+   
+    <literallayout>runCPE
+descriptors\collection_processing_engine\MeetingFinderCPE_Integrated.xml</literallayout></para>
+    </note>
+    
+    <!--
+    <para>If you installed the examples into Eclipse, you can run directly from Eclipse by
+      creating a run configuration. To do this, highlight the SimpleRunCPE.java source file
+      in the examples src/org/apache/uima/examples/cpe directory, and then</para>
+    
+    <orderedlist><listitem><para>pick the menu Run &rarr; Run...</para></listitem>
+      
+      <listitem><para>click <quote>Java Application</quote> and press
+        <quote>New</quote></para></listitem>
+      
+      <listitem><para>click on the Arguments panel, and insert a path to the appropriate CPE
+        descriptor in the <quote>Program Arguments</quote> box by typing, for instance:
+        <literal>descriptors/collection_processing_engine/
+          MeetingFinderCPE_Integrated.xml</literal>
+        </para></listitem>
+      
+      <listitem><para>Then press <quote>Run</quote> </para></listitem>
+    </orderedlist>
+    -->
+    
+    <para>To run the non-managed example, there are some additional steps.
+      
+      <orderedlist><listitem><para>Start a VNS service by running the
+        <literal>startVNS</literal> script in the <literal>/bin</literal>
+        directory, or using the Eclipse launcher <quote>UIMA Start VNS</quote>.</para></listitem>
+        
+        <listitem><para>Deploy the Meeting Detector Analysis Engine as a Vinci service by
+          running the <literal>startVinciService</literal> script in the
+          <literal>/bin</literal> directory, passing it the location of the
+          descriptor to deploy, in this case
+          <literal>%UIMA_HOME%/examples/deploy/vinci/Deploy_MeetingDetectorTAE.xml</literal>.
+          Alternatively, if you're using Eclipse and have the
+          <literal>uimaj-examples</literal> project in your workspace, you can use the
+          Eclipse menu Run &rarr; Run... and then pick the
+          launch configuration <quote>UIMA Start Vinci Service</quote>.
+          </para></listitem>
+        
+        <listitem><para>Now run the <literal>runCPE</literal> script (or, if in Eclipse, run the
+          launch configuration <quote>UIMA Run CPE</quote>), passing it the CPE descriptor for
+          the non-managed version
+          (<literal>%UIMA_HOME%/examples/descriptors/collection_processing_engine/MeetingFinderCPE_NonManaged.xml</literal>).
+          </para></listitem></orderedlist></para>
+    
+    <para>This assumes that the Vinci Naming Service, the <literal>runCPE</literal> application, and the
+      <literal>MeetingDetectorTAE</literal> service are all running on the same machine.
+      Most of the scripts that need information about the VNS look for values in the
+      environment variables <literal>VNS_HOST</literal> and <literal>VNS_PORT</literal>; these default to
+      <quote>localhost</quote> and <quote>9000</quote>. You may set these to appropriate
+      values before running the scripts, as needed; you can also pass the name of the VNS host as
+      the second argument to the <literal>startVinciService</literal> script.</para>
+    
+    <para>Alternatively, you can edit the scripts and/or the XML files to specify
+      different values for the VNS host and port. For instance, if the
+      <literal>runCPE</literal> application is running on a different machine from the
+      Vinci Naming Service, you can edit
+      <literal>MeetingFinderCPE_NonManaged.xml</literal> and change the
+      <literal>vnsHost</literal> parameter:
+      <literal>&lt;parameter name="vnsHost" value="localhost" type="string"/&gt;</literal>
+      to specify the VNS host instead of <quote>localhost</quote>.</para>
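+    
+    <para>As an illustration only, assuming the VNS runs on a machine named
+      <literal>vns.example.com</literal> (a placeholder host name, not one used by the
+      UIMA examples), the edited parameter might look like this, with the
+      <literal>name</literal> and <literal>type</literal> attributes left unchanged:</para>
+    
+    <programlisting>&lt;parameter name="vnsHost" value="vns.example.com" type="string"/&gt;</programlisting>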
+  </section>
+  
+</chapter>
+