Posted to commits@uima.apache.org by sc...@apache.org on 2008/08/28 23:28:16 UTC

svn commit: r689997 [27/32] - in /incubator/uima/uimaj/trunk/uima-docbooks: ./ src/ src/docbook/overview_and_setup/ src/docbook/references/ src/docbook/tools/ src/docbook/tutorials_and_users_guides/ src/docbook/uima/organization/ src/olink/references/

Modified: incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.cpe.xml
URL: http://svn.apache.org/viewvc/incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.cpe.xml?rev=689997&r1=689996&r2=689997&view=diff
==============================================================================
--- incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.cpe.xml (original)
+++ incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.cpe.xml Thu Aug 28 14:28:14 2008
@@ -1,1333 +1,1333 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
-"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[
-<!ENTITY imgroot "../images/tutorials_and_users_guides/tug.cpe/">
-<!ENTITY % uimaents SYSTEM "../entities.ent">  
-%uimaents;
-]>
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-<chapter id="ugr.tug.cpe">
-  <title>Collection Processing Engine Developer&apos;s Guide</title>
-  <titleabbrev>CPE Developer&apos;s Guide</titleabbrev>
-  
-  <para>The UIMA Analysis Engine interface provides support for developing and integrating
-    algorithms that analyze unstructured data. Analysis Engines are designed to operate on a
-    per-document basis. Their interface handles one CAS at a time. UIMA provides additional
-    support for applying analysis engines to collections of unstructured data with its
-    <emphasis>Collection Processing Architecture</emphasis>. The Collection
-    Processing Architecture defines additional components for reading raw data formats
-    from data collections, preparing the data for processing by Analysis Engines, executing
-    the analysis, extracting analysis results, and deploying the overall flow in a variety of
-    local and distributed configurations.</para>
-  
-  <para>The functionality defined in the Collection Processing Architecture is
-    implemented by a <emphasis>Collection Processing Engine</emphasis> (CPE). A CPE
-    includes an Analysis Engine and adds a <emphasis>Collection Reader</emphasis>, a
-    <emphasis>CAS Initializer</emphasis> (deprecated as of version 2), and <emphasis>CAS
-    Consumers</emphasis>. The part of the UIMA Framework that supports the execution of
-    CPEs is called the Collection Processing Manager, or CPM.</para>
-  
-  <para>A Collection Reader provides the interface to the raw input data and knows how to
-    iterate over the data collection. Collection Readers are discussed in <xref
-      linkend="ugr.tug.cpe.collection_reader.developing"/>. The CAS Initializer
-    <footnote><para>CAS Initializers are deprecated in favor of a more general mechanism,
-    multiple subjects of analysis.</para></footnote> prepares an individual data item for
-    analysis and loads it into the CAS. CAS Initializers are discussed in <xref
-      linkend="ugr.tug.cpe.cas_initializer.developing"/>. A CAS Consumer extracts
-    analysis results from the CAS and may also perform <emphasis>collection level
-    processing</emphasis>, or analysis over a collection of CASes. CAS Consumers are
-    discussed in <xref linkend="ugr.tug.cpe.cas_consumer.developing"/>.</para>
-  
-  <para>Analysis Engines and CAS Consumers are both instances of <emphasis>CAS
-    Processors</emphasis>. A Collection Processing Engine (CPE) may contain multiple CAS
-    Processors. An Analysis Engine contained in a CPE may itself be a Primitive or an Aggregate
-    (composed of other Analysis Engines). Aggregates may contain CAS Consumers. While
-    Collection Readers and CAS Initializers always run in the same JVM as the CPM, a CAS
-    Processor may be deployed in a variety of local and distributed modes, providing a number
-    of options for scalability and robustness. The different deployment options are covered
-    in detail in <xref linkend="ugr.tug.cpe.deployment_alternatives"/>.</para>
-  
-  <para>Each of the components in a CPE has an interface specified by the UIMA Collection
-    Processing Architecture and is described by a declarative XML descriptor file.
-    Similarly, the CPE itself has a well defined component interface and is described by a
-    declarative XML descriptor file.</para>
-  
-  <para>A user creates a CPE by assembling the components mentioned above. The UIMA SDK
-    provides a graphical tool, called the CPE Configurator, for assisting in the assembly of
-    CPEs. Use of this tool is summarized in <xref
-      linkend="ugr.tug.cpe.cpe_configurator"/>, and more details can be found in <olink
-      targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cpe"/>.
-    Alternatively, a CPE can be assembled by writing an XML CPE descriptor. Details on the CPE
-    descriptor, including its syntax and content, can be found in the <olink
-      targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>. The individual
-    components have associated XML descriptors, each of which can be created and / or edited
-    using the <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cde">
-    Component Description Editor</olink>.</para>
-  
-  <para>A CPE is executed by a UIMA infrastructure component called the
-    <emphasis>Collection Processing Manager</emphasis> (CPM). The CPM provides a number
-    of services and deployment options that cover instantiation and execution of CPEs, error
-    recovery, and local and distributed deployment of the CPE components.</para>
-  
-  <section id="ugr.tug.cpe.concepts">
-    <title>CPE Concepts</title>
-    
-    <para> <xref linkend="ugr.tug.cpe.fig.cpe_components"/> illustrates the data flow
-      that occurs between the different types of components that make up a CPE.</para>
-    
-    <figure id="ugr.tug.cpe.fig.cpe_components">
-      <title>CPE Components</title>
-      <mediaobject>
-        <imageobject>
-          <imagedata width="5.7in" format="PNG"
-            fileref="&imgroot;image002.png"/>
-        </imageobject>
-        <textobject><phrase>CPE Components and flow between them</phrase>
-        </textobject>
-      </mediaobject>
-    </figure>
-    
-    <para>The components of a CPE are:</para>
-    
-    <itemizedlist><listitem><para><emphasis>Collection Reader &ndash;</emphasis>
-      interfaces to a collection of data items (e.g., documents) to be analyzed. Collection
-      Readers return CASes that contain the documents to analyze, possibly along with
-      additional metadata.</para></listitem>
-      
-      <listitem><para><emphasis>Analysis Engine &ndash;</emphasis> takes a CAS,
-        analyzes its contents, and produces an enriched CAS. Analysis Engines can be
-        recursively composed of other Analysis Engines (called an
-        <emphasis>Aggregate</emphasis> Analysis Engine). Aggregates may also contain
-        CAS Consumers.</para></listitem>
-      
-      <listitem><para><emphasis>CAS Consumer &ndash;</emphasis> consumes the enriched
-        CAS that was produced by the sequence of Analysis Engines before it, and produce an
-        application-specific data structure, such as a search engine index or database.
-        </para></listitem></itemizedlist>
-    
-    <para>A fourth type of component, the <emphasis>CAS Initializer,</emphasis> may be
-      used by a Collection Reader to populate a CAS from a document. However, as of UIMA
-      version 2, CAS Initializers are deprecated in favor of a more general mechanism,
-      multiple Subjects of Analysis.</para>
-    
-    <para>The Collection Processing Manager orchestrates the data flow
-      within a CPE, monitors status, optionally manages the life-cycle of internal
-      components and collects statistics.</para>
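-    
-    <para>To make this flow concrete, here is a minimal plain-Java sketch of the loop the
-      CPM runs conceptually: pull a document from the reader, pass the same CAS through each
-      Analysis Engine in turn, then hand the enriched CAS to each CAS Consumer. The
-      <literal>Cas</literal>, <literal>Reader</literal>, and <literal>Processor</literal>
-      types are hypothetical stand-ins, not UIMA interfaces, and the sketch ignores error
-      handling, threading, and deployment options.</para>
-    
-    <programlisting>import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Iterator;
-import java.util.List;
-
-// Conceptual model of CPE data flow. Cas, Reader and Processor are
-// hypothetical stand-ins for illustration, not UIMA interfaces.
-public class CpeFlowSketch {
-  static final class Cas {
-    String text;                                              // the document
-    final List&lt;String> results = new ArrayList&lt;String>();  // analysis output
-  }
-
-  interface Reader { boolean hasNext(); void getNext(Cas cas); }
-  interface Processor { void process(Cas cas); }
-
-  // Loosely modeled on the CPM: read each document, run it through the
-  // analysis engines in order, then give the enriched CAS to each consumer.
-  static int run(Reader reader, List&lt;Processor> engines, List&lt;Processor> consumers) {
-    int count = 0;
-    while (reader.hasNext()) {
-      Cas cas = new Cas();            // fresh CAS per document, for simplicity
-      reader.getNext(cas);
-      for (Processor ae : engines) ae.process(cas);
-      for (Processor cc : consumers) cc.process(cas);
-      count++;
-    }
-    return count;
-  }
-
-  // Hypothetical demo: one "engine" records the text length and one
-  // "consumer" collects a summary line per document.
-  static List&lt;String> demo() {
-    final Iterator&lt;String> docs = Arrays.asList("John Smith", "Dr. Jones").iterator();
-    Reader reader = new Reader() {
-      public boolean hasNext() { return docs.hasNext(); }
-      public void getNext(Cas cas) { cas.text = docs.next(); }
-    };
-    Processor lengthEngine = cas -> cas.results.add("length=" + cas.text.length());
-    final List&lt;String> index = new ArrayList&lt;String>();
-    Processor consumer = cas -> index.add(cas.text + " " + cas.results);
-    run(reader, Arrays.asList(lengthEngine), Arrays.asList(consumer));
-    return index;
-  }
-
-  public static void main(String[] args) {
-    System.out.println(demo());  // [John Smith [length=10], Dr. Jones [length=9]]
-  }
-}</programlisting>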
-    
-    <para>The framework does not save CASes persistently. If you want to keep CASes,
-      you must save each CAS as it passes through the CPE, for example with a CAS Consumer
-      that you write to store it in whatever format you like. The UIMA SDK supplies an example
-      CAS Consumer that saves CASes to XML files, either in the standard XMI format or in an
-      older format called XCAS. It also supplies an example CAS Consumer that extracts
-      information from CASes and stores the results in a relational database using Java&apos;s JDBC APIs.</para>
-    
-  </section>
-  
-  <section id="ugr.tug.cpe.configurator_and_viewer">
-    <title>CPE Configurator and CAS viewer</title>
-    
-    <section id="ugr.tug.cpe.cpe_configurator">
-      <title>Using the CPE Configurator</title>
-      
-      <para>A CPE can be assembled by writing an XML CPE descriptor. Details on the CPE
-        descriptor, including its syntax and content, can be found in <olink
-          targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>. Rather than
-        edit raw XML, you may develop a CPE Descriptor using the CPE Configurator tool. The CPE
-        Configurator tool is described briefly in this section, and in more detail in <olink
-          targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cpe"/>.</para>
-      
-      <para>The CPE Configurator tool can be run from Eclipse (see <xref
-          linkend="ugr.tug.cpe.running_cpe_configurator_from_eclipse"/>) or by using
-        the <literal>cpeGui</literal> shell script (<literal>cpeGui.bat</literal> on
-        Windows, <literal>cpeGui.sh</literal> on Unix), which is located in the
-        <literal>bin</literal> directory of the UIMA SDK installation. Executing this
-        script displays the window shown here:
-        
-        
-        <screenshot>
-          <mediaobject>
-            <imageobject>
-              <imagedata width="5.7in" format="JPG" fileref="&imgroot;image004.jpg"/>
-            </imageobject>
-            <textobject><phrase>Screenshot of CPE GUI</phrase></textobject>
-          </mediaobject>
-        </screenshot>
-        </para>
-      
-      <para>The window is divided into three sections, one each for the Collection Reader, 
-        Analysis Engines, and CAS Consumers.<footnote><para>There is also a fourth pane,
-        for the CAS Initializer, but it is hidden by default.  To enable it click the
-        <literal>View &rarr; CAS Initializer Panel</literal> menu item.</para></footnote> 
-        In each section, you select the component(s) you want to include in the CPE by 
-        browsing to their XML descriptors. The configuration parameters present in the XML 
-        descriptors will then be displayed in the GUI; these can be modified to override
-        the values present in the descriptor. For example, the screenshot below shows the
-        CPE Configurator after the following components have been chosen:
-        
-        
-        <programlisting>Collection Reader: 
-   %UIMA_HOME%/examples/descriptors/collection_reader/
-          FileSystemCollectionReader.xml
-
-Analysis Engine: 
-   %UIMA_HOME%/examples/descriptors/analysis_engine/
-          NamesAndPersonTitles_TAE.xml
-
-CAS Consumer: 
-    %UIMA_HOME%/examples/descriptors/cas_consumer/
-          XmiWriterCasConsumer.xml</programlisting></para>
-      
-      
-      <screenshot>
-     <mediaobject>
-      <imageobject>
-        <imagedata width="5.7in" format="JPG" fileref="&imgroot;image006.jpg"/>
-      </imageobject>
-      <textobject><phrase>Screenshot of CPE GUI after fields filled in</phrase></textobject>
-    </mediaobject>
-    </screenshot>
-      
-      <para>For the File System Collection Reader, ensure that the Input Directory is set to
-        <literal>%UIMA_HOME%\examples\data</literal><footnote><para>Replace
-        <literal>%UIMA_HOME%</literal> with the path to where you installed UIMA.</para>
-        </footnote>. The other parameters may be left blank. For the XMI Writer CAS
-        Consumer, ensure that the Output Directory is set to
-        <literal>%UIMA_HOME%\examples\data\processed</literal>.</para>
-      
-      <para>After selecting each of the components and providing configuration settings,
-        click the play (forward arrow) button at the bottom of the screen to begin processing.
-        A progress bar should be displayed in the lower left corner. (Note that the progress
-        bar will not begin to move until all components have completed their initialization,
-        which may take several seconds.) Once processing has begun, the pause and stop
-        buttons become enabled.</para>
-      
-      <para>If an error occurs, you will be informed by an error dialog. If processing
-        completes successfully, you will be presented with a performance report.</para>
-      
-      <para>Using the File menu, you can select <literal>Save CPE Descriptor</literal> to
-        create an .xml descriptor file that defines the CPE you have constructed. Later, you
-        can use <literal>Open CPE Descriptor</literal> to restore the CPE Configurator to
-        the saved state. Also, CPE descriptors can be used to run a CPE from a Java program
-        &ndash; see section <xref
-          linkend="ugr.tug.cpe.running_cpe_from_application"/>. CPE Descriptors
-        allow specifying operational parameters, such as error handling options, that are
-        not currently available for configuration through the CPE Configurator. For more
-        information on manually creating a CPE Descriptor, see the <olink
-          targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>.</para>
-            
-      <para>The CPE configured above runs a simple name and title annotator on the sample data
-        provided with the UIMA SDK and stores the results using the XMI Writer CAS Consumer. To
-        view the results, start the External CAS Annotation Viewer by running the
-        <literal>annotationViewer</literal> shell script
-        (<literal>annotationViewer.bat</literal> on Windows,
-        <literal>annotationViewer.sh</literal> on Unix), which is located in the
-        <literal>bin</literal> directory of the UIMA SDK installation. Executing this
-        script displays the window shown here:
-        
-        
-        <screenshot>
-    <mediaobject>
-      <imageobject>
-        <imagedata width="5.5in" format="JPG" fileref="&imgroot;image008.jpg"/>
-      </imageobject>
-      <textobject><phrase>Screenshot of Annotation Viewer results</phrase></textobject>
-    </mediaobject>
-  </screenshot>
-        </para>
-      
-      <para>Ensure that the Input Directory is the same as the Output Directory specified for
-        the XMI Writer CAS Consumer in the CPE configured above (e.g.,
-        <literal>%UIMA_HOME%\examples\data\processed</literal>) and that the TAE
-        Descriptor File is set to the Analysis Engine used in the CPE configured above (e.g.,
-        <literal>examples\descriptors\analysis_engine\NamesAndPersonTitles_TAE.xml</literal>).</para>
-      
-      <para>Click the View button to display the Analyzed Documents window:
-        
-        
-        <screenshot>
-    <mediaobject>
-      <imageobject>
-        <imagedata width="3.5in" format="JPG" fileref="&imgroot;image010.jpg"/>
-      </imageobject>
-      <textobject><phrase>Screenshot of CPE Configurator Analyzed Documents</phrase></textobject>
-    </mediaobject>
-  </screenshot>
-        </para>
-      
-      <para>Double-click any document in the list to view the analyzed document.
-        Double-clicking the first document, IBM_LifeSciences.txt, brings up the following
-        window:
-        
-        
-        <screenshot>
-    <mediaobject>
-      <imageobject>
-        <imagedata width="5.7in" format="JPG" fileref="&imgroot;image012.jpg"/>
-      </imageobject>
-      <textobject><phrase>Screenshot of Document and Annotation Viewer</phrase></textobject>
-    </mediaobject>
-  </screenshot>
-        </para>
-      
-      <para>This window shows the analysis results for the document. Clicking on any
-        highlighted annotation causes the details for that annotation to be displayed in the
-        right-hand pane. Here the annotation spanning <quote>John M. Thompson</quote> has
-        been clicked.</para>
-      
-      <para>Congratulations! You have successfully configured a CPE, saved its
-        descriptor, run the CPE, and viewed the analysis results.</para>
-    </section>
-    
-    <section id="ugr.tug.cpe.running_cpe_configurator_from_eclipse">
-      <title>Running the CPE Configurator from Eclipse</title>
-      
-      <para>If you have followed the instructions in <olink
-          targetdoc="&uima_docs_overview;"
-          targetptr="ugr.ovv.eclipse_setup"/> and imported the example Eclipse
-        project, then you should already have a Run configuration for the CPE Configurator
-        tool (called <literal>UIMA CPE GUI</literal>) configured to run in the example
-        project. Simply run that configuration to start the CPE Configurator.</para>
-      
-      <para>If you haven&apos;t followed the Eclipse setup instructions and wish to run the
-        CPE Configurator tool from Eclipse, you will need to do the following. As installed,
-        this Eclipse launch configuration is associated with the
-        <quote>uimaj-examples</quote> project. If you&apos;ve not already done so, you
-        may wish to import that project into your Eclipse workspace. It&apos;s located in
-        <literal>%UIMA_HOME%/docs/examples</literal>. Doing this supplies the Eclipse launcher
-        with all the class files it needs to run the CPE Configurator. If you don&apos;t do this,
-        manually add the UIMA JAR files to the launch configuration.</para>
-      <para>Also, you need to add any projects or JAR files for any UIMA components you will be
-        running to the launch class path.</para> <note><para>A simpler alternative may be
-      to change the CPE launch configuration to be based on your project. If you do that, it will
-      pick up all the files in your project&apos;s class path, which you should set up to
-      include all the UIMA framework files. An easy way to do this is to add the
-      uimaj-examples project to your project&apos;s build path (in the project properties),
-      because the uimaj-examples project is set up to include all the UIMA
-      framework classes in its classpath already. </para></note>
-      
-      <para>Next, in the Eclipse menu select <literal>Run &rarr;
-        Run...</literal>, which brings up the Run configuration screen.</para>
-      
-      <para>In the Main tab, set the main class to
-        <literal>org.apache.uima.tools.cpm.CpmFrame</literal>.</para>
-      
-      <para>In the Arguments tab, add the following to the VM arguments:
-        
-        
-        <programlisting>-Xms128M -Xmx256M 
--Duima.home="C:\Program Files\Apache\uima"</programlisting>
-        (or wherever you installed the UIMA SDK)</para>
-      
-      <para>Click the Run button to launch the CPE Configurator, and use it as previously
-        described in this section.</para>
-      
-    </section>
-  </section>
-  
-  <section id="ugr.tug.cpe.running_cpe_from_application">
-    <title>Running a CPE from Your Own Java Application</title>
-    
-    <para>The simplest way to run a CPE from a Java application is to first create a CPE
-      descriptor as described in the previous section. Then the CPE can be instantiated and
-      run using the following code:
-      
-      
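-      <programlisting>import org.apache.uima.UIMAFramework;
-import org.apache.uima.collection.CollectionProcessingEngine;
-import org.apache.uima.collection.metadata.CpeDescription;
-import org.apache.uima.util.XMLInputSource;
-
-// parse the CPE descriptor in the file specified on the command line
-CpeDescription cpeDesc = UIMAFramework.getXMLParser().
-        parseCpeDescription(new XMLInputSource(args[0]));
-
-// instantiate the CPE
-CollectionProcessingEngine mCPE =
-        UIMAFramework.produceCollectionProcessingEngine(cpeDesc);
-
-// create and register a status callback listener
-// (StatusCallbackListenerImpl is the listener class from the SimpleRunCPE example)
-mCPE.addStatusCallbackListener(new StatusCallbackListenerImpl());
-
-// start processing
-mCPE.process();</programlisting></para>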
-    
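-    <para>This starts the CPE running in a separate thread, so the call to
-      <literal>process()</literal> returns without waiting for the collection to be
-      processed.</para>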
-    
-    <note><para>The <literal>process()</literal> method for a CPE can only be called once.  If you 
-    need to call it again, you have to instantiate a new CPE, and call that new CPE's process
-    method.</para></note>
-    
-    <section id="ugr.tug.cpe.using_listeners">
-      <title>Using Listeners</title>
-      
-      <para>Updates of the CPM&apos;s progress, including any errors that occur, are sent to
-        the callback handler that is registered by the call to
-        <literal>addStatusCallbackListener</literal>, above. The callback handler is a
-        class that implements the CPM&apos;s
-        <literal>StatusCallbackListener</literal> interface. The handler in the example
-        responds to events by printing messages to the console. Its source code is fairly
-        straightforward and is not included in this chapter; see
-        <literal>org.apache.uima.examples.cpe.SimpleRunCPE.java</literal> in the
-        <literal>%UIMA_HOME%\examples\src</literal> directory for the complete
-        code.</para>
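-      
-      <para>Because the CPE keeps running on its own thread after
-        <literal>process()</literal> is called, an application that must wait for the whole
-        collection to finish can have its listener signal completion. The sketch below shows
-        that pattern with <literal>java.util.concurrent.CountDownLatch</literal>. The
-        <literal>Listener</literal> interface and the <literal>processAsync</literal> method
-        are hypothetical stand-ins for <literal>StatusCallbackListener</literal> and the CPE;
-        check the Javadocs for <literal>StatusCallbackListener</literal> for the callback
-        methods it actually defines.</para>
-      
-      <programlisting>import java.util.concurrent.CountDownLatch;
-import java.util.concurrent.TimeUnit;
-import java.util.concurrent.atomic.AtomicInteger;
-
-// Hypothetical stand-ins: Listener plays the role of a status callback
-// listener and processAsync() plays the role of a CPE's process() method.
-public class WaitForCompletionSketch {
-  interface Listener {
-    void entityProcessComplete(String docId);
-    void collectionProcessComplete();
-  }
-
-  // Counts finished documents and opens the latch when the collection is done.
-  static final class LatchListener implements Listener {
-    final AtomicInteger processed = new AtomicInteger();
-    final CountDownLatch done = new CountDownLatch(1);
-    public void entityProcessComplete(String docId) { processed.incrementAndGet(); }
-    public void collectionProcessComplete() { done.countDown(); }
-  }
-
-  // Returns immediately; the "processing" happens on a background thread.
-  static void processAsync(final String[] docs, final Listener listener) {
-    new Thread(new Runnable() {
-      public void run() {
-        for (String doc : docs) {
-          listener.entityProcessComplete(doc);
-        }
-        listener.collectionProcessComplete();
-      }
-    }).start();
-  }
-
-  // Starts processing, then blocks until the listener reports completion.
-  static int runAndWait(String[] docs) throws InterruptedException {
-    LatchListener listener = new LatchListener();
-    processAsync(docs, listener);
-    if (!listener.done.await(30, TimeUnit.SECONDS)) {
-      throw new IllegalStateException("Processing did not finish in time");
-    }
-    return listener.processed.get();
-  }
-
-  public static void main(String[] args) throws InterruptedException {
-    System.out.println(runAndWait(new String[] {"a.txt", "b.txt", "c.txt"}));  // 3
-  }
-}</programlisting>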
-      
-      <para>If you need more control over the information in the CPE descriptor, you can
-        manually configure it via its API. See the Javadocs for package
-        <literal>org.apache.uima.collection</literal> for more details.</para>
-      
-    </section>
-  </section>
-  
-  <section id="ugr.tug.cpe.developing_collection_processing_components">
-    <title>Developing Collection Processing Components</title>
-    
-    <para>This section introduces the process of developing Collection Readers,
-      CAS Initializers, and CAS Consumers. The code snippets refer to classes that can be
-      found in the <literal>%UIMA_HOME%\examples\src</literal> directory of the example project.</para>
-    
-    <para>In the following sections, classes you write to represent components need to be
-      public and have public, 0-argument constructors, so that they can be instantiated by
-      the framework. (Although Java classes in which you do not define any constructor will,
-      by default, have a 0-argument constructor that doesn&apos;t do anything, a class in
-      which you have defined at least one constructor does not get a default 0-argument
-      constructor.)</para>
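-    
-    <para>The following sketch shows why this matters when a class is created by
-      reflection, which is one way a framework can instantiate a class named in a
-      descriptor. It only illustrates standard Java behavior; it is not UIMA&apos;s actual
-      instantiation code, and the nested class names are hypothetical.</para>
-    
-    <programlisting>// Standard Java reflection only; the nested classes are hypothetical examples.
-public class NoArgConstructorDemo {
-  // Declares no constructor, so Java supplies a default no-argument one.
-  public static class NoConstructorDeclared { }
-
-  // Declares only a one-argument constructor, so no default is generated.
-  public static class OnlyArgConstructor {
-    public OnlyArgConstructor(String name) { }
-  }
-
-  // Declares both, so it can still be created with no arguments.
-  public static class ExplicitNoArg {
-    public ExplicitNoArg() { }
-    public ExplicitNoArg(String name) { }
-  }
-
-  // Looks up the public no-argument constructor and invokes it.
-  static boolean canInstantiate(Class&lt;?> cls) {
-    try {
-      cls.getConstructor().newInstance();
-      return true;
-    } catch (ReflectiveOperationException e) {
-      return false;  // e.g. NoSuchMethodException when there is no such constructor
-    }
-  }
-
-  public static void main(String[] args) {
-    System.out.println(canInstantiate(NoConstructorDeclared.class));  // true
-    System.out.println(canInstantiate(OnlyArgConstructor.class));     // false
-    System.out.println(canInstantiate(ExplicitNoArg.class));          // true
-  }
-}</programlisting>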
-    
-    <section id="ugr.tug.cpe.collection_reader.developing">
-      <title>Developing Collection Readers</title>
-      
-      <para>A Collection Reader is responsible for obtaining documents from the collection
-        and returning each document as a CAS. Like all UIMA components, a Collection Reader
-        consists of two parts: the code and an XML descriptor.</para>
-      
-      <para>A simple example of a Collection Reader is the <quote>File System Collection
-        Reader,</quote> which simply reads documents from files in a specified directory.
-        The Java code is in the class
-        <literal>org.apache.uima.examples.cpe.FileSystemCollectionReader</literal>
-        and the XML descriptor is
-        <literal>%UIMA_HOME%/examples/src/main/descriptors/collection_reader/FileSystemCollectionReader.xml</literal>.</para>
-      
-      <section id="ugr.tug.cpe.collection_reader.java_class">
-        <title>Java Class for the Collection Reader</title>
-        
-        <para>The Java class for a Collection Reader must implement the
-          <literal>org.apache.uima.collection.CollectionReader</literal>
-          interface. You may build your Collection Reader from scratch and implement this
-          interface, or you may extend the convenience base class
-          <literal>org.apache.uima.collection.CollectionReader_ImplBase</literal>.</para>
-        
-        <para>The convenience base class provides default implementations for many of the
-          methods defined in the <literal>CollectionReader</literal> interface, and
-          provides abstract definitions for those methods that you are required to
-          implement in your new Collection Reader. Note that if you extend this base class,
-          you do not need to declare that your new Collection Reader implements the
-          <literal>CollectionReader</literal> interface.</para> <tip><para>Eclipse
-        tip: you can quickly create the boilerplate code and
-        stubs for all of the required methods by clicking <literal>File</literal>
-        &rarr; <literal>New</literal> &rarr; <literal>Class</literal> to bring up the <quote>New Java Class</quote>
-        dialog, specifying
-        <literal>org.apache.uima.collection.CollectionReader_ImplBase</literal>
-        as the Superclass, and checking <quote>Inherited abstract methods</quote> in the
-        section <quote>Which method stubs would you like to create?</quote>, as in the 
-        screenshot below:</para></tip>     
-        
-        <screenshot>
-    <mediaobject>
-      <imageobject>
-        <imagedata width="4.4in" format="JPG" fileref="&imgroot;image014.jpg"/>
-      </imageobject>
-      <textobject><phrase>Screenshot showing Eclipse new class wizard</phrase></textobject>
-    </mediaobject>
-  </screenshot>
-        
-        <para>For the rest of this section we will assume that your new Collection Reader
-          extends the <literal>CollectionReader_ImplBase</literal> class, and we will
-          show examples from the
-          <literal>org.apache.uima.examples.cpe.FileSystemCollectionReader</literal>
-          class. If you must inherit from a different superclass, you must ensure that your
-          Collection Reader implements the <literal>CollectionReader</literal>
-          interface &ndash; see the Javadocs for <literal>CollectionReader</literal>
-          for more details.</para>
-      </section>
-      
-      <section id="ugr.tug.cpe.collection_reader.required_methods">
-        <title>Required Methods in the Collection Reader class</title>
-        
-        
-        <para>The following abstract methods must be implemented:</para>
-        
-        <section id="ugr.tug.cpe.collection_reader.required_methods.initialize">
-          <title>initialize()</title>
-          
-          <para>The <literal>initialize()</literal> method is called by the framework
-            when the Collection Reader is first created.
-            <literal>CollectionReader_ImplBase</literal> actually provides a default
-            implementation of this method (i.e., it is not abstract), so you are not strictly
-            required to implement this method. However, a typical Collection Reader will
-            implement this method to obtain parameter values and perform various
-            initialization steps.</para>
-          
-          <para>In this method, the Collection Reader class can access the values of its
-            configuration parameters and perform other initialization logic. The example
-            File System Collection Reader reads its configuration parameters and then
-            builds a list of files in the specified input directory, as follows:</para>
-          
-          
-          <programlisting>public void initialize() throws ResourceInitializationException {
-  File directory = new File(
-            (String)getConfigParameterValue(PARAM_INPUTDIR));
-  mEncoding = (String)getConfigParameterValue(PARAM_ENCODING);
-  mDocumentTextXmlTagName = (String)getConfigParameterValue(PARAM_XMLTAG);
-  mLanguage = (String)getConfigParameterValue(PARAM_LANGUAGE);
-  mCurrentIndex = 0; 
-  
-  //get list of files (not subdirectories) in the specified directory
-  mFiles = new ArrayList();
-  File[] files = directory.listFiles();
-  for (int i = 0; i &lt; files.length; i++) {
-    if (!files[i].isDirectory()) {
-      mFiles.add(files[i]);  
-    }
-  }
-}</programlisting>
-          <note><para>This is the zero-argument version of the initialize method. There is
-          also a method on the Collection Reader interface called
-          <literal>initialize(ResourceSpecifier, Map)</literal> but it is not
-          recommended that you override this method in your code. That method performs
-          internal initialization steps and then calls the zero-argument
-          <literal>initialize()</literal>. </para></note>
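-          
-          <para>The example above assumes the input directory exists.
-            <literal>File.listFiles()</literal> returns <literal>null</literal> when the path
-            does not denote a directory or an I/O error occurs, and it does not guarantee any
-            particular order. The sketch below is a more defensive version of the listing
-            step in plain Java, using a hypothetical <literal>listDocumentFiles</literal>
-            helper. Inside a real <literal>initialize()</literal> you would report the failure
-            through the exception type that method declares rather than
-            <literal>IOException</literal>.</para>
-          
-          <programlisting>import java.io.File;
-import java.io.IOException;
-import java.nio.file.Files;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.List;
-
-public class ListInputFiles {
-  // Hypothetical helper: returns the plain files (not subdirectories) in
-  // directory, sorted by path so the processing order is repeatable.
-  static List&lt;File> listDocumentFiles(File directory) throws IOException {
-    File[] entries = directory.listFiles();
-    if (entries == null) {             // not a directory, or it could not be read
-      throw new IOException("Cannot list input directory: " + directory);
-    }
-    Arrays.sort(entries);
-    List&lt;File> files = new ArrayList&lt;File>();
-    for (File f : entries) {
-      if (!f.isDirectory()) {
-        files.add(f);
-      }
-    }
-    return files;
-  }
-
-  public static void main(String[] args) throws IOException {
-    File dir = Files.createTempDirectory("docs").toFile();
-    new File(dir, "b.txt").createNewFile();
-    new File(dir, "a.txt").createNewFile();
-    new File(dir, "sub").mkdir();
-    for (File f : listDocumentFiles(dir)) {
-      System.out.println(f.getName());   // a.txt, then b.txt
-    }
-  }
-}</programlisting>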
-          
-        </section>
-        
-        <section id="ugr.tug.cpe.collection_reader.hasnext">
-          <title>hasNext()</title>
-          
-          <para>The <literal>hasNext()</literal> method returns whether or not there are
-            any documents remaining to be read from the collection. The File System
-            Collection Reader&apos;s <literal>hasNext()</literal> method is very
-            simple. It just checks if there are any more files left to be read:
-            
-            
-            <programlisting>public boolean hasNext() {
-  return mCurrentIndex &lt; mFiles.size();
-}</programlisting>
-            </para>
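-          
-          <para>The same index-based cursor drives both <literal>hasNext()</literal> and
-            <literal>getNext()</literal>. The sketch below isolates that pattern in plain Java;
-            the class is hypothetical and uses strings in place of files, but the field names
-            mirror the example. It also shows the test the example&apos;s
-            <literal>getNext()</literal> uses for its <literal>lastSegment</literal> flag: once
-            the index has been advanced, the current item is the last one exactly when the
-            index equals the list size.</para>
-          
-          <programlisting>import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.List;
-
-// Hypothetical cursor over a pre-built list, mirroring the example's
-// mFiles and mCurrentIndex fields.
-public class CursorSketch {
-  private final List&lt;String> mFiles;
-  private int mCurrentIndex = 0;
-
-  CursorSketch(List&lt;String> files) {
-    mFiles = files;
-  }
-
-  boolean hasNext() {
-    return mCurrentIndex &lt; mFiles.size();
-  }
-
-  // Returns the next item, marked "(last)" when it is the final one.
-  String next() {
-    String file = mFiles.get(mCurrentIndex++);
-    boolean last = mCurrentIndex == mFiles.size();   // checked after advancing
-    return last ? file + " (last)" : file;
-  }
-
-  public static void main(String[] args) {
-    CursorSketch reader = new CursorSketch(Arrays.asList("a.txt", "b.txt", "c.txt"));
-    List&lt;String> seen = new ArrayList&lt;String>();
-    while (reader.hasNext()) {
-      seen.add(reader.next());
-    }
-    System.out.println(seen);   // [a.txt, b.txt, c.txt (last)]
-  }
-}</programlisting>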
-          
-        </section>
-        
-        <section id="ugr.tug.cpe.collection_reader.required_methods.getnext">
-          <title>getNext(CAS)</title>
-          
-          <para>The <literal>getNext()</literal> method reads the next document from the
-            collection and populates a CAS. In the simple case, this amounts to reading the
-            file and calling the CAS&apos;s <literal>setDocumentText</literal> method.
-            The version of the example File System Collection Reader shown below takes this
-            simple approach: it reads the file itself and sets the document text in the CAS.
-            (A Collection Reader could instead use a CAS Initializer to read the document and
-            initialize the CAS, but CAS Initializers are deprecated as of UIMA version 2.)</para>
-          
-          <para>The File System Collection Reader also stores additional metadata about
-            the document in the CAS. In particular, it sets the document&apos;s language in
-            the special built-in feature structure
-            <literal>uima.tcas.DocumentAnnotation</literal> (see <olink
-              targetdoc="&uima_docs_ref;"
-              targetptr="ugr.ref.cas.document_annotation"/> for details about this
-            built-in type) and creates an instance of
-            <literal>org.apache.uima.examples.SourceDocumentInformation</literal>,
-            which stores information about the document&apos;s source location. This
-            information may be useful to downstream components such as CAS Consumers. Note
-            that the type system descriptor for this type can be found in
-            <literal>org.apache.uima.examples.SourceDocumentInformation.xml</literal>,
-            which is located in the <literal>examples/src</literal> directory.</para>
-          
-          <para>The getNext() method for the File System Collection Reader looks like
-            this:</para>
-          
-          
-          <programlisting>  public void getNext(CAS aCAS) throws IOException, CollectionException {
-    JCas jcas;
-    try {
-      jcas = aCAS.getJCas();
-    } catch (CASException e) {
-      throw new CollectionException(e);
-    }
-
-    // open input stream to file
-    File file = (File) mFiles.get(mCurrentIndex++);
-    BufferedInputStream fis = 
-            new BufferedInputStream(new FileInputStream(file));
-    try {
-      byte[] contents = new byte[(int) file.length()];
-      // a single read() may return fewer bytes than requested,
-      // so keep reading until the buffer is full or EOF is reached
-      int offset = 0;
-      while (offset < contents.length) {
-        int n = fis.read(contents, offset, contents.length - offset);
-        if (n < 0) {
-          break;
-        }
-        offset += n;
-      }
-      String text;
-      if (mEncoding != null) {
-        text = new String(contents, 0, offset, mEncoding);
-      } else {
-        text = new String(contents, 0, offset);
-      }
-      // put document in CAS
-      jcas.setDocumentText(text);
-    } finally {
-      fis.close();
-    }
-
-    // set language if it was explicitly specified
-    // as a configuration parameter
-    if (mLanguage != null) {
-      ((DocumentAnnotation) jcas.getDocumentAnnotationFs()).
-            setLanguage(mLanguage);
-    }
-
-    // Also store location of source document in CAS.
-    // This information is critical if CAS Consumers will
-    // need to know where the original document contents
-    // are located.
-    // For example, the Semantic Search CAS Indexer
-    // writes this information into the search index that
-    // it creates, which allows applications that use the
-    // search index to locate the documents that satisfy
-    // their semantic queries.
-    SourceDocumentInformation srcDocInfo = 
-            new SourceDocumentInformation(jcas);
-    srcDocInfo.setUri(
-            file.getAbsoluteFile().toURL().toString());
-    srcDocInfo.setOffsetInSource(0);
-    srcDocInfo.setDocumentSize((int) file.length());
-    srcDocInfo.setLastSegment(
-            mCurrentIndex == mFiles.size());
-    srcDocInfo.addToIndexes();
-  }</programlisting>
-          
-          <para>The Collection Reader can create additional annotations in the CAS at this
-            point, in the same way that annotators create annotations.</para>
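-          <para>The read step in the listing can also be written with
-            <literal>java.nio.file.Files.readAllBytes</literal> (Java 7 or later), which
-            reads the whole file in one call and so avoids short reads. The helper below is a
-            hypothetical sketch, not part of the example reader: it only returns the decoded
-            text, which <literal>getNext()</literal> would then pass to
-            <literal>setDocumentText</literal>.</para>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: the "read file and decode" step of getNext(). */
final class DocumentText {
  /**
   * Reads the whole file and decodes it. A null encoding falls back to the
   * platform default charset, as new String(byte[]) does in the listing.
   */
  static String read(Path file, String encoding) {
    try {
      byte[] contents = Files.readAllBytes(file); // reads every byte of the file
      Charset cs = (encoding != null) ? Charset.forName(encoding) : Charset.defaultCharset();
      return new String(contents, cs);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```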
-        </section>
-        
-        <section id="ugr.tug.cpe.collection_reader.required_methods.getprogress">
-          <title>getProgress()</title>
-          <para>The Collection Reader is responsible for returning progress information,
-            that is, how much of the collection has been read so far and how much remains to be
-            read. The framework defines progress very generally; the Collection Reader
-            simply returns an array of <literal>Progress</literal> objects, where each
-            object contains three fields: the amount already completed, the total
-            amount (if known), and a unit (e.g. entities (documents), bytes, or files). The
-            method returns an array so that the Collection Reader can report progress in
-            several units, if that information is available. The File System
-            Collection Reader&apos;s <literal>getProgress()</literal> method looks
-            like this:
-            
-            
-            <programlisting>public Progress[] getProgress() {
-  return new Progress[]{
-     new ProgressImpl(mCurrentIndex,mFiles.size(),Progress.ENTITIES)};
-}</programlisting></para>
-          
-          <para>In this particular example, the total number of files in the collection is
-            known, but the total size of the collection is not known. As such, a
-            <literal>ProgressImpl</literal> object for
-            <literal>Progress.ENTITIES</literal> is returned, but a
-            <literal>ProgressImpl</literal> object for
-            <literal>Progress.BYTES</literal> is not.</para>
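-          <para>The sketch below models the same idea in plain Java: it always reports the
-            number of files read and adds a byte count only when the total size of the
-            collection is known. <literal>ProgressEntry</literal> and
-            <literal>ProgressReport</literal> are hypothetical classes and the unit strings are
-            placeholders; a real Collection Reader returns <literal>ProgressImpl</literal>
-            objects with the <literal>Progress</literal> unit constants, as in the listing
-            above.</para>

```java
/** Hypothetical stand-in for one Progress entry: completed, total, unit. */
final class ProgressEntry {
  final long completed;
  final long total; // -1 when the total is unknown
  final String unit;

  ProgressEntry(long completed, long total, String unit) {
    this.completed = completed;
    this.total = total;
    this.unit = unit;
  }
}

final class ProgressReport {
  /**
   * Reports files read so far; adds a byte count only when the total
   * collection size is known (totalBytes >= 0).
   */
  static ProgressEntry[] report(int filesRead, int fileCount,
                                long bytesRead, long totalBytes) {
    ProgressEntry files = new ProgressEntry(filesRead, fileCount, "entities");
    if (totalBytes < 0) {
      return new ProgressEntry[] { files };
    }
    return new ProgressEntry[] {
        files, new ProgressEntry(bytesRead, totalBytes, "bytes") };
  }
}
```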
-          
-        </section>
-        
-        <section id="ugr.tug.cpe.collection_reader.required_methods.close">
-          <title>close()</title>
-          
-          <para>The <literal>close()</literal> method is called when the Collection Reader
-            is no longer needed. The Collection Reader should then release any resources it
-            may be holding. The FileSystemCollectionReader holds no resources and so has an
-            empty implementation of this method:</para>
-          
-          
-          <programlisting>public void close() throws IOException { }</programlisting>
-          
-        </section>
-        
-        <section id="ugr.tug.cpe.collection_reader.optional_methods">
-          <title>Optional Methods</title>
-          
-          <para>The following methods may be implemented:</para>
-          
-          <section id="ugr.tug.cpe.collection_reader.optional_methods.reconfigure">
-            <title>reconfigure()</title>
-            <para>This method is called if the Collection Reader&apos;s configuration
-              parameters change.</para>
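-            <para>For a reader like the File System Collection Reader, reconfiguring usually
-              means re-reading the changed parameter and rebuilding whatever was derived from
-              it. The following hypothetical sketch assumes the parameters arrive as a
-              <literal>Map</literal> and that the reader starts over from the first file after a
-              change; it does not use the UIMA configuration API.</para>

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical reader state showing what a reconfigure() override might do:
 * re-read a changed parameter and rebuild anything derived from it.
 */
final class ReconfigurableReaderState {
  private List<File> files = new ArrayList<>();
  private int currentIndex;

  /** Called at startup and again whenever parameters change. */
  void configure(Map<String, String> params) {
    File dir = new File(params.get("InputDirectory"));
    File[] listed = dir.listFiles(File::isFile); // null if dir is not a directory
    if (listed == null) {
      files = new ArrayList<>();
    } else {
      files = new ArrayList<>(Arrays.asList(listed));
    }
    currentIndex = 0; // assumption: start over with the new collection
  }

  int remaining() {
    return files.size() - currentIndex;
  }
}
```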
-          </section>
-          
-          <section id="ugr.tug.cpe.collection_reader.optional_methods.typesysteminit">
-            <title>typeSystemInit()</title>
-            
-            <para>If you are only setting the document text in the CAS, or if you are using the
-              JCas (recommended, as in the current example), you do not have to implement this
-              method. If you are directly using the CAS API, this method is used in the same way
-              as it is used for an annotator &ndash; see <olink
-                targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aae.contract_for_annotator_methods"/>
-              for more information.</para>
-          </section>
-        </section>
-        
-        <section id="ugr.tug.cpe.collection_reader.threading">
-          <title>Threading considerations</title>
-          
-          <para>Collection readers do not have to be thread safe: each instance of the
-            Collection Processing Manager (CPM) creates only one Collection Reader instance,
-            and that instance is run on a single thread.</para>
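-          <para>The following hypothetical sketch, built only on
-            <literal>java.util.concurrent</literal>, models this arrangement: one thread owns
-            the reader and feeds a queue that several processing threads drain. It is not the
-            CPM implementation; it only shows why the reader itself needs no
-            synchronization.</para>

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical model: one thread owns the (non-thread-safe) reader and hands
 * documents to several processing threads through a bounded queue.
 */
final class SingleReaderPipeline {
  // Unique sentinel object; compared by identity so a document "END" is not confused with it.
  private static final String END = new String("END");

  static int run(Iterator<String> reader, int workers) throws InterruptedException {
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
    AtomicInteger processed = new AtomicInteger();
    List<Thread> pool = new ArrayList<>();
    for (int i = 0; i < workers; i++) {
      Thread t = new Thread(() -> {
        try {
          while (true) {
            String doc = queue.take();
            if (doc == END) {
              return; // sentinel received: this worker is done
            }
            processed.incrementAndGet(); // stand-in for analysis work
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      pool.add(t);
      t.start();
    }
    // Only this thread ever touches the reader, so it needs no locking.
    while (reader.hasNext()) {
      queue.put(reader.next());
    }
    for (int i = 0; i < workers; i++) {
      queue.put(END); // one sentinel per worker
    }
    for (Thread t : pool) {
      t.join();
    }
    return processed.get();
  }
}
```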
-          
-        </section>
-        
-        <section id="ugr.tug.cpe.collection_reader.descriptor">
-          <title>XML Descriptor for a Collection Reader</title>
-          
-          <para>You can use the Component Description Editor to create or edit the File
-            System Collection Reader&apos;s descriptor. Here is its descriptor
-            (abbreviated somewhat), which is very similar to an Analysis
-            Engine descriptor:</para>
-          
-          
-          <programlisting><?db-font-size 80% ?><![CDATA[<collectionReaderDescription 
-          xmlns="http://uima.apache.org/resourceSpecifier">
-  <frameworkImplementation>org.apache.uima.java</frameworkImplementation>
-  <implementationName>
-    org.apache.uima.examples.cpe.FileSystemCollectionReader
-  </implementationName>
-  <processingResourceMetaData>
-    <name>File System Collection Reader</name>
-    <description>Reads files from the filesystem.</description>
-    <version>1.0</version>
-    <vendor>The Apache Software Foundation</vendor>
-    <configurationParameters>
-      <configurationParameter>
-        <name>InputDirectory</name>
-        <description>Directory containing input files</description>
-        <type>String</type>
-        <multiValued>false</multiValued>
-        <mandatory>true</mandatory>
-      </configurationParameter>
-      <configurationParameter>
-        <name>Encoding</name>
-        <description>Character encoding for the documents.</description>
-        <type>String</type>
-        <multiValued>false</multiValued>
-        <mandatory>false</mandatory>
-      </configurationParameter>
-      <configurationParameter>
-        <name>Language</name>
-        <description>ISO language code for the documents</description>
-        <type>String</type>
-        <multiValued>false</multiValued>
-        <mandatory>false</mandatory>
-      </configurationParameter>
-    </configurationParameters>
-    <configurationParameterSettings>
-      <nameValuePair>
-        <name>InputDirectory</name>
-        <value>
-          <string>C:/Program Files/apache/uima/examples/data</string>
-        </value>
-      </nameValuePair>
-    </configurationParameterSettings>
-    
-    <!-- Type System of CASes returned by this Collection Reader -->
-    
-    <typeSystemDescription>
-      <imports>
-        <import name="org.apache.uima.examples.SourceDocumentInformation"/>
-      </imports>
-    </typeSystemDescription>
-    
-    <capabilities>
-      <capability>
-        <inputs/>
-        <outputs>
-          <type allAnnotatorFeatures="true">
-            org.apache.uima.examples.SourceDocumentInformation
-          </type>
-        </outputs>
-      </capability>
-    </capabilities>
-    <operationalProperties>
-      <modifiesCas>true</modifiesCas>
-      <multipleDeploymentAllowed>false</multipleDeploymentAllowed>
-      <outputsNewCASes>true</outputsNewCASes>
-    </operationalProperties>
-  </processingResourceMetaData>
-</collectionReaderDescription>]]></programlisting>
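-          <para>The framework parses descriptors itself, but if you want to inspect one
-            outside UIMA, for example to list which configuration parameters are mandatory,
-            the standard DOM API is enough. The helper below is a hypothetical sketch; it
-            deliberately uses a parser that is not namespace-aware, so plain element names
-            match even though the descriptor declares a default namespace.</para>

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: maps each declared parameter name to its mandatory flag. */
final class DescriptorParams {
  static Map<String, Boolean> mandatoryFlags(String descriptorXml) {
    try {
      // The default factory is not namespace-aware, so tag names have no prefix.
      DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
      Document doc = f.newDocumentBuilder().parse(
          new ByteArrayInputStream(descriptorXml.getBytes(StandardCharsets.UTF_8)));
      Map<String, Boolean> result = new LinkedHashMap<>();
      NodeList params = doc.getElementsByTagName("configurationParameter");
      for (int i = 0; i < params.getLength(); i++) {
        Element p = (Element) params.item(i);
        result.put(text(p, "name"), Boolean.parseBoolean(text(p, "mandatory")));
      }
      return result;
    } catch (Exception e) {
      throw new IllegalArgumentException("cannot parse descriptor", e);
    }
  }

  /** Text of the first descendant element with the given tag, or "". */
  private static String text(Element parent, String tag) {
    NodeList nodes = parent.getElementsByTagName(tag);
    return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
  }
}
```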
-          
-        </section>
-      </section>
-    </section>
-    
-    <section id="ugr.tug.cpe.cas_initializer.developing"><title>Developing CAS
-      Initializers</title> <note><para>CAS Initializers are deprecated as of
-      version 2.1. For complex initialization, use the capability of
-      creating additional Subjects of Analysis instead (see <olink
-        targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.mvs"/>).</para></note>
-      
-      <para>In UIMA 1.x, the CAS Initializer component was intended to be used as a plug-in
-        to the Collection Reader for when the task of populating the CAS from a raw document is
-        complex and might be reusable with other data collections.</para>
-          
-      <para>A CAS Initializer Java class must implement the interface
-        <literal>org.apache.uima.collection.CasInitializer</literal>, and will also
-        generally extend from the convenience base class
-        <literal>org.apache.uima.collection.CasInitializer_ImplBase</literal>. A
-        CAS Initializer also must have an XML descriptor, which has the exact same form as a
-        Collection Reader Descriptor except that the outer tag is
-        <literal>&lt;casInitializerDescription&gt;</literal>.</para>
-      
-      <para>CAS Initializers have optional <literal>initialize()</literal>,
-        <literal>reconfigure()</literal>, and <literal>typeSystemInit()</literal>
-        methods, which perform the same functions as they do for Collection Readers. The only
-        required method for a CAS Initializer is <literal>initializeCas(Object,
-        CAS)</literal>. This method takes the raw document (for example, an
-        <literal>InputStream</literal> object from which the document can be read) and a
-        CAS, and populates the CAS from the document.</para>      
-    </section>
-    
-    <section id="ugr.tug.cpe.cas_consumer.developing"><title>Developing CAS
-      Consumers</title> 
-      
-      <note><para>In version 2, there is no difference in capability
-      between CAS Consumers and ordinary Analysis Engines, except for the default setting of
-      the XML parameters for <literal>multipleDeploymentAllowed</literal> and
-      <literal>modifiesCas</literal>. We recommend for future work that users implement
-      and use Analysis Engine components instead of CAS Consumers.</para>
-      <para>The rest of this section is written using the version 1 style of CAS Consumer;
-      the methods described are also available for Analysis Engines.  Note that the 
-      CAS Consumer <literal>processCAS</literal> method is equivalent to the Analysis Engine
-      <literal>process</literal> method.</para></note>
-      
-      <para>A CAS Consumer receives each CAS after it has been analyzed by the Analysis
-        Engine. CAS Consumers usually do not update the CAS; instead, they extract data
-        from the CAS and persist selected information to aggregate data structures such as
-        search engine indexes or databases.</para>
-      
-      <para>A CAS Consumer Java class must implement the interface
-        <literal>org.apache.uima.collection.CasConsumer</literal>, and will also
-        generally extend from the convenience base class
-        <literal>org.apache.uima.collection.CasConsumer_ImplBase</literal>. A CAS
-        Consumer also must have an XML descriptor, which has the exact same form as a
-        Collection Reader Descriptor except that the outer tag is
-        <literal>&lt;casConsumerDescription&gt;</literal>.</para>
-      
-      <para>CAS Consumers have optional <literal>initialize()</literal>,
-        <literal>reconfigure()</literal>, and <literal>typeSystemInit()</literal>
-        methods, which perform the same functions as they do for Collection Readers and CAS
-        Initializers. The only required method for a CAS Consumer is
-        <literal>processCas(CAS)</literal>, which is where the CAS Consumer does the bulk
-        of its work (that is, consuming the CAS).</para>
-      
-      <para>The <literal>CasConsumer</literal> interface (as well as the version 2
-        Analysis Engine interface) additionally defines batch-level
-        and collection-level processing methods. The CAS Consumer or Analysis Engine
-        can implement the
-        <literal>batchProcessComplete()</literal> method to perform processing that
-        should occur at the end of each batch of CASes. Similarly, the CAS Consumer
-        or Analysis Engine can
-        implement the <literal>collectionProcessComplete()</literal> method to
-        perform any collection-level processing at the end of the collection.</para>
-      
-      <para>A very simple example of a CAS Consumer, which writes an XML representation of the
-        CAS to a file, is the XMI Writer CAS Consumer. The Java code is in the class
-        <literal>org.apache.uima.examples.cpe.XmiWriterCasConsumer</literal> and
-        the descriptor is in
-        <literal>%UIMA_HOME%/examples/descriptors/cas_consumer/XmiWriterCasConsumer.xml</literal>.</para>
-      
-      <section id="ugr.tug.cpe.cas_consumer.required_methods">
-        <title>Required Methods for a CAS Consumer</title>
-        
-        <para>When extending the convenience class
-          <literal>org.apache.uima.collection.CasConsumer_ImplBase</literal>, you
-          typically implement the following methods:</para>
-        
-        <section id="ugr.tug.cpe.cas_consumer.required_methods.initialize">
-          <title>initialize()</title>
-          <para>The <literal>initialize()</literal> method is called by the framework
-            when the CAS Consumer is first created.
-            <literal>CasConsumer_ImplBase</literal> actually provides a default
-            implementation of this method (i.e., it is not abstract), so you are not strictly
-            required to implement this method. However, a typical CAS Consumer will
-            implement this method to obtain parameter values and perform various
-            initialization steps.</para>
-          
-          <para>In this method, the CAS Consumer can access the values of its configuration
-            parameters and perform other initialization logic. The example XMI Writer CAS
-            Consumer reads its configuration parameters and sets up the output directory:
-            
-            
-            <programlisting><?db-font-size 80% ?>public void initialize() throws ResourceInitializationException {
-  mDocNum = 0;
-  mOutputDir = new File((String) getConfigParameterValue(PARAM_OUTPUTDIR));
-  if (!mOutputDir.exists()) {
-    mOutputDir.mkdirs();
-  }
-}</programlisting></para>
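-          <para>The listing ignores the return value of <literal>mkdirs()</literal>, so a
-            failure to create the directory only surfaces later, when the first file is
-            written. A hypothetical variant using
-            <literal>java.nio.file.Files.createDirectories</literal> fails immediately instead.
-            In a real CAS Consumer you would rethrow the error as the
-            <literal>ResourceInitializationException</literal> that
-            <literal>initialize()</literal> declares.</para>

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical variant of the output-directory setup in initialize(). */
final class OutputDirSetup {
  static Path prepare(String outputDir) {
    Path dir = Paths.get(outputDir);
    try {
      // Creates missing parents, succeeds if the directory already exists,
      // and throws if the path exists but is not a directory.
      return Files.createDirectories(dir);
    } catch (IOException e) {
      throw new UncheckedIOException("cannot create output directory " + dir, e);
    }
  }
}
```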
-        </section>
-        
-        <section id="ugr.tug.cpe.cas_consumer.required_methods.processcas">
-          <title>processCas()</title>
-          
-          <para>The <literal>processCas()</literal> method is where the CAS Consumer
-            does most of its work. In our example, the XMI Writer CAS Consumer obtains an
-            iterator over the document metadata in the CAS (in the
-            SourceDocumentInformation feature structure, which is created by the File
-            System Collection Reader) and extracts the URI for the current document. From
-            this the output filename is constructed in the output directory and a subroutine
-            (<literal>writeXmi</literal>) is called to generate the output file. The
-            <literal>writeXmi</literal> subroutine uses the
-            <literal>XmiCasSerializer</literal> class provided with the UIMA SDK to
-            serialize the CAS to the output file (see the example source code for
-            details).</para>
-          
-          
-          <programlisting>public void processCas(CAS aCAS) throws ResourceProcessException {
-  String modelFileName = null;
-
-  JCas jcas;
-  try {
-    jcas = aCAS.getJCas();
-  } catch (CASException e) {
-    throw new ResourceProcessException(e);
-  }
- 
-  // retrieve the filename of the input file from the CAS
-  FSIterator it = jcas
-            .getAnnotationIndex(SourceDocumentInformation.type)
-                  .iterator();
-  File outFile = null;
-  if (it.hasNext()) {
-    SourceDocumentInformation fileLoc = 
-            (SourceDocumentInformation) it.next();
-    File inFile;
-    try {
-      inFile = new File(new URL(fileLoc.getUri()).getPath());
-      String outFileName = inFile.getName();
-      if (fileLoc.getOffsetInSource() > 0) {
-        outFileName += ("_" + fileLoc.getOffsetInSource());
-      }
-      outFileName += ".xmi";
-      outFile = new File(mOutputDir, outFileName);
-      modelFileName = mOutputDir.getAbsolutePath() + 
-            "/" + inFile.getName() + ".ecore";
-    } catch (MalformedURLException e1) {
-      // invalid URL, use default processing below
-    }
-  }
-  if (outFile == null) {
-    outFile = new File(mOutputDir, "doc" + mDocNum++);
-  }
-  // serialize the CAS as XMI and write it to the output file
-  try {
-    writeXmi(jcas.getCas(), outFile, modelFileName);
-  } catch (IOException e) {
-    throw new ResourceProcessException(e);
-  } catch (SAXException e) {
-    throw new ResourceProcessException(e);
-  }
-}</programlisting>
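-          <para>The naming rule used above can be isolated into a small helper for testing.
-            This is a hypothetical extraction, not a class from the example: it takes the URI
-            and offset that the File System Collection Reader stores and returns the output
-            file name, using the same fallback as the listing when the URI is not a valid URL.
-            The case where the CAS has no <literal>SourceDocumentInformation</literal> at all is
-            left to the caller.</para>

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;

/** Hypothetical extraction of the output-naming rule used in processCas(). */
final class XmiOutputName {
  /**
   * Returns the input file name, plus "_offset" when the document starts
   * past offset 0, plus ".xmi". Falls back to "doc" + docNum when the URI
   * is not a valid URL.
   */
  static String forUri(String uri, int offsetInSource, int docNum) {
    try {
      String name = new File(new URL(uri).getPath()).getName();
      if (offsetInSource > 0) {
        name += "_" + offsetInSource;
      }
      return name + ".xmi";
    } catch (MalformedURLException e) {
      return "doc" + docNum; // same fallback as the listing (no extension)
    }
  }
}
```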
-          
-        </section>
-        
-        <section id="ugr.tug.cpe.cas_consumer.optional_methods">
-          <title>Optional Methods</title>
-          <para>The following methods are optional in a CAS Consumer, though they are often
-            used.</para>
-          <section id="ugr.tug.cpe.cas_consumer.optional_methods.batchprocesscomplete">
-            <title>batchProcessComplete()</title>
-            
-            <para>The framework calls the batchProcessComplete() method at the end of each
-              batch of CASes. This gives the CAS Consumer or Analysis Engine 
-              an opportunity to perform any batch
-              level processing. Our simple XMI Writer CAS Consumer does not perform any
-              batch level processing, so this method is empty. Batch size is set in the
-              Collection Processing Engine descriptor.</para>
-          </section>
-          
-          <section id="ugr.tug.cpe.cas_consumer.optional_methods.collectionprocesscomplete">
-            <title>collectionProcessComplete()</title>
-            
-            <para>The framework calls the collectionProcessComplete() method at the end
-              of the collection (i.e., when all objects in the collection have been
-              processed). At this point in time, no CAS is passed in as a parameter. This gives
-              the CAS Consumer or Analysis Engine an opportunity to perform collection processing over the
-              entire set of objects in the collection. Our simple XMI Writer CAS Consumer
-              does not perform any collection level processing, so this method is
-              empty.</para>
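-            <para>The calling order of these hooks can be sketched with a hypothetical driver.
-              It assumes that <literal>batchProcessComplete()</literal> is called only after each
-              full batch and that <literal>collectionProcessComplete()</literal> is called exactly
-              once at the end; how the CPM treats a final partial batch is not covered
-              here.</para>

```java
import java.util.List;

/** Hypothetical callback interface mirroring the three consumer hooks. */
interface ConsumerHooks {
  void processCas(String doc);
  void batchProcessComplete();
  void collectionProcessComplete();
}

/** Hypothetical driver; the real CPM's scheduling is more involved. */
final class BatchDriver {
  static void run(List<String> docs, int batchSize, ConsumerHooks consumer) {
    int inBatch = 0;
    for (String doc : docs) {
      consumer.processCas(doc);
      if (++inBatch == batchSize) {
        consumer.batchProcessComplete(); // end of a full batch
        inBatch = 0;
      }
    }
    consumer.collectionProcessComplete(); // once, after the last document; no CAS
  }
}
```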
-          </section>
-          
-        </section>
-        
-      </section>
-    </section>
-  </section>
-  
-  <section id="ugr.tug.cpe.deploying_a_cpe">
-    <title>Deploying a CPE</title>
-    
-    <para>The CPM provides a number of service and deployment options that cover
-      instantiation and execution of CPEs, error recovery, and local and distributed
-      deployment of the CPE components. The behavior of the CPM (and correspondingly, the
-      CPE) is controlled by various options and parameters set in the CPE descriptor. The
-      current version of the CPE Configurator tool, however, supports only default error
-      handling and deployment options. To change these options, you must manually edit the
-      CPE descriptor.</para>
-    
-    <para>Eventually the CPE Configurator tool will support configuring these options and a
-      detailed tutorial for these settings will be provided. In the meantime, we provide only
-      a high-level, conceptual overview of these advanced features in the rest of this
-      chapter, and refer the advanced user to <olink targetdoc="&uima_docs_ref;"
-        targetptr="ugr.ref.xml.cpe_descriptor"/> for details on setting these options in the CPE
-      Descriptor.</para>
-    
-    <para> <xref linkend="ugr.tug.cpe.fig.cpe_instantiation"/> shows a logical view of
-      how an application uses the UIMA framework to instantiate a CPE from a CPE descriptor.
-      The CPE descriptor identifies the CPE components (referencing their corresponding
-      descriptors) and specifies the various options for configuring the CPM and deploying
-      the CPE components.</para>
-    
-    <figure id="ugr.tug.cpe.fig.cpe_instantiation">
-      <title>CPE Instantiation</title>
-      <mediaobject>
-        <imageobject>
-          <imagedata width="5.7in" format="PNG"
-            fileref="&imgroot;image018.png"/>
-        </imageobject>
-        <textobject><phrase>Picture of deployment of a CPE</phrase></textobject>
-      </mediaobject>
-    </figure>
-    
-    <para id="ugr.tug.cpe.deployment_alternatives">There are three deployment modes
-      for CAS Processors (Analysis Engines and CAS Consumers) in a CPE:</para>
-    
-    <orderedlist><listitem><para><emphasis role="bold">Integrated</emphasis> (runs
-      in the same Java instance as the CPM)</para></listitem>
-      
-      <listitem><para><emphasis role="bold">Managed</emphasis> (runs in a separate
-        process on the same machine)</para></listitem>
-      
-      <listitem><para><emphasis role="bold">Non-managed</emphasis> (runs in a
-        separate process, perhaps on a different machine)</para></listitem>
-    </orderedlist>
-    
-    <para>An integrated CAS Processor runs in the same JVM as the CPE. A managed CAS Processor
-      runs in a separate process from the CPE, but still on the same computer. The CPE controls
-      startup, shutdown, and recovery of a managed CAS Processor. A non-managed CAS
-      Processor runs as a service and may be on the same computer as the CPE or on a remote
-      computer. A non-managed CAS Processor <emphasis role="bold-italic">
-      service</emphasis> is started and managed independently from the CPE.</para>
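-    <para>The descriptor excerpts later in this chapter select the mode with the
-      <literal>deployment</literal> attribute of <literal>casProcessor</literal>:
-      <literal>integrated</literal>, <literal>local</literal> (managed), or
-      <literal>remote</literal> (non-managed). The hypothetical enum below only records
-      that mapping together with the two properties described above; it is not a UIMA
-      class.</para>

```java
/**
 * Hypothetical mapping of casProcessor "deployment" attribute values to the
 * three deployment modes and their main properties.
 */
enum DeploymentMode {
  INTEGRATED("integrated", false, true),
  MANAGED("local", true, true),
  NON_MANAGED("remote", true, false);

  final String attributeValue;
  final boolean separateProcess; // CAS crosses a process boundary (via Vinci)
  final boolean startedByCpe;    // CPE creates or launches the processor

  DeploymentMode(String attributeValue, boolean separateProcess, boolean startedByCpe) {
    this.attributeValue = attributeValue;
    this.separateProcess = separateProcess;
    this.startedByCpe = startedByCpe;
  }

  static DeploymentMode fromAttribute(String value) {
    for (DeploymentMode m : values()) {
      if (m.attributeValue.equals(value)) {
        return m;
      }
    }
    throw new IllegalArgumentException("unknown deployment: " + value);
  }
}
```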
-    
-    <para>For both managed and non-managed CAS Processors, the CAS must be transmitted
-      between separate processes and possibly between separate computers. This is
-      accomplished using <emphasis>Vinci</emphasis>, a communication protocol that the
-      CPM uses and that is provided as part of Apache UIMA. Vinci handles service naming,
-      service location, and data transport (see <olink targetdoc="&uima_docs_tutorial_guides;"
-        targetptr="ugr.tug.application.how_to_deploy_a_vinci_service"/> for more
-      information). Service naming and location are provided by a <emphasis>Vinci Naming
-      Service</emphasis>, or <emphasis>VNS</emphasis>. For managed CAS Processors, the
-      CPE uses its own internal VNS. For non-managed CAS Processors, a separate VNS must be
-      running.</para> <note><para>The UIMA SDK also supports using non-managed remote
-    services via the web-standard SOAP communications protocol (see <olink
-      targetdoc="&uima_docs_tutorial_guides;"
-      targetptr="ugr.tug.application.how_to_deploy_as_soap"/>). This approach is
-    based on a proxy implementation, where the proxy essentially runs in integrated
-    mode. To use this approach with the CPM, use integrated mode, with the component being
-    an Aggregate that, in turn, connects to a remote service.</para></note>
-    
-    <para>The CPE Configurator tool currently only supports constructing CPEs that deploy
-      CAS Processors in integrated mode. To deploy CAS Processors in any other mode, the CPE
-      descriptor must be edited by hand (better tooling may be provided later). Details on the
-      CPE descriptor and the required settings for various CAS Processor deployment modes
-      can be found in <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>.
-      In the following sections we merely summarize the various CAS Processor deployment
-      options.</para>
-    
-    <section id="ugr.tug.cpe.managed_deployment">
-      <title>Deploying Managed CAS Processors</title>
-      
-      <para>Managed CAS Processor deployment is shown in <xref
-          linkend="ugr.tug.cpe.fig.managed_deployment"/>. A managed CAS Processor is
-        deployed by the CPE as a Vinci service. The CPE manages the lifecycle of the CAS
-        Processor including service launch, restart on failures, and service shutdown. A
-        managed CAS Processor runs on the same machine as the CPE, but in a separate process.
-        This provides the necessary fault isolation for the CPE to protect it from non-robust
-        CAS Processors. A fatal failure of a managed CAS Processor does not threaten the
-        stability of the CPE.</para>
-      
-      <figure id="ugr.tug.cpe.fig.managed_deployment">
-        <title>CPE with Managed CAS Processors</title>
-        <mediaobject>
-          <imageobject>
-            <imagedata width="3.6in" format="PNG"
-              fileref="&imgroot;image020.png"/>
-          </imageobject>
-          <textobject><phrase>Managed deployment showing separate JVMs and CASes
-            flowing between them</phrase></textobject>
-        </mediaobject>
-      </figure>
-      
-      <para>The CPE communicates with managed CAS Processors using the Vinci communication
-        protocol. A CAS Processor is launched as a Vinci service and its
-        <literal>process()</literal> method is invoked remotely via a Vinci command. The
-        CPE uses its own internal VNS to support managed CAS processors. The VNS, by default,
-        listens on port 9005. If this port is not available, the VNS will increment its listen
-        port until it finds one that is available. All managed CAS Processors are internally
-        configured to <quote>talk</quote> to the CPE managed VNS. This internal VNS is
-        transparent to the end user launching the CPE.</para>
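-      <para>The port-selection rule just described (start at 9005 and move up until a
-        port is free) can be illustrated with <literal>java.net.ServerSocket</literal>. This
-        is a hypothetical sketch, not the VNS source; note that a port found this way can
-        still be taken by another process between the probe and the real bind.</para>

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical illustration of "increment the port until one is free". */
final class PortProbe {
  /** Returns the first bindable port in [start, start + attempts), or -1. */
  static int findAvailablePort(int start, int attempts) {
    for (int port = start; port < start + attempts && port <= 65535; port++) {
      try (ServerSocket probe = new ServerSocket(port)) {
        return port; // bind succeeded; the probe socket is closed on return
      } catch (IOException e) {
        // port in use (or not permitted): try the next one
      }
    }
    return -1;
  }
}
```

-      <para>For example, <literal>PortProbe.findAvailablePort(9005, 100)</literal> tries
-        ports 9005 through 9104 and returns the first one that can be bound.</para>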
-      
-      <para>To deploy a managed CAS Processor, the CPE deployer must change the CPE
-        descriptor. The following is a section from the CPE descriptor that shows an example
-        configuration specifying a managed CAS Processor.</para>
-      
-      
-      <programlisting>&lt;casProcessor <emphasis role="bold-italic">deployment="local"</emphasis> name="Meeting Detector TAE"&gt;
-  &lt;descriptor&gt;
-    &lt;include href="deploy/vinci/Deploy_MeetingDetectorTAE.xml"/&gt;
-  &lt;/descriptor&gt;
-  &lt;runInSeparateProcess&gt;
-    &lt;exec dir="." executable="java"&gt;
-      &lt;env key="CLASSPATH" 
-         value="src;
-                C:/Program Files/apache/uima/lib/uima-core.jar;
-                C:/Program Files/apache/uima/lib/uima-cpe.jar;
-                C:/Program Files/apache/uima/lib/uima-examples.jar;
-                C:/Program Files/apache/uima/lib/uima-adapter-vinci.jar;
-                C:/Program Files/apache/uima/lib/jVinci.jar"/>
-      &lt;arg&gt;-DLOG=C:/Temp/service.log&lt;/arg&gt;
-      &lt;arg&gt;org.apache.uima.adapter.vinci.VinciAnalysisEngineService_impl&lt;/arg&gt;
-      &lt;arg&gt;${descriptor}&lt;/arg&gt;
-    &lt;/exec&gt;
-  &lt;/runInSeparateProcess&gt;
-  &lt;deploymentParameters/&gt;
-  &lt;filter/&gt;
-  &lt;errorHandling&gt;
-    &lt;errorRateThreshold action="terminate" value="1/100"/&gt;
-    &lt;maxConsecutiveRestarts action="terminate" value="3"/&gt;
-    &lt;timeout max="100000"/&gt;
-  &lt;/errorHandling&gt;
-  &lt;checkpoint batch="10000"/&gt;
-&lt;/casProcessor&gt;</programlisting>
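-      <para>The <literal>runInSeparateProcess</literal> element tells the CPE how to start
-        the service JVM. As an illustration only, the hypothetical helper below maps the same
-        settings onto <literal>java.lang.ProcessBuilder</literal>: the executable, the working
-        directory, the CLASSPATH environment variable, and the arguments with
-        <literal>${descriptor}</literal> replaced. The CPM performs this launch itself; the
-        substitution shown here is an assumption about what the placeholder means.</para>

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of what the exec element describes: a "java" command
 * run in a working directory, with CLASSPATH set in the environment and
 * ${descriptor} replaced by the descriptor path.
 */
final class ManagedLaunch {
  static ProcessBuilder build(String workDir, String classpath,
                              List<String> args, String descriptorPath) {
    List<String> command = new ArrayList<>();
    command.add("java"); // the exec element's executable attribute
    for (String arg : args) {
      command.add(arg.replace("${descriptor}", descriptorPath));
    }
    ProcessBuilder pb = new ProcessBuilder(command);
    pb.directory(new File(workDir));              // exec dir="."
    pb.environment().put("CLASSPATH", classpath); // env key="CLASSPATH"
    return pb; // call pb.start() to actually launch the service JVM
  }
}
```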
-      
-      <para>See <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/> for
-        details and required settings.</para>
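-      <para>The <literal>errorRateThreshold</literal> value is a ratio such as
-        <literal>1/100</literal>. The sketch below assumes one possible reading, namely more
-        than N errors within the last M processed entities, and reports when the configured
-        action would be triggered. The class is hypothetical; consult the CPE descriptor
-        reference for the rule the CPM actually applies.</para>

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical reading of errorRateThreshold value="N/M": trigger the action
 * once more than N errors occur within the last M processed entities.
 */
final class ErrorRateThreshold {
  private final int maxErrors;
  private final int window;
  private final Deque<Boolean> recent = new ArrayDeque<>(); // true = error
  private int errorsInWindow;

  ErrorRateThreshold(String value) {
    String[] parts = value.split("/");
    this.maxErrors = Integer.parseInt(parts[0].trim());
    this.window = Integer.parseInt(parts[1].trim());
  }

  /** Records one processed entity; returns true if the threshold is exceeded. */
  boolean record(boolean failed) {
    recent.addLast(failed);
    if (failed) {
      errorsInWindow++;
    }
    if (recent.size() > window && recent.removeFirst()) {
      errorsInWindow--; // an old error slid out of the window
    }
    return errorsInWindow > maxErrors;
  }
}
```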
-      
-    </section>
-    
-    <section id="ugr.tug.cpe.deploying_nonmanaged_cas_processors">
-      <title>Deploying Non-managed CAS Processors</title>
-      
-      <para>Non-managed CAS Processor deployment is shown in <xref
-          linkend="ugr.tug.cpe.fig.nonmanaged_cpe"/>. In non-managed mode, the CPE
-        supports connectivity to CAS Processors running on local or remote computers using
-        Vinci. Non-managed processors differ from managed processors in two
-        respects:
-        
-        <orderedlist><listitem><para>Non-managed processors are neither started nor
-          stopped by the CPE.</para></listitem>
-          
-          <listitem><para>Non-managed processors use an independent VNS, also neither
-            started nor stopped by the CPE. </para></listitem></orderedlist></para>
-      
-      <figure id="ugr.tug.cpe.fig.nonmanaged_cpe">
-        <title>CPE with non-managed CAS Processors</title>
-        <mediaobject>
-          <imageobject>
-            <imagedata width="4.8in" format="PNG"
-              fileref="&imgroot;image023.png"/>
-          </imageobject>
-          <textobject><phrase>Non-managed CPE deployment</phrase></textobject>
-        </mediaobject>
-      </figure>
-      
-      <para>While non-managed CAS Processors provide the same level of fault isolation and
-        robustness as managed CAS Processors, error recovery support for non-managed CAS
-        Processors is much more limited. In particular, the CPE cannot restart a non-managed
-        CAS Processor after an error.</para>
-      
-      <para>Non-managed CAS Processors also require a separate Vinci Naming Service
-        running on the network. This VNS must be manually started and monitored by the end user
-        or application. Instructions for running a VNS can be found in <olink
-          targetdoc="&uima_docs_tutorial_guides;"
-          targetptr="ugr.tug.application.vns.starting"/>.</para>
-      
-      <para>To deploy a non-managed CAS Processor, the CPE deployer must change the CPE
-        descriptor. The following is a section from the CPE descriptor that shows an example
-        configuration for the non-managed CAS Processor.</para>
-      
-      
-      <programlisting>&lt;casProcessor <emphasis role="bold-italic">deployment="remote"</emphasis> name="Meeting Detector TAE"&gt;
-  &lt;descriptor&gt;
-    &lt;include href=
-        "descriptors/vinciService/MeetingDetectorVinciService.xml"/&gt;
-  &lt;/descriptor&gt;
-  &lt;deploymentParameters/&gt;
-  &lt;filter/&gt;
-  &lt;errorHandling&gt;
-    &lt;errorRateThreshold action="terminate" value="1/100"/&gt;
-    &lt;maxConsecutiveRestarts action="terminate" value="3"/&gt;
-    &lt;timeout max="100000"/&gt;
-  &lt;/errorHandling&gt;
-  &lt;checkpoint batch="10000"/&gt;
-&lt;/casProcessor&gt;</programlisting>
-      
-      <para>See <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/> for
-        details and required settings.</para>
-      
-    </section>
-    
-    <section id="ugr.tug.cpe.integrated_deployment">
-      <title>Deploying Integrated CAS Processors</title>
-      
-      <para>Integrated CAS Processors are shown in <xref
-          linkend="ugr.tug.cpe.fig.integrated_deployment"/>. Here the CAS Processors
-        run in the same JVM as the CPE, just like the Collection Reader and CAS Initializer.
-        This deployment method minimizes CAS communication and transport overhead, because
-        the CAS is shared within the process space of a single JVM. However, a CPE running
-        with all integrated CAS Processors is limited in scalability by the capacity of the
-        single computer on which it runs. Integrated processors also carry a stability
-        risk: a poorly written CAS Processor can cause the JVM, and hence the entire CPE, to
-        abort.</para>
-      
-      <figure id="ugr.tug.cpe.fig.integrated_deployment">
-        <title>CPE with integrated CAS Processor</title>
-        <mediaobject>
-          <imageobject>
-            <imagedata width="3.2in" format="PNG"
-              fileref="&imgroot;image026.png"/>
-          </imageobject>
-          <textobject><phrase>CPE with integrated CAS Processor</phrase>
-          </textobject>
-        </mediaobject>
-      </figure>
-      
-      <para>The following is a section from a CPE descriptor that shows an example
-        configuration for the integrated CAS Processor.</para>
-      
-      
-      <programlisting>&lt;casProcessor <emphasis role="bold-italic">deployment="integrated"</emphasis> name="Meeting Detector TAE"&gt;
-  &lt;descriptor&gt;
-    &lt;include href="descriptors/tutorial/ex4/MeetingDetectorTAE.xml"/&gt;
-  &lt;/descriptor&gt;
-  &lt;deploymentParameters/&gt;
-  &lt;filter/&gt;
-  &lt;errorHandling&gt;
-    &lt;errorRateThreshold action="terminate" value="100/1000"/&gt;
-    &lt;maxConsecutiveRestarts action="terminate" value="30"/&gt;
-    &lt;timeout max="100000"/&gt;
-  &lt;/errorHandling&gt;
-  &lt;checkpoint batch="10000"/&gt;
-&lt;/casProcessor&gt;</programlisting>
-      
-      <para>See <olink targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/> for
-        details and required settings.</para>
-      
-    </section>
-  </section>
-  
-  <section id="ugr.tug.cpe.collection_processing_examples">
-    <title>Collection Processing Examples</title>
-    
-    <para>The UIMA SDK includes a set of examples illustrating the three modes of deployment:
-      integrated, managed, and non-managed. These are in the
-      <literal>/examples/descriptors/collection_processing_engine</literal>
-      directory. There are three CPE descriptors that run an example annotator (the Meeting
-      Finder) in these modes.</para>
-    
-    <para>To run either the integrated or the managed example, use the
-      <literal>runCPE</literal> script in the <literal>/bin</literal> directory of the UIMA
-      installation, passing the appropriate CPE descriptor as an argument. Alternatively,
-      if you are using Eclipse and have the <literal>uimaj-examples</literal> project in your
-      workspace, choose Run &rarr; Run... from the Eclipse menu and then pick the
-      launch configuration <quote>UIMA Run CPE</quote>.</para>
-    
-    <note><para>The <literal>runCPE</literal> script <emphasis role="bold-italic">must</emphasis>
-    be run from the <literal>%UIMA_HOME%\examples</literal> directory, because the example
-    CPE descriptors use relative path names that are resolved relative to this working directory.
-    For instance:
-   
-    <literallayout>runCPE descriptors\collection_processing_engine\MeetingFinderCPE_Integrated.xml</literallayout></para>
-    </note>
-    
-    <!--
-    <para>If you installed the examples into Eclipse, you can run directly from Eclipse by
-      creating a run configuration. To do this, highlight the SimpleRunCPE.java source file
-      in the examples src/org/apache/uima/examples/cpe directory, and then</para>
-    
-    <orderedlist><listitem><para>pick the menu Run &rarr; Run...</para></listitem>
-      
-      <listitem><para>click <quote>Java Application</quote> and press
-        <quote>New</quote></para></listitem>
-      
-      <listitem><para>click on the Arguments panel, and insert a path to the appropriate CPE
-        descriptor in the <quote>Program Arguments</quote> box by typing, for instance:
-        <literal>descriptors/collection_processing_engine/
-          MeetingFinderCPE_Integrated.xml</literal>
-        </para></listitem>
-      
-      <listitem><para>Then press <quote>Run</quote> </para></listitem>
-    </orderedlist>
-    -->
-    
-    <para>Running the non-managed example requires some additional steps:
-      
-      <orderedlist><listitem><para>Start the Vinci Naming Service (VNS) by running the
-        <literal>startVNS</literal> script in the <literal>/bin</literal>
-        directory, or by using the Eclipse launcher <quote>UIMA Start VNS</quote>.</para></listitem>
-        
-        <listitem><para>Deploy the Meeting Detector Analysis Engine as a Vinci service by
-          running the <literal>startVinciService</literal> script in the
-          <literal>/bin</literal> directory, passing it the location of the
-          descriptor to deploy, in this case
-          <literal>%UIMA_HOME%/examples/deploy/vinci/Deploy_MeetingDetectorTAE.xml</literal>.
-          Alternatively, if you are using Eclipse and have the <literal>uimaj-examples</literal>
-          project in your workspace, choose Run &rarr; Run... from the Eclipse menu and then
-          pick the launch configuration <quote>UIMA Start Vinci Service</quote>.
-          </para></listitem>
-        
-        <listitem><para>Finally, run the <literal>runCPE</literal> script (or, in Eclipse,
-          the launch configuration <quote>UIMA Run CPE</quote>), passing it the CPE
-          descriptor for the non-managed version
-          (<literal>%UIMA_HOME%/examples/descriptors/collection_processing_engine/MeetingFinderCPE_NonManaged.xml</literal>).
-          </para></listitem></orderedlist></para>
-    
-    <para>This assumes that the Vinci Naming Service, the runCPE application, and the
-      <literal>MeetingDetectorTAE</literal> service are all running on the same machine.
-      Most of the scripts that need to locate the VNS read the environment variables
-      VNS_HOST and VNS_PORT, which default to <quote>localhost</quote> and
-      <quote>9000</quote>. You may set these to other values before running the scripts, as
-      needed; you can also pass the name of the VNS host as the second argument to the
-      startVinciService script.</para>
-    
-    <para>Alternatively, you can edit the scripts or the XML files to specify other
-      values for the VNS host and port. For instance, if the
-      <literal>runCPE</literal> application is running on a different machine from the
-      Vinci Naming Service, you can edit
-      <literal>MeetingFinderCPE_NonManaged.xml</literal> and change the vnsHost
-      parameter,
-      <literal>&lt;parameter name="vnsHost" value="localhost" type="string"/&gt;</literal>,
-      so that it specifies the VNS host instead of <quote>localhost</quote>.</para>
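-    
-    <para>As a minimal sketch, assuming that this parameter sits inside the
-      <literal>deploymentParameters</literal> element of the remote
-      <literal>casProcessor</literal>, the edited settings might look as follows for a VNS
-      running on a host named <literal>vnshost.example.com</literal> (a hypothetical name).
-      The <literal>vnsPort</literal> entry and its type are assumptions based on the
-      VNS_PORT default described above; check the example descriptor for the exact entries
-      it contains.</para>
-    
-    <programlisting>&lt;deploymentParameters&gt;
-  &lt;!-- Host running the Vinci Naming Service (hypothetical name) --&gt;
-  &lt;parameter name="vnsHost" value="vnshost.example.com" type="string"/&gt;
-  &lt;!-- Assumed parameter; 9000 matches the VNS_PORT default --&gt;
-  &lt;parameter name="vnsPort" value="9000" type="integer"/&gt;
-&lt;/deploymentParameters&gt;</programlisting>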
-  </section>
-  
-</chapter>
-
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
+"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[
+<!ENTITY imgroot "../images/tutorials_and_users_guides/tug.cpe/">
+<!ENTITY % uimaents SYSTEM "../entities.ent">  
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.tug.cpe">
+  <title>Collection Processing Engine Developer&apos;s Guide</title>
+  <titleabbrev>CPE Developer&apos;s Guide</titleabbrev>
+  
+  <para>The UIMA Analysis Engine interface provides support for developing and integrating
+    algorithms that analyze unstructured data. Analysis Engines are designed to operate on a
+    per-document basis. Their interface handles one CAS at a time. UIMA provides additional
+    support for applying analysis engines to collections of unstructured data with its
+    <emphasis>Collection Processing Architecture</emphasis>. The Collection
+    Processing Architecture defines additional components for reading raw data formats
+    from data collections, preparing the data for processing by Analysis Engines, executing
+    the analysis, extracting analysis results, and deploying the overall flow in a variety of
+    local and distributed configurations.</para>
+  
+  <para>The functionality defined in the Collection Processing Architecture is
+    implemented by a <emphasis>Collection Processing Engine</emphasis> (CPE). A CPE
+    includes an Analysis Engine and adds a <emphasis>Collection Reader</emphasis>, a
+    <emphasis>CAS Initializer</emphasis> (deprecated as of version 2), and <emphasis>CAS
+    Consumers</emphasis>. The part of the UIMA Framework that supports the execution of
+    CPEs is called the Collection Processing Manager, or CPM.</para>
+  
+  <para>A Collection Reader provides the interface to the raw input data and knows how to
+    iterate over the data collection. Collection Readers are discussed in <xref
+      linkend="ugr.tug.cpe.collection_reader.developing"/>. The CAS Initializer
+    <footnote><para>CAS Initializers are deprecated in favor of a more general mechanism,
+    multiple subjects of analysis.</para></footnote> prepares an individual data item for
+    analysis and loads it into the CAS. CAS Initializers are discussed in <xref
+      linkend="ugr.tug.cpe.cas_initializer.developing"/>. A CAS Consumer extracts
+    analysis results from the CAS and may also perform <emphasis>collection level
+    processing</emphasis>, or analysis over a collection of CASes. CAS Consumers are
+    discussed in <xref linkend="ugr.tug.cpe.cas_consumer.developing"/>.</para>
+  
+  <para>Analysis Engines and CAS Consumers are both instances of <emphasis>CAS
+    Processors</emphasis>. A Collection Processing Engine (CPE) may contain multiple CAS
+    Processors. An Analysis Engine contained in a CPE may itself be a Primitive or an Aggregate
+    (composed of other Analysis Engines). Aggregates may contain CAS Consumers. While
+    Collection Readers and CAS Initializers always run in the same JVM as the CPM, a CAS
+    Processor may be deployed in a variety of local and distributed modes, providing a number
+    of options for scalability and robustness. The different deployment options are covered
+    in detail in <xref linkend="ugr.tug.cpe.deployment_alternatives"/>.</para>
+  
+  <para>Each of the components in a CPE has an interface specified by the UIMA Collection
+    Processing Architecture and is described by a declarative XML descriptor file.
+    Similarly, the CPE itself has a well-defined component interface and is described by a
+    declarative XML descriptor file.</para>
+  
+  <para>A user creates a CPE by assembling the components mentioned above. The UIMA SDK
+    provides a graphical tool, called the CPE Configurator, for assisting in the assembly of
+    CPEs. Use of this tool is summarized in <xref
+      linkend="ugr.tug.cpe.cpe_configurator"/>, and more details can be found in <olink
+      targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cpe"/>.
+    Alternatively, a CPE can be assembled by writing an XML CPE descriptor. Details on the CPE
+    descriptor, including its syntax and content, can be found in the <olink
+      targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>. The individual
+    components have associated XML descriptors, each of which can be created or edited
+    using the <olink targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cde">
+    Component Description Editor</olink>.</para>
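+  
+  <para>To show how these pieces fit together, the following is a minimal, illustrative
+    skeleton of a CPE descriptor. It is a sketch rather than a complete descriptor: the
+    component descriptor paths are hypothetical placeholders, and elements such as
+    <literal>errorHandling</literal> and <literal>checkpoint</literal>, which the
+    deployment examples in this chapter include, are omitted. See the <olink
+      targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/> for the exact
+    syntax and the required settings.</para>
+  
+  <programlisting>&lt;cpeDescription xmlns="http://uima.apache.org/resourceSpecifier"&gt;
+  &lt;!-- The Collection Reader always runs in the same JVM as the CPM --&gt;
+  &lt;collectionReader&gt;
+    &lt;collectionIterator&gt;
+      &lt;descriptor&gt;
+        &lt;include href="descriptors/MyCollectionReader.xml"/&gt;
+      &lt;/descriptor&gt;
+    &lt;/collectionIterator&gt;
+  &lt;/collectionReader&gt;
+  &lt;!-- Analysis Engines and CAS Consumers are both CAS Processors --&gt;
+  &lt;casProcessors casPoolSize="3" processingUnitThreadCount="1"&gt;
+    &lt;!-- deployment: "integrated", "local" (managed) or "remote" (non-managed) --&gt;
+    &lt;casProcessor deployment="integrated" name="My Analysis Engine"&gt;
+      &lt;descriptor&gt;
+        &lt;include href="descriptors/MyAnalysisEngine.xml"/&gt;
+      &lt;/descriptor&gt;
+    &lt;/casProcessor&gt;
+  &lt;/casProcessors&gt;
+  &lt;!-- Operational settings for the CPM as a whole --&gt;
+  &lt;cpeConfig&gt;
+    &lt;!-- -1: process the entire collection --&gt;
+    &lt;numToProcess&gt;-1&lt;/numToProcess&gt;
+    &lt;deployAs&gt;immediate&lt;/deployAs&gt;
+  &lt;/cpeConfig&gt;
+&lt;/cpeDescription&gt;</programlisting>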
+  
+  <para>A CPE is executed by a UIMA infrastructure component called the
+    <emphasis>Collection Processing Manager</emphasis> (CPM). The CPM provides a number
+    of services and deployment options that cover instantiation and execution of CPEs, error
+    recovery, and local and distributed deployment of the CPE components.</para>
+  
+  <section id="ugr.tug.cpe.concepts">
+    <title>CPE Concepts</title>
+    
+    <para> <xref linkend="ugr.tug.cpe.fig.cpe_components"/> illustrates the data flow
+      that occurs between the different types of components that make up a CPE.</para>
+    
+    <figure id="ugr.tug.cpe.fig.cpe_components">
+      <title>CPE Components</title>
+      <mediaobject>
+        <imageobject>
+          <imagedata width="5.7in" format="PNG"
+            fileref="&imgroot;image002.png"/>
+        </imageobject>
+        <textobject><phrase>CPE Components and flow between them</phrase>
+        </textobject>
+      </mediaobject>
+    </figure>
+    
+    <para>The components of a CPE are:</para>
+    
+    <itemizedlist><listitem><para><emphasis>Collection Reader &ndash;</emphasis>
+      interfaces to a collection of data items (e.g., documents) to be analyzed. Collection
+      Readers return CASes that contain the documents to analyze, possibly along with
+      additional metadata.</para></listitem>
+      
+      <listitem><para><emphasis>Analysis Engine &ndash;</emphasis> takes a CAS,
+        analyzes its contents, and produces an enriched CAS. Analysis Engines can be
+        recursively composed of other Analysis Engines (called an
+        <emphasis>Aggregate</emphasis> Analysis Engine). Aggregates may also contain
+        CAS Consumers.</para></listitem>
+      
+      <listitem><para><emphasis>CAS Consumer &ndash;</emphasis> consumes the enriched
+        CAS that was produced by the sequence of Analysis Engines before it, and produces an
+        application-specific data structure, such as a search engine index or database.
+        </para></listitem></itemizedlist>
+    
+    <para>A fourth type of component, the <emphasis>CAS Initializer,</emphasis> may be
+      used by a Collection Reader to populate a CAS from a document. However, as of UIMA
+      version 2, CAS Initializers are deprecated in favor of a more general mechanism,
+      multiple Subjects of Analysis.</para>
+    
+    <para>The Collection Processing Manager orchestrates the data flow
+      within a CPE, monitors status, optionally manages the life-cycle of internal
+      components and collects statistics.</para>
+    
+    <para>The framework does not save CASes persistently. If you want to save CASes, you
+      must save each CAS as it passes through the CPE, for example with a CAS Consumer that
+      you write for this purpose, in whatever format you like. The UIMA SDK supplies an
+      example CAS Consumer that saves CASes to XML files, either in the standard XMI format
+      or in an older format called XCAS. It also supplies an example CAS Consumer that
+      extracts information from CASes and stores the results in a relational database,
+      using Java&apos;s JDBC APIs.</para>
+    
+  </section>
+  
+  <section id="ugr.tug.cpe.configurator_and_viewer">
+    <title>CPE Configurator and CAS viewer</title>
+    
+    <section id="ugr.tug.cpe.cpe_configurator">
+      <title>Using the CPE Configurator</title>
+      
+      <para>A CPE can be assembled by writing an XML CPE descriptor. Details on the CPE
+        descriptor, including its syntax and content, can be found in <olink
+          targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>. Rather than
+        edit raw XML, you may develop a CPE Descriptor using the CPE Configurator tool. The CPE
+        Configurator tool is described briefly in this section, and in more detail in <olink
+          targetdoc="&uima_docs_tools;" targetptr="ugr.tools.cpe"/>.</para>
+      
+      <para>The CPE Configurator tool can be run from Eclipse (see <xref
+          linkend="ugr.tug.cpe.running_cpe_configurator_from_eclipse"/>), or by using
+        the <literal>cpeGui</literal> shell script (<literal>cpeGui.bat</literal> on
+        Windows, <literal>cpeGui.sh</literal> on Unix), which is located in the
+        <literal>bin</literal> directory of the UIMA SDK installation. Executing this
+        batch file will display the window shown here:
+        
+        
+        <screenshot>
+          <mediaobject>
+            <imageobject>
+              <imagedata width="5.7in" format="JPG" fileref="&imgroot;image004.jpg"/>
+            </imageobject>
+            <textobject><phrase>Screenshot of CPE GUI</phrase></textobject>
+          </mediaobject>
+        </screenshot>
+        </para>
+      
+      <para>The window is divided into three sections, one each for the Collection Reader, 
+        Analysis Engines, and CAS Consumers.<footnote><para>There is also a fourth pane,
+        for the CAS Initializer, but it is hidden by default.  To enable it click the
+        <literal>View &rarr; CAS Initializer Panel</literal> menu item.</para></footnote> 
+        In each section, you select the component(s) you want to include in the CPE by 
+        browsing to their XML descriptors. The configuration parameters present in the XML 
+        descriptors will then be displayed in the GUI; these can be modified to override
+        the values present in the descriptor. For example, the screen shot below shows the 
+        CPE Configurator after the following components have been chosen:
+        
+        
+        <programlisting>Collection Reader: 
+   %UIMA_HOME%/examples/descriptors/collection_reader/
+          FileSystemCollectionReader.xml
+
+Analysis Engine: 
+   %UIMA_HOME%/examples/descriptors/analysis_engine/
+          NamesAndPersonTitles_TAE.xml
+
+CAS Consumer: 
+    %UIMA_HOME%/examples/descriptors/cas_consumer/
+          XmiWriterCasConsumer.xml</programlisting></para>
+      
+      
+      <screenshot>
+     <mediaobject>
+      <imageobject>
+        <imagedata width="5.7in" format="JPG" fileref="&imgroot;image006.jpg"/>
+      </imageobject>
+      <textobject><phrase>Screenshot of CPE GUI after fields filled in</phrase></textobject>
+    </mediaobject>
+    </screenshot>
+      
+      <para>For the File System Collection Reader, ensure that the Input Directory is set to
+        <literal>%UIMA_HOME%\examples\data</literal><footnote><para>Replace
+        <literal>%UIMA_HOME%</literal> with the path to where you installed UIMA.</para>
+        </footnote>. The other parameters may be left blank. For the XMI Writer CAS
+        Consumer, ensure that the Output Directory is set to
+        <literal>%UIMA_HOME%\examples\data\processed</literal>.</para>
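+      
+      <para>For reference, these settings correspond roughly to configuration parameter
+        overrides in a CPE descriptor, as in the following sketch. The parameter names
+        <literal>InputDirectory</literal> and <literal>OutputDirectory</literal> are assumed
+        to be the ones declared in the example component descriptors. The paths are
+        written relative to <literal>%UIMA_HOME%\examples</literal>, the working directory
+        the example CPE descriptors assume. The casProcessor name is a hypothetical label.
+        The descriptor actually written by the CPE Configurator may differ in detail.</para>
+      
+      <programlisting>&lt;!-- Inside &lt;collectionReader&gt; --&gt;
+&lt;collectionIterator&gt;
+  &lt;descriptor&gt;
+    &lt;include href="descriptors/collection_reader/FileSystemCollectionReader.xml"/&gt;
+  &lt;/descriptor&gt;
+  &lt;!-- Overrides the value declared in the reader's own descriptor --&gt;
+  &lt;configurationParameterSettings&gt;
+    &lt;nameValuePair&gt;
+      &lt;name&gt;InputDirectory&lt;/name&gt;
+      &lt;value&gt;&lt;string&gt;data&lt;/string&gt;&lt;/value&gt;
+    &lt;/nameValuePair&gt;
+  &lt;/configurationParameterSettings&gt;
+&lt;/collectionIterator&gt;
+
+&lt;!-- Inside &lt;casProcessors&gt;; other elements such as errorHandling omitted --&gt;
+&lt;casProcessor deployment="integrated" name="XMI Writer CAS Consumer"&gt;
+  &lt;descriptor&gt;
+    &lt;include href="descriptors/cas_consumer/XmiWriterCasConsumer.xml"/&gt;
+  &lt;/descriptor&gt;
+  &lt;configurationParameterSettings&gt;
+    &lt;nameValuePair&gt;
+      &lt;name&gt;OutputDirectory&lt;/name&gt;
+      &lt;value&gt;&lt;string&gt;data/processed&lt;/string&gt;&lt;/value&gt;
+    &lt;/nameValuePair&gt;
+  &lt;/configurationParameterSettings&gt;
+&lt;/casProcessor&gt;</programlisting>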
+      
+      <para>After selecting each of the components and providing configuration settings,
+        click the play (forward arrow) button at the bottom of the screen to begin processing.
+        A progress bar should be displayed in the lower left corner. (Note that the progress
+        bar will not begin to move until all components have completed their initialization,
+        which may take several seconds.) Once processing has begun, the pause and stop
+        buttons become enabled.</para>
+      
+      <para>If an error occurs, you will be informed by an error dialog. If processing
+        completes successfully, you will be presented with a performance report.</para>
+      
+      <para>Using the File menu, you can select <literal>Save CPE Descriptor</literal> to
+        create an .xml descriptor file that defines the CPE you have constructed. Later, you
+        can use <literal>Open CPE Descriptor</literal> to restore the CPE Configurator to
+        the saved state. Also, CPE descriptors can be used to run a CPE from a Java program
+        &ndash; see section <xref
+          linkend="ugr.tug.cpe.running_cpe_from_application"/>. CPE Descriptors
+        allow specifying operational parameters, such as error handling options, that are
+        not currently available for configuration through the CPE Configurator. For more
+        information on manually creating a CPE Descriptor, see the <olink
+          targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.cpe_descriptor"/>.</para>
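+      
+      <para>For example, error handling is configured per CAS Processor by an
+        <literal>errorHandling</literal> element inside its <literal>casProcessor</literal>
+        element, as in the deployment examples in this chapter. The sketch below reuses the
+        values from the integrated deployment example. The comments give a plain-language
+        reading of each setting, which should be checked against the reference before you
+        rely on it.</para>
+      
+      <programlisting>&lt;errorHandling&gt;
+  &lt;!-- Tolerate up to 100 failed CASes per 1000, then apply the action --&gt;
+  &lt;errorRateThreshold action="terminate" value="100/1000"/&gt;
+  &lt;!-- Allow at most 30 restarts in a row, then apply the action --&gt;
+  &lt;maxConsecutiveRestarts action="terminate" value="30"/&gt;
+  &lt;!-- Maximum time, in milliseconds, to wait for the processor on one CAS --&gt;
+  &lt;timeout max="100000"/&gt;
+&lt;/errorHandling&gt;</programlisting>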
+            
+      <para>The CPE configured above runs a simple name and title annotator on the sample data
+        provided with the UIMA SDK and stores the results using the XMI Writer CAS Consumer. To
+        view the results, start the External CAS Annotation Viewer by running the
+        <literal>annotationViewer</literal> batch file
+        (<literal>annotationViewer.bat</literal> on Windows,
+        <literal>annotationViewer.sh</literal> on Unix), which is located in the
+        <literal>bin</literal> directory of the UIMA SDK installation. Executing this
+        batch file will display the window shown here:
+        
+        
+        <screenshot>
+    <mediaobject>
+      <imageobject>
+        <imagedata width="5.5in" format="JPG" fileref="&imgroot;image008.jpg"/>
+      </imageobject>
+      <textobject><phrase>Screenshot of Annotation Viewer results</phrase></textobject>
+    </mediaobject>
+  </screenshot>
+        </para>
+      
+      <para>Ensure that the Input Directory is the same as the Output Directory specified for
+        the XMI Writer CAS Consumer in the CPE configured above (e.g.,
+        <literal>%UIMA_HOME%\examples\data\processed</literal>) and that the TAE
+        Descriptor File is set to the Analysis Engine used in the CPE configured above (e.g.,
+        <literal>examples\descriptors\analysis_engine\NamesAndPersonTitles_TAE.xml</literal>).</para>
+      
+      <para>Click the View button to display the Analyzed Documents window:
+        
+        
+        <screenshot>
+    <mediaobject>
+      <imageobject>
+        <imagedata width="3.5in" format="JPG" fileref="&imgroot;image010.jpg"/>
+      </imageobject>
+      <textobject><phrase>Screenshot of CPE Configurator Analyzed Documents</phrase></textobject>
+    </mediaobject>
+  </screenshot>
+        </para>
+      
+      <para>Double-click any document in the list to view the analyzed document.
+        Double-clicking the first document, IBM_LifeSciences.txt, brings up the following
+        window:
+        
+        
+        <screenshot>
+    <mediaobject>
+      <imageobject>
+        <imagedata width="5.7in" format="JPG" fileref="&imgroot;image012.jpg"/>
+      </imageobject>
+      <textobject><phrase>Screenshot of Document and Annotation Viewer</phrase></textobject>
+    </mediaobject>
+  </screenshot>
+        </para>
+      
+      <para>This window shows the analysis results for the document. Clicking on any
+        highlighted annotation causes the details for that annotation to be displayed in the
+        right-hand pane. Here the annotation spanning <quote>John M. Thompson</quote> has
+        been clicked.</para>
+      
+      <para>Congratulations! You have successfully configured a CPE, saved its
+        descriptor, run the CPE, and viewed the analysis results.</para>
+    </section>
+    
+    <section id="ugr.tug.cpe.running_cpe_configurator_from_eclipse">
+      <title>Running the CPE Configurator from Eclipse</title>
+      
+      <para>If you have followed the instructions in <olink
+          targetdoc="&uima_docs_overview;"
+          targetptr="ugr.ovv.eclipse_setup"/> and imported the example Eclipse
+        project, then you should already have a Run configuration for the CPE Configurator
+        tool (called <literal>UIMA CPE GUI</literal>) configured to run in the example
+        project. Simply run that configuration to start the CPE Configurator.</para>
+      
+      <para>If you haven&apos;t followed the Eclipse setup instructions and wish to run the
+        CPE Configurator tool from Eclipse, you will need to do the following. As installed,
+        this Eclipse launch configuration is associated with the
+        <quote>uimaj-examples</quote> project. If you&apos;ve not already done so, you
+        may wish to import that project into your Eclipse workspace. It&apos;s located in
+        %UIMA_HOME%/docs/examples. Doing this will supply the Eclipse launcher with all
+        the class files it needs to run the CPE configurator. If you don&apos;t do this, please

[... 1019 lines stripped ...]