Posted to commits@uima.apache.org by sc...@apache.org on 2008/08/28 23:28:16 UTC

svn commit: r689997 [14/32] - in /incubator/uima/uimaj/trunk/uima-docbooks: ./ src/ src/docbook/overview_and_setup/ src/docbook/references/ src/docbook/tools/ src/docbook/tutorials_and_users_guides/ src/docbook/uima/organization/ src/olink/references/

Modified: incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/references/ref.xml.cpe_descriptor.xml
URL: http://svn.apache.org/viewvc/incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/references/ref.xml.cpe_descriptor.xml?rev=689997&r1=689996&r2=689997&view=diff
==============================================================================
--- incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/references/ref.xml.cpe_descriptor.xml (original)
+++ incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/references/ref.xml.cpe_descriptor.xml Thu Aug 28 14:28:14 2008
@@ -1,1368 +1,1368 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
-"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[
-<!ENTITY imgroot "../images/references/ref.xml.cpe_descriptor/">
-<!ENTITY tp "ugr.ref.xml.cpe_descriptor.">
-<!ENTITY % uimaents SYSTEM "../entities.ent" >  
-%uimaents;
-]>
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-<chapter id="ugr.ref.xml.cpe_descriptor">
-  <title>Collection Processing Engine Descriptor Reference</title>
-  <titleabbrev>CPE Descriptor Reference</titleabbrev>
-  
-  <para>A UIMA <emphasis>Collection Processing Engine</emphasis> (CPE) is a combination
-    of UIMA components assembled to analyze a collection of artifacts. A CPE is an
-    instantiation of the UIMA <emphasis>Collection Processing Architecture</emphasis>,
-    which defines the collection processing components, interfaces, and APIs. A CPE is
-    executed by a UIMA framework component called the <emphasis>Collection Processing
-    Manager</emphasis> (CPM), which provides a number of services for deploying CPEs,
-    running CPEs, and handling errors.</para>
-  
-  <para>A CPE can be assembled programmatically within a Java application, or it can be
-    assembled declaratively via a CPE configuration specification, called a CPE
-    Descriptor. This chapter describes the format of the CPE Descriptor.</para>
-  
-  <para>Details about the CPE, including its function, sub-components, APIs, and related
-    tools, can be found in <olink targetdoc="&uima_docs_tutorial_guides;"
-      targetptr="ugr.tug.cpe"/>. Here we briefly summarize the CPE to define terms and
-    provide context for the later sections that describe the CPE Descriptor.</para>
-  
-  <section id="&tp;overview">
-    <title>CPE Overview</title>
-    
-    <figure id="&tp;overview.fig.runtime">
-      <title>CPE Runtime Overview</title>
-      <mediaobject>
-        <imageobject>
-          <imagedata width="5.8in" format="PNG"
-            fileref="&imgroot;image002.png"/>
-        </imageobject>
-        <textobject><phrase>CPE Runtime Overview diagram</phrase></textobject>
-      </mediaobject>
-    </figure>
-    
-    <para>An illustration of the CPE runtime is shown in <xref
-        linkend="&tp;overview.fig.runtime"/>. Some of the CPE components, such as the
-      <emphasis>queues</emphasis> and <emphasis>processing pipelines</emphasis>, are
-      internal to the CPE, but their behavior and deployment may be configured using the CPE
-      Descriptor. Other CPE components, such as the <emphasis>Collection
-      Reader</emphasis> and <emphasis>CAS Processors</emphasis>, are defined and
-      configured externally from the CPE and then plugged in to the CPE to create the overall
-      engine. The parts of a CPE are:
-      
-      <variablelist>
-        <varlistentry>
-          <term>Collection Reader</term>
-          <listitem><para>understands the native data collection format and iterates
-            over the collection producing subjects of analysis</para></listitem>
-        </varlistentry>
-        
-        <varlistentry>
-          <term>CAS Initializer<footnote><para>Deprecated</para></footnote>
-            </term>
-          <listitem><para>initializes a CAS with a subject of analysis</para>
-            </listitem>
-        </varlistentry>
-        
-        <varlistentry>
-          <term>Artifact Producer</term>
-          <listitem><para>asynchronously pulls CASes from the Collection Reader,
-            creates batches of CASes and puts them into the work queue</para></listitem>
-        </varlistentry>
-        
-        <varlistentry>
-          <term>Work Queue</term>
-          <listitem><para>shared queue containing batches of CASes queued by the Artifact
-            Producer for analysis by Analysis Engines</para>
-          </listitem>
-        </varlistentry>
-        
-        <varlistentry>
-          <term>B1-Bn</term>
-          <listitem><para>individual batches containing 1 or more CASes</para>
-            </listitem>
-        </varlistentry>
-        
-        <varlistentry>
-          <term>AE1-AEn</term>
-          <listitem><para>Analysis Engines arranged by a CPE descriptor</para>
-            </listitem>
-        </varlistentry>
-        
-        <varlistentry>
-          <term>Processing Pipelines</term>
-          <listitem><para>each pipeline runs in a separate thread and contains a
-            replicated set of the Analysis Engines running in the defined sequence</para>
-            </listitem>
-        </varlistentry>
-        
-        <varlistentry>
-          <term>Output Queue</term>
-          <listitem><para>holds batches of CASes with analysis results intended for CAS
-            Consumers</para></listitem>
-        </varlistentry>
-        
-        <varlistentry>
-          <term>CAS Consumers</term>
-          <listitem><para>perform collection level analysis over the CASes and extract
-            analysis results, e.g., creating indexes or databases</para></listitem>
-        </varlistentry>
-      </variablelist>
-      </para>
-  </section>
-  
-  <section id="&tp;notation">
-    <title>Notation</title>
-    
-    <para>CPE Descriptors are XML files. This chapter uses an informal notation to specify
-      the syntax of CPE Descriptors.</para>
-    
-    <para>The notation used in this chapter is:
-      
-      <itemizedlist><listitem><para>An ellipsis (...) inside an element body indicates
-        that the substructure of that element has been omitted (to be described in another
-        section of this chapter). An example of this would be:
-        
-        
-        <programlisting>&lt;collectionReader&gt;
-...
-&lt;/collectionReader&gt;</programlisting></para>
-        </listitem>
-        
-        <listitem><para>An ellipsis immediately after an element indicates that the
-          element type may be repeated arbitrarily many times. For example:
-          
-          
-          <programlisting>&lt;parameter&gt;[String]&lt;/parameter&gt;
-&lt;parameter&gt;[String]&lt;/parameter&gt;
-...</programlisting>
-          indicates that there may be arbitrarily many parameter elements in this
-          context.</para></listitem>
-        
-        <listitem><para>An ellipsis inside an element means details of the attributes
-          associated with that element are defined later, e.g.:
-          
-          <programlisting>&lt;casProcessor ...&gt;</programlisting></para>
-          </listitem>
-        
-        <listitem><para>Bracketed expressions (e.g. <literal>[String]</literal>)
-          indicate the type of value that may be used at that location.</para></listitem>
-        
-        <listitem><para>A vertical bar, as in <literal>true|false</literal>, indicates
-          alternatives. This can be applied to literal values, bracketed type names, and
-          elements. </para></listitem></itemizedlist></para>
-    
-    <para>Which elements are optional and which are required is specified in prose, not in the
-      syntax definition.</para>
-    
-  </section>
-  
-  <section id="&tp;imports">
-    <title>Imports</title>
-    
-    <para>As of version 2.2, a CPE Descriptor can use the same <literal>import</literal> mechanism
-      as other component descriptors.  This allows referring to component
-      descriptors using either relative paths (resolved relative to the location of the CPE descriptor)
-      or the classpath/datapath.  For details see <olink targetdoc="&uima_docs_ref;"
-      targetptr="ugr.ref.xml.component_descriptor"/>.</para>
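-
-    <para>As a minimal sketch, the two forms of <literal>import</literal> might appear
-      inside a <literal>&lt;descriptor&gt;</literal> element as shown here. The path
-      and the dotted name are hypothetical and serve only as an illustration:
-
-      <programlisting><![CDATA[<!-- by location: resolved relative to the CPE descriptor -->
-<descriptor>
-    <import location="descriptors/MyCollectionReader.xml"/>
-</descriptor>
-
-<!-- by name: looked up in the classpath, then the datapath -->
-<descriptor>
-    <import name="org.example.MyAnalysisEngine"/>
-</descriptor>]]></programlisting></para>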
-     
-    <para>The following older syntax is still supported, but <emphasis>not recommended</emphasis>:
-      
-      <programlisting><![CDATA[<descriptor>
-    <include href="[URL or File]"/>
-</descriptor>]]></programlisting></para>
-    
-    <para>The <literal>[URL or File]</literal> value of the <literal>href</literal> attribute is a URL
-      or a filename for the descriptor of the incorporated component. An attempt is first made to
-      resolve it as a URL.</para>
-    
-    <para>
-      Relative paths in an <literal>include</literal> are resolved relative to the current working directory 
-      (NOT the CPE descriptor location as is the case for <literal>import</literal>). 
-      A filename relative to another directory can be specified using the <literal>CPM_HOME</literal>
-      variable, e.g.,    
-    <programlisting>&lt;descriptor&gt;
-    &lt;include href="${CPM_HOME}/desc_dir/descriptor.xml"/&gt;
-&lt;/descriptor&gt;</programlisting>
-    
-      In this case, the value for the <literal>CPM_HOME</literal> variable must be
-      provided to the CPE by specifying it on the Java command line, e.g.,
-        
-    <programlisting>java -DCPM_HOME="C:/Program Files/apache/uima/cpm" ...</programlisting>
-    
-  </para>
-    
-  </section>
-  
-  <section id="&tp;descriptor">
-    <title>CPE Descriptor Overview</title>
-    
-    <para>A CPE Descriptor consists of information describing the following four main
-      elements.</para>
-    
-    <orderedlist><listitem><para>The <emphasis>Collection Reader</emphasis>, which
-      is responsible for gathering artifacts and initializing the Common Analysis
-      Structure (CAS) used to support processing in the UIMA collection processing
-      engine.</para></listitem>
-      
-      <listitem><para>The <emphasis>CAS Processors</emphasis>, responsible for
-        analyzing individual artifacts, analyzing across artifacts, and extracting
-        analysis results. CAS Processors include <emphasis>Analysis Engines</emphasis>
-        and <emphasis>CAS Consumers</emphasis>.</para></listitem>
-      
-      <listitem><para>Operational parameters of the <emphasis>Collection Processing
-        Manager</emphasis> (CPM), such as checkpoint frequency and deployment
-        mode.</para></listitem>
-      
-      <listitem><para>Resource Manager Configuration (optional). </para></listitem>
-      </orderedlist>
-    
-    <para>The CPE Descriptor has the following high level skeleton:
-      
-      
-      <programlisting><![CDATA[<?xml version="1.0"?>
-<cpeDescription>
-   <collectionReader>
-...
-   </collectionReader>
-   <casProcessors>
-...
-   </casProcessors>
-   <cpeConfig>
-...
-   </cpeConfig>
-   <resourceManagerConfiguration>
-...
-   </resourceManagerConfiguration>
-</cpeDescription>]]></programlisting></para>
-    
-    <para>Details of each of the four main elements are described in the sections that
-      follow.</para>
- </section>   
-    <section id="&tp;descriptor.collection_reader">
-      <title>Collection Reader</title>
-      
-      <para>The <literal>&lt;collectionReader&gt;</literal> section identifies the
-        Collection Reader and optional CAS Initializer that are to be used in the CPE. The
-        Collection Reader is responsible for retrieval of artifacts from a collection
-        outside of the CPE, and the optional CAS Initializer (deprecated as of UIMA Version 2)
-        is responsible for initializing the CAS with the artifact.</para>
-      
-      <para>A Collection Reader may initialize the CAS itself, in which case it does not
-        require a CAS Initializer. This should be clearly specified in the documentation for
-        the Collection Reader. Specifying a CAS Initializer for a Collection Reader that
-        does not make use of a CAS Initializer will not cause an error, but the specified CAS
-        Initializer will not be used.</para>
-      
-      <para>The complete structure of the <literal>&lt;collectionReader&gt;</literal>
-        section is:
-        
-        
-        <programlisting><![CDATA[<collectionReader>
-  <collectionIterator>
-    <descriptor>
-      <import ...> | <include .../>
-    </descriptor>
-    <configurationParameterSettings>...</configurationParameterSettings>
-    <sofaNameMappings>...</sofaNameMappings>
-  </collectionIterator>
-  <casInitializer>
-    <descriptor>
-      <import ...> | <include .../>
-    </descriptor>
-    <configurationParameterSettings>...</configurationParameterSettings>
-    <sofaNameMappings>...</sofaNameMappings>
-  </casInitializer>
-</collectionReader>]]></programlisting></para>
-      
-      <para>The <literal>&lt;collectionIterator&gt;</literal> identifies the
-        descriptor for the Collection Reader, and the
-        <literal>&lt;casInitializer&gt;</literal> identifies the descriptor for the
-        CAS Initializer. The format and
-        details of the Collection Reader and CAS Initializer descriptors are described in
-          <olink targetdoc="&uima_docs_ref;"
-          targetptr="ugr.ref.xml.component_descriptor.collection_processing_parts.collection_reader"/>.
-        The <literal>&lt;configurationParameterSettings&gt;</literal> and the
-        <literal>&lt;sofaNameMappings&gt;</literal> elements are described in the next
-        section.</para>
-      
-      <section id="&tp;descriptor.collection_reader.error_handling">
-        <title>Error handling for Collection Readers</title>
-        
-        <para>The CPM will abort if the Collection Reader throws a large number of
-          consecutive exceptions (default = 100). This default can be changed by using the
-          Java initialization parameter <literal>-DMaxCRErrorThreshold
-          xxx.</literal></para>
-      </section>
-    </section>
-    
-    <section id="&tp;descriptor.cas_processors">
-      <title>CAS Processors</title>
-      
-      <para>The <literal>&lt;casProcessors&gt;</literal> section identifies the
-        components that perform the analysis on the input data, including CAS analysis
-        (Analysis Engines) and analysis results extraction (CAS Consumers). The CAS
-        Consumers may also perform collection level analysis, where the analysis is
-        performed (or aggregated) over multiple CASes. The basic structure of the CAS
-        Processors section is:
-        
-        
-        <programlisting><![CDATA[<casProcessors 
-    dropCasOnException="true|false"
-    casPoolSize="[Number]" 
-    processingUnitThreadCount="[Number]">
-
-  <casProcessor ...>
-        ...
-  </casProcessor>
-
-  <casProcessor ...>
-        ...
-  </casProcessor>
-    ...
-</casProcessors>]]></programlisting></para>
-      
-      <para>The <literal>&lt;casProcessors&gt;</literal> section has two mandatory
-        attributes and one optional attribute that configure the characteristics of the CAS
-        Processor flow in the CPE. The first mandatory attribute is
-        <literal>casPoolSize</literal>, which
-        defines the fixed number of CAS instances that the CPM will create and use during
-        processing. All CAS instances are maintained in a CAS Pool with check-in and
-        check-out access. Each CAS is checked out from the CAS Pool by the Collection Reader
-        and initialized with an initial subject of analysis. The CAS is checked back into the
-        CAS Pool when it is completely processed, at the end of the processing chain. A larger
-        CAS Pool size will result in more memory being used by the CPM. CAS objects can be large
-        and care should be taken to determine the optimum size of the CAS Pool, weighing memory
-        tradeoffs with performance.</para>
-      
-      <para>The second mandatory <literal>&lt;casProcessors&gt;</literal> attribute
-        is <literal>processingUnitThreadCount</literal>, which specifies the number of
-        replicated <emphasis>Processing Pipelines</emphasis>. Each Processing
-        Pipeline runs in its own thread. The CPM takes CASes from the work queue and submits
-        each CAS to one of the Processing Pipelines for analysis. A Processing Pipeline
-        contains one or more Analysis Engines invoked in a given sequence. If more than one
-        Processing Pipeline is specified, the CPM replicates instances of each Analysis
-        Engine defined in the CPE descriptor. Each Processing Pipeline thread runs
-        independently, consuming CASes from the work queue and depositing CASes with analysis
-        results onto the output queue. On multiprocessor machines, multiple Processing
-        Pipelines can run in parallel, improving overall throughput of the CPM.</para>
-      <note><para>The number of Processing Pipelines should be equal to or greater than CAS
-      Pool size. </para></note>
-      
-      <para>Elements in the pipeline (each represented by a &lt;casProcessor&gt; element)
-        may indicate that they do not permit multiple deployment in their Analysis Engine
-        descriptor. If so, even though multiple pipelines are being used, all CASes passing
-        through the pipelines will be routed through one instance of these marked Engines.
-        </para>
-      
-      <para>The final, optional, &lt;casProcessors&gt; attribute is
-        <literal>dropCasOnException</literal>. It defines a policy that determines what
-        happens with the CAS when an exception happens during processing. If the value of this
-        attribute is set to true and an exception happens, the CPM will notify all registered
-        listeners of the exception (see <olink targetdoc="&uima_docs_tutorial_guides;"
-          targetptr="ugr.tug.cpe.using_listeners"/>), clear the CAS and check the CAS
-        back into the CAS Pool so that it can be re-used. The presumption is that an exception
-        may leave the CAS in an inconsistent state and therefore that CAS should not be allowed
-        to move through the processing chain. When this attribute is omitted the CPM&apos;s
-        default is the same as specifying
-        <literal>dropCasOnException="false"</literal>.</para>
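-
-      <para>For illustration only, a <literal>&lt;casProcessors&gt;</literal> element
-        that sets all three attributes could look like the following sketch. The values
-        are assumptions chosen for the example, not recommended settings:
-
-        <programlisting><![CDATA[<!-- two CASes in the pool, two replicated pipelines;
-     a CAS that causes an exception is cleared and returned to the pool -->
-<casProcessors
-    casPoolSize="2"
-    processingUnitThreadCount="2"
-    dropCasOnException="true">
-  ...
-</casProcessors>]]></programlisting></para>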
-      
-      <section id="&tp;descriptor.cas_processors.individual">
-        <title>Specifying an Individual CAS Processor</title>
-        
-        <para>The CAS Processors that make up the Processing Pipeline and the CAS Consumer
-          pipeline are specified with the <literal>&lt;casProcessor&gt;</literal>
-          entity, which appears within the <literal>&lt;casProcessors&gt;</literal>
-          entity. It may appear multiple times, once for each CAS Processor specified for
-          this CPE.</para>
-        
-        <para>The order of the <literal>&lt;casProcessor&gt;</literal> entities within
-          the <literal>&lt;casProcessors&gt;</literal> section specifies the order in
-          which the CAS Processors will run. Although CAS Consumers are usually put at the end
-          of the pipeline, they need not be. Also, Aggregate Analysis Engines may include CAS
-          Consumers.</para>
-        
-        <para>The overall format of the <literal>&lt;casProcessor&gt;</literal> entity
-          is:
-          
-          
-          <programlisting><![CDATA[<casProcessor deployment="local|remote|integrated" name="[String]" >
-    <descriptor>
-      <import ...> | <include .../>
-    </descriptor>
-    <configurationParameterSettings>...</configurationParameterSettings>
-    <sofaNameMappings>...</sofaNameMappings>
-    <runInSeparateProcess>...</runInSeparateProcess>
-    <deploymentParameters>...</deploymentParameters>
-    <filter/>
-    <errorHandling>...</errorHandling>
-    <checkpoint batch="Number"/>
-</casProcessor>]]></programlisting></para>
-        
-        <para>The <literal>&lt;casProcessor&gt;</literal> element has two mandatory
-          attributes, <literal>deployment</literal> and <literal>name</literal>. The
-          mandatory <literal>name</literal> attribute specifies a unique string
-          identifying the CAS Processor.</para>
-        
-        <para>The mandatory <literal>deployment</literal> attribute specifies the CAS
-          Processor deployment mode. Currently, three deployment options are supported:
-          
-          <variablelist>
-            <varlistentry>
-              <term>integrated</term>
-              <listitem><para>indicates <emphasis>integrated</emphasis> deployment
-                of the CAS Processor. The CPM deploys and collocates the CAS Processor in the
-                same process space as the CPM. This type of deployment is recommended to
-                increase the performance of the CPE. However, it is NOT recommended to
-                deploy annotators containing JNI this way. Such CAS Processors may cause a
-                fatal exception and force the JVM to exit without cleanup (bringing down the
-                CPM). Any UIMA SDK compliant pure Java CAS Processors may be safely deployed
-                this way.</para>
-                <para>The descriptor for an integrated deployment can, in fact, be a remote
-                  service descriptor. When used this way, however, the CPM error recovery 
-                  options (see below) operate in the integrated mode, which means that many 
-                  of the retry options are not available.</para></listitem>
-            </varlistentry>
-            <varlistentry>
-              <term>remote</term>
-              <listitem><para>indicates <emphasis>non-managed</emphasis>
-                deployment of the CAS Processor. The CAS Processor descriptor referenced
-                in the <literal>&lt;descriptor&gt;</literal> element must be a Vinci
-                <emphasis>Service Client Descriptor</emphasis>, which identifies a
-                remotely deployed CAS Processor service (see <olink
-                  targetdoc="&uima_docs_tutorial_guides;"
-                  targetptr="ugr.tug.application.remote_services"/>). The CPM
-                assumes that the CAS Processor is already running as a remote service and
-                will connect to it using the URI provided in the client service descriptor.
-                The lifecycle of a remotely deployed CAS Processor is not managed by the CPM,
-                so appropriate infrastructure should be in place to start/restart such CAS
-                Processors when necessary. This deployment provides fault isolation and
-                is implementation (i.e., programming language) neutral.</para>
-                </listitem>
-            </varlistentry>
-            <varlistentry>
-              <term>local</term>
-              <listitem><para>indicates <emphasis>managed</emphasis> deployment of
-                the CAS Processor. The CAS Processor descriptor referenced in the
-                <literal>&lt;descriptor&gt;</literal> element must be a Vinci
-                <emphasis>Service Deployment Descriptor</emphasis>, which configures
-                a CAS Processor for deployment as a Vinci service (see <olink
-                  targetdoc="&uima_docs_tutorial_guides;"
-                  targetptr="ugr.tug.application.remote_services"/>). The CPM
-                deploys the CAS Processor in a separate process and manages the life cycle
-                (start/stop) of the CAS Processor. Communication between the CPM and the
-                CAS Processor is done with Vinci. When the CPM completes processing, the
-                process containing the CAS Processor is terminated. This deployment mode
-                insulates the CPM from the CAS Processor, creating a more robust deployment
-                at the cost of a small communication overhead. On multiprocessor machines,
-                the separate processes may run concurrently and improve overall
-                throughput.</para></listitem>
-            </varlistentry>
-          </variablelist></para>
-        
-        <para>A number of elements may appear within the
-          <literal>&lt;casProcessor&gt;</literal> element.</para>
-        
-        <section id="&tp;descriptor.cas_processors.individual.descriptor">
-          <title>&lt;descriptor&gt; Element</title>
-          
-          <para>The <literal>&lt;descriptor&gt;</literal> element is mandatory. It
-            identifies the descriptor for the referenced CAS Processor using the syntax
-            described in <olink targetdoc="&uima_docs_ref;"
-              targetptr="ugr.ref.xml.component_descriptor.aes"/>.
-            
-            <itemizedlist spacing="compact"><listitem><para>For
-              <emphasis><literal>remote</literal></emphasis> CAS Processors, the
-              referenced descriptor must be a Vinci <emphasis>Service Client
-              Descriptor</emphasis>, which identifies a remotely deployed CAS Processor
-              service.</para></listitem>
-              
-              <listitem><para>For <emphasis>local</emphasis> CAS Processors, the
-                referenced descriptor must be a Vinci <emphasis>Service Deployment
-                Descriptor</emphasis>.</para></listitem>
-              
-              <listitem><para>For <emphasis>integrated</emphasis> CAS Processors,
-                the referenced descriptor must be an Analysis Engine Descriptor
-                (primitive or aggregate). </para></listitem></itemizedlist> </para>
-          
-          <para>See <olink targetdoc="&uima_docs_tutorial_guides;"
-              targetptr="ugr.tug.application.remote_services"/> for more
-            information on creating these descriptors and deploying services.</para>
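-
-          <para>As a hedged sketch, an <emphasis>integrated</emphasis> CAS Processor that
-            refers to its Analysis Engine Descriptor through an
-            <literal>import</literal> might begin as follows. The name and the
-            location are hypothetical, and the remaining child elements are omitted:
-
-            <programlisting><![CDATA[<casProcessor deployment="integrated" name="Example Annotator">
-    <descriptor>
-        <import location="descriptors/ExampleAnnotator.xml"/>
-    </descriptor>
-    ...
-</casProcessor>]]></programlisting></para>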
-          
-        </section>
-        
-        <section
-          id="&tp;descriptor.cas_processors.individual.configuration_parameter_settings">
-          <title>&lt;configurationParameterSettings&gt; Element</title>
-          
-          <para>This element provides a way to override the contained Analysis
-            Engine&apos;s parameter settings. Any entry specified here must already be
-            defined; values specified replace the corresponding values for each
-            parameter. <emphasis role="bold-italic">For CAS Processors, this mechanism
-            is only available when they are deployed in <quote>integrated</quote>
-            mode.</emphasis> For Collection Readers and Initializers, it is always
-            available.</para>
-          
-          <para>The content of this element is identical to the component descriptor for
-            specifying parameters (in the case where no parameter groups are
-            specified)<footnote><para>An earlier UIMA version required these to have a
-            suffix of <quote>_p</quote>, e.g., <quote>string_p</quote>. This is no
-            longer required, but this format is accepted, also, for backward
-            compatibility.</para></footnote>. Here is an example:
-            
-            
-            <programlisting><![CDATA[<configurationParameterSettings>
-  <nameValuePair>
-    <name>CivilianTitles</name>
-    <value>
-      <array>
-        <string>Mr.</string>
-        <string>Ms.</string>
-        <string>Mrs.</string>
-        <string>Dr.</string>
-      </array>  
-    </value>
-  </nameValuePair>
-  ...
-</configurationParameterSettings>]]></programlisting></para>
-          
-        </section>
-        
-        <section
-          id="&tp;descriptor.cas_processors.individual.sofa_name_mappings">
-          <title>&lt;sofaNameMappings&gt; Element</title>
-          
-          <para>This optional element provides a mapping from defined Sofa names in the
-            component, or the default Sofa name (if the component does not declare any Sofa
-            names). The form of this element is:
-            
-            
-            <programlisting>&lt;sofaNameMappings&gt;
-  &lt;sofaNameMapping cpeSofaName="a_CPE_name"
-                   componentSofaName="a_component_Name"/&gt;
-  ...
-&lt;/sofaNameMappings&gt;</programlisting></para>
-          
-          <para>There can be any number of
-            <literal>&lt;sofaNameMapping&gt;</literal> elements contained in the
-            <literal>&lt;sofaNameMappings&gt;</literal> element. The
-            <literal>componentSofaName</literal> attribute is optional; leave it out to
-            specify a mapping for the <literal>_InitialView</literal> - that is, for
-            Single-View components.</para>
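-
-          <para>For example, assuming a CPE-level Sofa named
-            <literal>sourceText</literal> (a hypothetical name), a Single-View
-            component could be pointed at it by omitting
-            <literal>componentSofaName</literal>:
-
-            <programlisting>&lt;sofaNameMappings&gt;
-  &lt;sofaNameMapping cpeSofaName="sourceText"/&gt;
-&lt;/sofaNameMappings&gt;</programlisting></para>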
-          
-        </section>
-        
-        <section id="&tp;descriptor.cas_processors.run_in_separate_process">
-          <title>&lt;runInSeparateProcess&gt; Element</title>
-          
-          <para>The <literal>&lt;runInSeparateProcess&gt;</literal> element is
-            mandatory for <literal>local</literal> CAS Processors, but should not appear
-            for <literal>remote</literal> or <literal>integrated</literal> CAS
-            Processors. It enables the CPM to create external processes using the provided
-            runtime environment. Applications launched this way communicate with the CPM
-            using the Vinci protocol and connectivity is enabled by a local instance of the
-            VNS that the CPM manages. Since communication is based on Vinci, the application
-            need not be implemented in Java. Any language for which Vinci provides support
-            may be used to create an application, and the CPM will seamlessly communicate
-            with it. The overall structure of this element is:
-            
-            
-            <programlisting><![CDATA[<runInSeparateProcess>
-    <exec dir="[String]" executable="[String]">
-        <env key="[String]" value ="[String]"/>
-        ...
-        <arg>[String]</arg>
-        ...
-    </exec>
-</runInSeparateProcess>]]></programlisting></para>
-          
-          <para>The <literal>&lt;exec&gt;</literal> element provides information
-            about how to execute the referenced CAS Processor. Two attributes are defined
-            for the <literal>&lt;exec&gt;</literal> element. The
-            <literal>dir</literal> attribute is currently not used &ndash; it is reserved
-            for future functionality. The <literal>executable</literal> attribute
-            specifies the actual Vinci service executable that will be run by the CPM, e.g.,
-            <literal>java</literal>, a batch script, an application (.exe), etc. The
-            executable must be specified with a fully qualified path, or be found in the
-            <literal>PATH</literal> of the CPM.</para>
-          
-          <para>The <literal>&lt;exec&gt;</literal> element has two elements within it
-            that define parameters used to construct the command line for executing the CAS
-            Processor. These elements must be listed in the order in which they should be
-            defined for the CAS Processor.</para>
-          
-          <para>The optional <literal>&lt;env&gt;</literal> element is used to set an
-            environment variable. The variable <literal>key</literal> will be set to
-            <literal>value</literal>. For example,
-            
-            
-            <programlisting>&lt;env key="CLASSPATH" value="C:\Java\lib"/&gt;</programlisting>
-            will set the environment variable <literal>CLASSPATH</literal> to the value
-            <literal>C:\Java\lib</literal>. The <literal>&lt;env&gt;</literal>
-            element may be repeated to set multiple environment variables. All of the
-            key/value pairs will be added to the environment by the CPM prior to launching the
-            executable.</para>
-          <note><para>The CPM actually adds ALL system environment variables when it
-          launches the program. It queries the Operating System for its current system
-          variables and one by one adds them to the program&apos;s process
-          configuration.</para></note>
-          
-          <para>The <literal>&lt;arg&gt;</literal> element is used to specify arbitrary
-            string arguments that will appear on the command line when the CPM runs the
-            command specified in the <literal>executable</literal> attribute.</para>
-          
-          <para>For example, the following would be used to invoke the UIMA Java
-            implementation of the Vinci service wrapper on a Java CAS Processor:
-            
-            
-            <programlisting><![CDATA[<runInSeparateProcess>
-    <exec executable="java">
-        <arg>-DVNS_HOST=localhost</arg> 
-        <arg>-DVNS_PORT=9099</arg>
-        <arg>org.apache.uima.reference_impl.analysis_engine.service.
-vinci.VinciAnalysisEngineService_impl</arg> 
-        <arg>C:uimadescdeployCasProcessor.xml</arg>
-    </exec>
-</runInSeparateProcess>]]></programlisting></para>
-          
-          <para>This will cause the CPM to run the following command line when starting the
-            CAS Processor:
-            
-            
-            <programlisting>java -DVNS_HOST=localhost -DVNS_PORT=9099 
-  org.apache.uima.reference_impl.analysis_engine.service.vinci.\\
-              VinciAnalysisEngineService_impl 
-  C:uimadescdeployCasProcessor.xml</programlisting></para>
-          
-          <para>The first argument specifies that the Vinci Naming Service is running on the
-            <literal>localhost</literal>. The second argument specifies that the Vinci
-            Naming Service port number is <literal>9099</literal>. The third argument
-            (split over 2 lines in this documentation) 
-            identifies the UIMA implementation of the Vinci service wrapper. This class
-            contains the <literal>main</literal> method that will execute. That main
-            method in turn takes a single argument &ndash; the filename for the CAS Processor
-            service deployment descriptor. Thus the last argument identifies the Vinci
-            service deployment descriptor file for the CAS Processor. Since this is the same
-            descriptor file specified earlier in the
-            <literal>&lt;descriptor&gt;</literal> element, the string
-            <literal>${descriptor}</literal> can be used to refer to the descriptor,
-            e.g.:
-            
-            
-            <programlisting>&lt;arg&gt;${descriptor}&lt;/arg&gt;</programlisting></para>
-          
-          <para>The CPM will expand this out to the service deployment descriptor file
-            referenced in the <literal>&lt;descriptor&gt;</literal> element.</para>
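-          
-          <para>As a sketch of how this substitution might be combined with the earlier
-            example, the complete element could read as follows. The argument values are
-            the illustrative ones used above, not required settings, and the class name
-            is split over 2 lines only for display:
-            
-            
-            <programlisting><![CDATA[<runInSeparateProcess>
-    <exec executable="java">
-        <arg>-DVNS_HOST=localhost</arg>
-        <arg>-DVNS_PORT=9099</arg>
-        <arg>org.apache.uima.reference_impl.analysis_engine.service.
-vinci.VinciAnalysisEngineService_impl</arg>
-        <arg>${descriptor}</arg>
-    </exec>
-</runInSeparateProcess>]]></programlisting></para>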
-          
-        </section>
-        
-        <section
-          id="&tp;descriptor.cas_processors.individual.deployment_parameters">
-          <title>&lt;deploymentParameters&gt; Element</title>
-          
-          <para>The <literal>&lt;deploymentParameters&gt;</literal> element defines
-            a number of deployment parameters that control how the CPM will interact with the
-            CAS Processor. This element has the following overall form:
-            
-            
-            <programlisting>&lt;deploymentParameters&gt;
-    &lt;parameter name="[String]" value="..." type="string|integer" /&gt; 
-    ...
-&lt;/deploymentParameters&gt;</programlisting></para>
-          
-          <para>The <literal>name</literal> attribute identifies the parameter, the
-            <literal>value</literal> attribute specifies the value that will be assigned
-            to the parameter, and the <literal>type</literal> attribute indicates the
-            type of the parameter, either <literal>string</literal> or
-            <literal>integer</literal>. The available parameters include:
-            
-            <variablelist>
-              
-              <varlistentry>
-                <term>service-access</term>
-                <listitem><para>string parameter whose value must be
-                  <quote>exclusive</quote>, if present. This parameter is only
-                  effective for remote deployments. It modifies the Vinci service
-                  connections to be preallocated and dedicated, one service instance per
-                  pipeline. It is only relevant for non-integrated deployment modes. If
-                  there are fewer service instances available (and alive &ndash;
-                  responding to a <quote>ping</quote> request) than there are pipelines,
-                  the number of pipelines (the number of concurrent threads) is reduced to
-                  match the number of available instances. If not specified, the VNS is
-                  queried each time a service is needed, and a <quote>random</quote>
-                  instance is assigned from the pool of available instances. If a service
-                  dies during processing, the CPM will use its normal error handling
-                  procedures to attempt to reconnect. The number of attempts is specified
-                  in the CPE descriptor for each CAS Processor using the
-                  <literal>&lt;maxConsecutiveRestarts value="10"
-                  action="kill-pipeline"
-                  waitTimeBetweenRetries="50"/&gt;</literal> XML element. The
-                  <quote>value</quote> attribute is the number of reconnection tries;
-                  the <quote>action</quote> attribute says what to do if the retries exceed
-                  the limit. The <quote>kill-pipeline</quote> action stops the pipeline
-                  that was associated with the failing service (other pipelines will
-                  continue to work). The CAS in process within a killed pipeline will be
-                  dropped. These events are communicated to the application using the
-                  normal event listener mechanism. The
-                  <literal>waitTimeBetweenRetries</literal> attribute says how many
-                  milliseconds to wait between attempts to reconnect.</para>
-                  </listitem>
-              </varlistentry>
-              
-              <varlistentry>
-                <term>vnsHost</term>
-                <listitem><para>(Deprecated) string parameter specifying the VNS host,
-                  e.g., <literal>localhost</literal> for local CAS Processors, host
-                  name or IP address of VNS host for remote CAS Processors. This parameter is
-                  deprecated; use the parameter specification instead inside the Vinci
-                  <emphasis>Service Client Descriptor</emphasis>, if needed. It is
-                  ignored for integrated and local deployments. If present, for remote
-                  deployments, it specifies the VNS Host to use, unless that is specified in
-                  the Vinci <emphasis>Service Client Descriptor</emphasis>.</para>
-                  </listitem>
-              </varlistentry>
-              
-              <varlistentry>
-                <term>vnsPort</term>
-                <listitem><para>(Deprecated) integer parameter specifying the VNS port
-                  number. This parameter is deprecated; use the parameter specification
-                  instead inside the Vinci <emphasis>Service Client
-                  Descriptor,</emphasis> if needed. It is ignored for integrated and
-                  local deployments. If present, for remote deployments, it specifies the
-                  VNS Port number to use, unless that is specified in the Vinci
-                  <emphasis>Service Client Descriptor.</emphasis></para>
-                  </listitem>
-              </varlistentry>
-            </variablelist></para>
-          
-          <para>For example, the following parameters might be used with a CAS Processor
-            deployed in local mode:
-            
-            
-            <programlisting>&lt;deploymentParameters&gt;
-  &lt;parameter name="service-access" value="exclusive" type="string"/&gt; 
-&lt;/deploymentParameters&gt;</programlisting></para>
-          
-        </section>
-        
-        <section id="&tp;descriptor.cas_processors.individual.filter">
-          <title>&lt;filter&gt; Element</title>
-          
-          <para>The &lt;filter&gt; element is a required element but currently should be
-            left empty. This element is reserved for future use.</para>
-          
-        </section>
-        
-        <section id="&tp;descriptor.cas_processors.individual.error_handling">
-          <title>&lt;errorHandling&gt; Element</title>
-          
-          <para>The mandatory <literal>&lt;errorHandling&gt;</literal> element
-            defines error and restart policies for the CAS Processor. Each CAS Processor may
-            define different actions in the event of errors and restarts. The CPM monitors
-            and logs errant behaviors and attempts to recover the component based on the
-            policies specified in this element.</para>
-          
-          <para>There are two kinds of faults:
-            
-            <orderedlist><listitem><para>One kind only occurs with non-integrated CAS
-              Processors &ndash; this fault is either a timeout attempting to launch or
-              connect to the non-integrated component, or some other kind of connection
-              related exception (for instance, the network connection might timeout or get
-              reset).</para></listitem>
-              
-              <listitem><para>The other kind happens when the CAS Processor component (an
-                Annotator, for example) throws any kind of exception. This kind may occur
-                with any kind of deployment, integrated or not. </para></listitem>
-              </orderedlist></para>
-          
-          <para>The &lt;errorHandling&gt; element has specifications for each of these kinds of
-            faults. The format of this element is:
-            
-            
-            <programlisting><![CDATA[<errorHandling>
-  <maxConsecutiveRestarts action="continue|disable|terminate"
-                           value="[Number]"/>
-  <errorRateThreshold action="continue|disable|terminate" value="[Rate]"/>
-  <timeout max="[Number]"/>
-</errorHandling>]]></programlisting></para>
-          
-          <para>The mandatory <literal>&lt;maxConsecutiveRestarts&gt;</literal>
-            element applies only to faults of the first kind, and therefore, only applies to
-            non-integrated deployments. If such a fault occurs, a retry is attempted, up to
-            <literal>value="[Number]"</literal> times. This retry resets the
-            connection (if one was made) and attempts to reconnect and perhaps re-launch
-            (see below for details). The original CAS (not a partially updated one) is sent to
-            the CAS Processor as part of the retry, once the deployed component has been
-            successfully restarted or reconnected to.</para>
-          
-          <para>The <literal>action</literal> attribute specifies the action to take
-            when the threshold specified by the <literal>value="[Number]"</literal> is
-            exceeded. The possible actions are:
-            
-            <variablelist>
-              <varlistentry>
-                <term>continue</term>
-                <listitem><para>skip any further processing for this CAS by this CAS
-                  Processor, and pass the CAS to the next CAS Processor in the Pipeline.
-                  </para>
-                  <para>The <quote>restart</quote> action is done, because it is needed
-                    for the next CAS.</para>
-                  
-                  <para>If <literal>dropCasOnException="true"</literal> is set, the CPM
-                    will NOT pass the CAS to the next CAS Processor in the chain. Instead, the
-                    CPM will abort processing of this CAS, release the CAS back to the CAS
-                    Pool and will process the next CAS in the queue.</para>
-                  
-                  <para>The counter counting the restarts toward the threshold is only
-                    reset after a CAS is successfully processed.</para></listitem>
-              </varlistentry>
-              
-              <varlistentry>
-                <term>disable</term>
-                <listitem><para>the current CAS is handled just as in the
-                  <literal>continue</literal> case, but in addition, the CAS Processor
-                  is marked so that its <emphasis>process()</emphasis> method will not be
-                  called again (i.e., it will be <quote>skipped</quote> for future
-                  CASes)</para></listitem>
-              </varlistentry>
-              
-              <varlistentry>
-                <term>terminate</term>
-                <listitem><para>the CPM will terminate all processing and exit.</para>
-                  </listitem>
-              </varlistentry>
-            </variablelist></para>
-          
-          <para>The definition of an error for the
-            <literal>&lt;maxConsecutiveRestarts&gt;</literal> element differs
-            slightly for each of the three CAS Processor deployment modes:
-            <variablelist>
-              <varlistentry>
-                <term>local</term>
-                <listitem><para>Local CAS Processors experience two general error
-                  types:
-                  <itemizedlist>
-                    <listitem><para>launch errors &ndash; errors associated with
-                      launching a process</para></listitem>
-                    <listitem><para>processing errors &ndash; errors associated with
-                      sending Vinci commands to the process</para></listitem>
-                  </itemizedlist></para>
-                  
-                  <para>A launch error is defined by a failure of the process to
-                    successfully register with the local VNS within a default time window.
-                    The current timeout is 15 minutes. Multiple local CAS Processors are
-                    launched sequentially, with a subsequent processor launched
-                    immediately after its previous processor successfully registers
-                    with the VNS.</para>
-                  
-                  <para>A processing error is detected if a connection to the CAS Processor
-                    is lost or if the processing time exceeds a specified timeout
-                    value.</para>
-                  
-                  <para>For local CAS Processors, the
-                    &lt;maxConsecutiveRestarts&gt; element specifies the number of
-                    consecutive attempts made to launch the CAS Processor at CPM startup or
-                    after the CPM has lost a connection to the CAS Processor.</para>
-                  </listitem>
-              </varlistentry>
-              
-              <varlistentry>
-                <term>remote</term>
-                <listitem><para>For remote CAS Processors, the
-                  &lt;maxConsecutiveRestarts&gt; element applies to errors from
-                  sending Vinci commands. An error is detected if a connection to the CAS
-                  Processor is lost, or if the processing time exceeds the timeout value
-                  specified in the &lt;timeout&gt; element (see below).</para>
-                  </listitem>
-              </varlistentry>
-              
-              <varlistentry>
-                <term>integrated</term>
-                <listitem><para>Although mandatory, the
-                  &lt;maxConsecutiveRestarts&gt; element is NOT used for integrated CAS
-                  Processors, because Integrated CAS Processors are not
-                  re-instantiated/restarted on exceptions. This setting is ignored by
-                  the CPM for Integrated CAS Processors but it is required. A future version
-                  of the CPM will make this element mandatory for remote and local CAS
-                  Processors only.</para></listitem>
-              </varlistentry>
-              
-            </variablelist></para>
-          
-          <para>The mandatory <literal>&lt;errorRateThreshold&gt;</literal> element
-            is used for all faults &ndash; both those above, and exceptions thrown by the CAS
-            Processor itself. It specifies the number of retries for exceptions thrown by
-            the CAS Processor itself, a maximum error rate, and the corresponding action to
-            take when this rate is exceeded. The <literal>value</literal> attribute
-            specifies the error rate in terms of errors per sample size in the form
-            <quote><literal>N/M</literal></quote>, where <literal>N</literal> is the
-            number of errors and <literal>M</literal> is the sample size, defined in terms
-            of the number of documents.</para>
-          
-          <para>The first number is used also to indicate the maximum number of retries. If
-            this number is less than the <literal>&lt;maxConsecutiveRestarts
-            value="[Number]"&gt;, </literal>it will override, reducing the number of
-            <quote>restarts</quote> attempted. A retry is done only if the
-            <literal>dropCasOnException </literal>is false. If it is set to true, no retry
-            occurs, but the error is counted.</para>
-          
-          <para>When the number of errors counted within a sample exceeds <literal>N</literal>, the action
-            specified by the <literal>action</literal> attribute is taken. The possible
-            actions and their meaning are the same as described above for the
-            <literal>&lt;maxConsecutiveRestarts&gt;</literal> element:
-            <itemizedlist spacing="compact">
-              <listitem><para><literal>continue</literal></para></listitem>
-              <listitem><para><literal>disable</literal></para></listitem>
-              <listitem><para><literal>terminate</literal></para></listitem>
-            </itemizedlist></para>
-         
-          <para>The <literal>dropCasOnException="true"</literal> attribute of the
-            <literal>&lt;casProcessors&gt;</literal> element modifies the action
-            taken for continue and disable, in the same manner as above. For example:
-            
-            
-            <programlisting>&lt;errorRateThreshold value="3/1000" action="disable"/&gt;</programlisting>
-            specifies that each error thrown by the CAS Processor itself will be retried up to
-            3 times (if <literal>dropCasOnException</literal> is false) and the CAS
-            Processor will be disabled if the error rate exceeds 3 errors in 1000
-            documents.</para>
-          
-          <para>If a document causes an error and the error rate threshold for the CAS
-            Processor is not exceeded, the CPM increments the CAS Processor&apos;s error
-            count and retries processing that document (if
-            <literal>dropCasOnException</literal> is false). The retry means that the
-            CPM calls the CAS Processor&apos;s process() method again, passing in as an
-            argument the same CAS that previously caused an exception.</para>
-          <note><para>The CPM does not attempt to rollback any partial changes that may have
-          been applied to the CAS in the previous process() call. </para></note>
-          
-          <para>Errors are accumulated across documents. For example, assume the error
-            rate threshold is <literal>3/1000</literal>. The same document may fail three
-            times before finally succeeding on the fourth try, but the error count is now 3. If
-            one more error occurs within the current sample of 1000 documents, the error rate
-            threshold will be exceeded and the specified action will be taken. If no more
-            errors occur within the current sample, the error counter is reset to 0 for the
-            next sample of 1000 documents.</para>
-          
-          <para>The <literal>&lt;timeout&gt;</literal> element is a mandatory element.
-            Although mandatory for all CAS Processors, this element is only relevant for
-            local and remote CAS Processors. For integrated CAS Processors, this element is
-            ignored. In the current CPM implementation the integrated CAS Processor
-            process() method is not subject to timeouts.</para>
-          
-          <para>The <literal>max</literal> attribute specifies the maximum amount of
-            time in milliseconds the CPM will wait for a process() method to complete. When
-            exceeded, the CPM will generate an exception and will treat this as an error
-            subject to the threshold defined in the
-            <literal>&lt;errorRateThreshold&gt;</literal> element above, including
-            doing retries.</para>
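-          
-          <para>Putting the three elements together, a sketch of a complete
-            <literal>&lt;errorHandling&gt;</literal> element for a non-integrated CAS
-            Processor might look as follows. The numeric values and the chosen actions
-            are arbitrary illustrations, not recommended settings:
-            
-            
-            <programlisting><![CDATA[<errorHandling>
-  <maxConsecutiveRestarts action="terminate" value="3"/>
-  <errorRateThreshold action="disable" value="3/1000"/>
-  <timeout max="100000"/>
-</errorHandling>]]></programlisting></para>
-          
-          <para>Under these assumed values, up to 3 consecutive restart or reconnect
-            attempts are made before the CPM terminates, the CAS Processor is disabled
-            if more than 3 errors occur within a sample of 1000 documents, and a
-            process() call that takes longer than 100000 milliseconds is treated as an
-            error.</para>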
-          
-          <section
-            id="&tp;descriptor.cas_processors.individual.error_handling.timeout_retry_action">
-            <title>Retry action taken on a timeout</title>
-            
-            <para>The action taken depends on whether the CAS Processor is local (managed)
-              or remote (unmanaged). Local CAS Processors (which are services) are killed
-              and restarted, and a new connection to them is established. For remote CAS
-              Processors, the connection to them is dropped, and a new connection is
-              established (which may actually connect to a different instance of the
-              remote service, if it has multiple instances).</para>
-          </section>
-        </section>
-        
-        <section id="&tp;descriptor.cas_processors.individual.checkpoint">
-          <title>&lt;checkpoint&gt; Element</title>
-          
-          <para>The <literal>&lt;checkpoint&gt;</literal> element is an optional
-            element used to improve the performance of CAS Consumers. It has a single
-            attribute, <literal>batch</literal>, which specifies the number of CASes in a
-            batch, e.g.:
-            
-            
-            <programlisting>&lt;checkpoint batch="1000"/&gt;</programlisting></para>
-          
-          <para>sets the batch size to 1000 CASes. The batch size is the interval used to mark a
-            point in processing requiring special handling. The CAS Processor&apos;s
-            <literal>batchProcessComplete()</literal> method will be called by the CPM
-            when this mark is reached so that the processor can take appropriate action. This
-            mark could be used as a mechanism to buffer up results in CAS Consumers and perform
-            time-consuming operations, such as check-pointing, that should not be done on a
-            per-document basis.</para>
-          
-        </section>
-      </section>
-    </section>
-    
-    <section id="&tp;descriptor.operational_parameters">
-      <title>CPE Operational Parameters</title>
-      
-      <para>The parameters for configuring the overall CPE and CPM are specified in the
-        <literal>&lt;cpeConfig&gt;</literal> section. The overall format of this
-        section is:
-        
-        
-        <programlisting><![CDATA[<cpeConfig>
-  <startAt>[NumberOrID]</startAt>
-
-  <numToProcess>[Number]</numToProcess>
-
-  <outputQueue dequeueTimeout="[Number]" queueClass="[ClassName]" />
-
-  <checkpoint file="[File]" time="[Number]" batch="[Number]"/>
-
-  <timerImpl>[ClassName]</timerImpl>
-
-  <deployAs>vinciService|interactive|immediate|single-threaded
-  </deployAs>
-
-</cpeConfig>]]></programlisting></para>
-      
-      <para>This section of the CPE descriptor allows for defining the starting entity, the
-        number of entities to process, a checkpoint file and frequency, a pluggable timer, an
-        optional output queue implementation, and finally a mode of operation. The mode of
-        operation determines how the CPM interacts with users and other systems.</para>
-      
-      <para>The <literal>&lt;startAt&gt;</literal> element is an optional argument. It
-        defines the starting entity in the collection at which the CPM should start
-        processing.</para>
-      
-      <para>The implementation in the CPM passes this argument to the Collection Reader
-        as the value of the parameter <quote><literal>startNumber</literal></quote>.
-        The CPM does not do anything else with this parameter; in particular, the CPM has no
-        ability to skip to a specific document - that function, if available, is only provided
-        by a particular Collection Reader implementation.</para>
-      
-      <para>If the <literal>&lt;startAt&gt;</literal> element is used, the Collection
-        Reader descriptor must define a single-valued configuration parameter with the
-        name <literal>startNumber</literal>. It can declare this value to be of any type;
-        the value passed in this XML element must be convertible to that type.</para>
-      
-      <para>A typical use is to declare this to be an integer type, and to pass the sequential
-        document number where processing should start. An alternative implementation
-        might take a specific document ID; the collection reader could search through its
-        collection until it reaches this ID and then start there.</para>
-      
-      <para>This parameter will only make sense if the particular collection reader is
-        implemented to use the <literal>startNumber</literal> configuration
-        parameter.</para>
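-      
-      <para>As an illustration, a Collection Reader descriptor could declare such a
-        parameter roughly as follows. This is only a sketch: it assumes an optional,
-        integer-typed parameter, and the description text is hypothetical:
-        
-        
-        <programlisting><![CDATA[<configurationParameters>
-  <configurationParameter>
-    <name>startNumber</name>
-    <description>Sequential number of the first document to process
-      (hypothetical description)</description>
-    <type>Integer</type>
-    <multiValued>false</multiValued>
-    <mandatory>false</mandatory>
-  </configurationParameter>
-</configurationParameters>]]></programlisting></para>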
-      
-      <para>The <literal>&lt;numToProcess&gt;</literal> element is an optional
-        element. It specifies the total number of entities to process. Use -1 to indicate ALL.
-        If not defined, the number of entities to process will be taken from the Collection
-        Reader configuration. If present, this value overrides the Collection Reader
-        configuration.</para>
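-      
-      <para>For example, assuming a Collection Reader that interprets
-        <literal>startNumber</literal> as a sequential document number, the following
-        sketch would ask it to start at document 100 and would limit processing to 500
-        entities. Both values are arbitrary illustrations:
-        
-        
-        <programlisting><![CDATA[<cpeConfig>
-  <startAt>100</startAt>
-  <numToProcess>500</numToProcess>
-  <!-- remaining <cpeConfig> elements omitted from this sketch -->
-</cpeConfig>]]></programlisting></para>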
-      
-      <para>The <literal>&lt;outputQueue&gt;</literal> element is an optional element.
-        It enables plugging in a custom implementation for the Output Queue. When omitted,
-        the CPM will use a default output queue that is based on a First-In First-Out (FIFO)
-        model.</para>
-      
-      <para>The UIMA SDK provides a second implementation for the Output Queue that can be
-        plugged in to the CPM, named <quote>
-        <literal>org.apache.uima.collection.impl.cpm.engine.SequencedQueue</literal>
-        </quote>.</para>
-      
-      <para>This implementation supports handling very large documents that are split into
-        <quote>chunks</quote>; it provides a delivery mechanism that ensures the
-        sequential order of the chunks using information carried in the CAS metadata. This
-        metadata, which is required for this implementation to work correctly, must be added
-        as an instance of a Feature Structure of type
-        <literal>org.apache.es.tt.DocumentMetaData</literal> and referred to by an
-        additional feature named <literal>esDocumentMetaData</literal> in the special
-        instance of <literal>uima.tcas.DocumentAnnotation</literal> that is
-        associated with the CAS. This is usually done by the Collection Reader; the instance
-        contains the following features:
-        
-        <variablelist>
-          <varlistentry>
-            <term>sequenceNumber</term>
-            <listitem><para>[Number] the sequential number of a chunk, starting at 1. If
-              not a chunk (i.e., a complete document), the value should be 0.</para>
-              </listitem>
-          </varlistentry>
-          <varlistentry>
-            <term>documentId</term>
-            <listitem><para>[Number] current document id. Chunks belonging to the same
-              document have identical document id.</para></listitem>
-          </varlistentry>
-          <varlistentry>
-            <term>isCompleted</term>
-            <listitem><para>[Number] 1 if the chunk is the last in a sequence, 0
-              otherwise.</para></listitem>
-          </varlistentry>
-          <varlistentry>
-            <term>url</term>
-            <listitem><para>[String] document url.</para></listitem>
-          </varlistentry>
-          <varlistentry>
-            <term>throttleID</term>
-            <listitem><para>[String] special attribute currently used by
-              OmniFind.</para></listitem>
-          </varlistentry>
-        </variablelist></para>
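-      
-      <para>A type system descriptor fragment declaring this metadata type might look
-        roughly as follows. This is a sketch under stated assumptions: the type and
-        feature names come from the list above, but the supertype
-        <literal>uima.cas.TOP</literal> and the range types (chosen to match the
-        [Number] and [String] markers) are assumptions, and the additional
-        <literal>esDocumentMetaData</literal> feature on
-        <literal>uima.tcas.DocumentAnnotation</literal> is not shown:
-        
-        
-        <programlisting><![CDATA[<typeDescription>
-  <name>org.apache.es.tt.DocumentMetaData</name>
-  <!-- supertype is an assumption -->
-  <supertypeName>uima.cas.TOP</supertypeName>
-  <features>
-    <!-- range types below are assumptions -->
-    <featureDescription>
-      <name>sequenceNumber</name>
-      <rangeTypeName>uima.cas.Integer</rangeTypeName>
-    </featureDescription>
-    <featureDescription>
-      <name>documentId</name>
-      <rangeTypeName>uima.cas.Integer</rangeTypeName>
-    </featureDescription>
-    <featureDescription>
-      <name>isCompleted</name>
-      <rangeTypeName>uima.cas.Integer</rangeTypeName>
-    </featureDescription>
-    <featureDescription>
-      <name>url</name>
-      <rangeTypeName>uima.cas.String</rangeTypeName>
-    </featureDescription>
-    <featureDescription>
-      <name>throttleID</name>
-      <rangeTypeName>uima.cas.String</rangeTypeName>
-    </featureDescription>
-  </features>
-</typeDescription>]]></programlisting></para>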
-      
-      <para>This implementation of a sequenced queue supports proper sequencing of CASes in
-        CPM deployments that use document chunking. Chunking is a technique of splitting
-        large documents into pieces to reduce overall memory consumption. Chunking does not
-        depend on the number of CASes in the CAS Pool. It works equally well with one or more
-        CASes in the CAS Pool. Each chunk is packaged in a separate CAS and placed in the Work
-        Queue. If the CAS Pool is depleted, the CollectionReader thread is suspended until a
-        CAS is released back to the pool by the processing threads. A document may be split into
-        1, 2, 3 or more chunks that are analyzed independently. In order to reconstruct the
-        document correctly, the CAS Consumer can depend on receiving the chunks in the same
-        sequential order that the chunks were <quote>produced</quote>, when this
-        sequenced queue implementation is used. To plug in this sequenced queue to the CPM use
-        the following specification:
-        
-        
-        <programlisting>&lt;outputQueue dequeueTimeout="100000" queueClass=
-"org.apache.uima.collection.impl.cpm.engine.SequencedQueue"/&gt;</programlisting>
-        
-        where the mandatory <literal>queueClass</literal> attribute defines the name of
-        the class and the second mandatory attribute, <literal>dequeueTimeout</literal>
-        specifies the maximum number of milliseconds to wait for the expected chunk.</para>
-      
-      <note><para>The value for this timeout must be carefully determined to avoid
-      excessive occurrences of timeouts. Typically, the size of a chunk and the type of
-      analysis being done are the most important factors when deciding on the value for the
-      timeout. The larger the chunk and the more complicated the analysis, the more time it takes
-      for the chunk to go from source to sink. You may specify 0, in which case, the timeout is 
-      disabled - i.e., it is equivalent to an infinitely long timeout.</para></note>
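-      
-      <para>For instance, a specification that disables the timeout, so that the queue
-        waits indefinitely for the expected chunk, could be sketched as:
-        
-        
-        <programlisting><![CDATA[<outputQueue dequeueTimeout="0" queueClass=
-"org.apache.uima.collection.impl.cpm.engine.SequencedQueue"/>]]></programlisting></para>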
-      
-      <para>If the chunk doesn&apos;t arrive in the configured time window, the entire
-        document is presumed to be invalid and the CAS is dropped from further processing.
-        This action occurs regardless of any other error action specification. The
-        SequencedQueue invalidates the document, adding the offending document&apos;s
-        metadata to a local cache of invalid documents. </para>
-      
-      <para>If the timeout occurs, the CPM notifies all registered listeners (see <olink
-          targetdoc="&uima_docs_tutorial_guides;"
-          targetptr="ugr.tug.cpe.using_listeners"/>) by calling
-        entityProcessComplete(). As part of this call, the SequencedQueue passes null
-        instead of a CAS as the first argument, together with a special exception,
-        CPMChunkTimeoutException. Null is passed because the timeout means that the
-        expected chunk was not received within the configured timeout window, so there is
-        no CAS available when the timeout event occurs.</para>
-      
-      <para>The CPMChunkTimeoutException object includes an API that allows the listener
-        to retrieve the offending document id as well as the other metadata attributes as
-        defined above. These attributes are part of each chunk&apos;s metadata and are added
-        by the Collection Reader.</para>
-      
-      <para>Each chunk that SequencedQueue works on is subjected to a test to determine if the
-        chunk belongs to an invalid document. This test checks the chunk&apos;s metadata
-        against the data in the local cache. If there is a match, the chunk is dropped. This
-        check is performed only for chunks; complete documents are not subject to
-        it.</para>
-      
-      <para>If there is an exception during the processing of a chunk, the CPM sends a
-        notification to all registered listeners. The notification includes the CAS and an
-        exception. When the listener notification is completed, the CPM also sends separate
-        notifications, containing the CAS, to the Artifact Producer and the
-        SequencedQueue. The intent is to stop adding new chunks to the Work Queue that belong
-        to an <quote>invalid</quote> document and also to deal with chunks that are
-        en-route, being processed by the processing threads.</para>
-      
-      <para>In response to the notification, the Artifact Producer will drop and release
-        back to the CAS Pool all CASes that belong to an <quote>invalid</quote> document.
-        Currently, there is no support in the CollectionReader&apos;s API to tell it to stop
-        generating chunks. The CollectionReader keeps producing the chunks but the
-        Artifact Producer immediately drops/releases them to the CAS Pool. Before the CAS is
-        released back to the CAS Pool, the Artifact Producer sends notification to all
-        registered listeners. This notification includes the CAS and an exception &ndash;
-        SkipCasException.</para>
-      
-      <para>In response to the notification of an exception involving a chunk, the
-        SequencedQueue retrieves from the CAS the metadata and adds it to its local cache of
-        <quote>invalid</quote> documents. All chunks de-queued from the OutputQueue and
-        belonging to <quote>invalid</quote> documents will be dropped and released back to
-        the CAS Pool. Before dropping the CAS, the CPM sends notification to all registered
-        listeners. The notification includes the CAS and SkipCasException.</para>
-      
-      <para>The <literal>&lt;checkpoint&gt;</literal> element is an optional element.
-        It specifies a CPE checkpoint file, checkpoint frequency, and strategy for
-        checkpoints (time or count based). At checkpoint time, the CPM saves status
-        information and statistics to the checkpoint file. The checkpoint file is specified
-        in the <literal>file</literal> attribute, which has the same form as the
-        <literal>href</literal> attribute of the <literal>&lt;include&gt;</literal>
-        element described in <xref linkend="&tp;imports"/>. The
-        <literal>time</literal> attribute indicates that a checkpoint should be taken
-        every <literal>[Number]</literal> seconds, and the <literal>batch</literal>
-        attribute indicates that a checkpoint should be taken every
-        <literal>[Number]</literal> batches.</para>
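-      
-      <para>As an illustrative sketch only (the file name and the interval are assumed
-        values, not recommendations), a time-based checkpoint taken every 300 seconds
-        might be written as:
-        
-        
-        <programlisting><![CDATA[<!-- "cpe_checkpoint.dat" is a hypothetical file name -->
-<checkpoint file="cpe_checkpoint.dat" time="300"/>]]></programlisting>
-        
-        A count-based checkpoint might instead use the <literal>batch</literal>
-        attribute, as described above.</para>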
-      
-      <para>The <literal>&lt;timerImpl&gt;</literal> element is optional. It is used to
-        identify a custom timer plug-in class to generate time stamps during the CPM
-        execution. The value of the element is a Java class name.</para>
-      
-      <para>The <literal>&lt;deployAs&gt;</literal> element indicates the type of CPM
-        deployment. Valid contents for this element include:
-        
-        <variablelist>
-          <varlistentry>
-            <term>vinciService</term>
-            <listitem><para>Vinci service exposing APIs for stop, pause, resume, and
-              getStats</para></listitem>
-          </varlistentry>
-          <varlistentry>
-            <term>interactive</term>
-            <listitem><para>provide command line menus (start, stop, pause,
-              resume)</para></listitem>
-          </varlistentry>
-          <varlistentry>
-            <term>immediate</term>
-            <listitem><para>run the CPM without menus or a service API</para></listitem>
-          </varlistentry>
-          <varlistentry>
-            <term>single-threaded</term>
-            <listitem><para>run the CPM in a single threaded mode. In this mode, the
-              Collection Reader, the Processing Pipeline, and the CAS Consumer Pipeline
-              are all running in one thread without the work queue and the output
-              queue.</para></listitem>
-          </varlistentry>
-        </variablelist></para>
-      
-    </section>
-    
-    <section id="&tp;descriptor.resource_manager_configuration">
-      <title>Resource Manager Configuration</title>
-      
-      <para>External resource bindings for the CPE may optionally be specified in an
-        element:
-        
-        
-        <programlisting>&lt;resourceManagerConfiguration href="..."/&gt;</programlisting></para>
-      
-      <para>For an introduction to external resources, refer to <olink
-          targetdoc="&uima_docs_tutorial_guides;"
-          targetptr="ugr.tug.aae.accessing_external_resource_files"/>.</para>
-      
-      <para>In the <literal>resourceManagerConfiguration</literal> element, the value
-        of the href attribute refers to another file that contains definitions and bindings
-        for the external resources used by the CPE. The format of this file is the same as the XML
-        snippet <olink targetdoc="&uima_docs_ref;"
-          targetptr="ugr.ref.xml.component_descriptor.aes.aggregate.external_resource_bindings"/>.
-        For example, in a CPE containing an aggregate analysis engine with two annotators,
-        and a CAS Consumer, the following resource manager configuration file would bind
-        external resource dependencies in all three components to the same physical
-        resource:
-        
-        
-        <programlisting><![CDATA[<resourceManagerConfiguration>
-
-  <!-- Declare Resource -->
-
-  <externalResources>
-    <externalResource>
-      <name>ExampleResource</name>
-      <fileResourceSpecifier>
-        <fileUrl>file:MyResourceFile.dat</fileUrl>
-      </fileResourceSpecifier>
-    </externalResource>
-  </externalResources>
-
-  <!-- Bind component resource dependencies to ExampleResource -->
-
-  <externalResourceBindings>
-    <externalResourceBinding>
-      <key>MyAE/annotator1/myResourceKey</key>
-      <resourceName>ExampleResource</resourceName>
-    </externalResourceBinding>
-
-    <externalResourceBinding>
-      <key>MyAE/annotator2/someResourceKey</key>
-      <resourceName>ExampleResource</resourceName>
-    </externalResourceBinding>
-
-    <externalResourceBinding>
-      <key>MyCasConsumer/otherResourceKey</key>
-      <resourceName>ExampleResource</resourceName>
-    </externalResourceBinding>
-
-  </externalResourceBindings>
-
-</resourceManagerConfiguration>]]></programlisting></para>
-      
-      <para>In this example, <literal>MyAE</literal> and
-        <literal>MyCasConsumer</literal> are the names of the Analysis Engine and CAS
-        Consumer, as specified by the name attributes of the CPE&apos;s
-        <literal>&lt;casProcessor&gt;</literal> elements.
-        <literal>annotator1</literal> and <literal>annotator2</literal> are the
-        annotator keys specified within the Aggregate AE Descriptor, and
-        <literal>myResourceKey</literal>, <literal>someResourceKey</literal>, and
-        <literal>otherResourceKey</literal> are the keys of the resource dependencies
-        declared in the individual annotator and CAS Consumer descriptors.</para>
-      
-    </section>
-    
-    <section id="&tp;descriptor.example">
-      <title>Example CPE Descriptor</title>
-      
-      
-      <programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
-<cpeDescription>
-  <collectionReader>
-    <collectionIterator>
-      <descriptor>
-        <import location=
-           "../collection_reader/FileSystemCollectionReader.xml"/>
-      </descriptor>
-    </collectionIterator>
-  </collectionReader>
-  <casProcessors dropCasOnException="true" casPoolSize="1" 
-      processingUnitThreadCount="1">
-    <casProcessor deployment="integrated" 
-      name="Aggregate TAE - Name Recognizer and Person Title Annotator">
-      <descriptor>
-        <import location=
-           "../analysis_engine/NamesAndPersonTitles_TAE.xml"/>
-      </descriptor>
-      <deploymentParameters/>
-      <filter/>
-      <errorHandling>
-        <errorRateThreshold action="terminate" value="100/1000"/>
-        <maxConsecutiveRestarts action="terminate" value="30"/>
-        <timeout max="100000"/>
-      </errorHandling>
-      <checkpoint batch="1"/>
-    </casProcessor>
-    <casProcessor deployment="integrated" name="Annotation Printer">
-      <descriptor>
-        <import location="../cas_consumer/AnnotationPrinter.xml"/>
-      </descriptor>
-      <deploymentParameters/>
-      <filter/>
-      <errorHandling>
-        <errorRateThreshold action="terminate" value="100/1000"/>
-        <maxConsecutiveRestarts action="terminate" value="30"/>
-        <timeout max="100000"/>
-      </errorHandling>
-      <checkpoint batch="1"/>
-    </casProcessor>
-  </casProcessors>
-  <cpeConfig>
-    <numToProcess>1</numToProcess>
-    <deployAs>immediate</deployAs>
-    <checkpoint file="" time="3000"/>
-    <timerImpl/>
-  </cpeConfig>
-</cpeDescription>]]></programlisting>
-    </section>
-  
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
+"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[
+<!ENTITY imgroot "../images/references/ref.xml.cpe_descriptor/">
+<!ENTITY tp "ugr.ref.xml.cpe_descriptor.">
+<!ENTITY % uimaents SYSTEM "../entities.ent" >  
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.ref.xml.cpe_descriptor">
+  <title>Collection Processing Engine Descriptor Reference</title>
+  <titleabbrev>CPE Descriptor Reference</titleabbrev>
+  
+  <para>A UIMA <emphasis>Collection Processing Engine</emphasis> (CPE) is a combination
+    of UIMA components assembled to analyze a collection of artifacts. A CPE is an
+    instantiation of the UIMA <emphasis>Collection Processing Architecture</emphasis>,
+    which defines the collection processing components, interfaces, and APIs. A CPE is
+    executed by a UIMA framework component called the <emphasis>Collection Processing
+    Manager</emphasis> (CPM), which provides a number of services for deploying CPEs,
+    running CPEs, and handling errors.</para>
+  
+  <para>A CPE can be assembled programmatically within a Java application, or it can be
+    assembled declaratively via a CPE configuration specification, called a CPE
+    Descriptor. This chapter describes the format of the CPE Descriptor.</para>
+  
+  <para>Details about the CPE, including its function, sub-components, APIs, and related
+    tools, can be found in <olink targetdoc="&uima_docs_tutorial_guides;"
+      targetptr="ugr.tug.cpe"/>. Here we briefly summarize the CPE to define terms and
+    provide context for the later sections that describe the CPE Descriptor.</para>
+  
+  <section id="&tp;overview">
+    <title>CPE Overview</title>
+    
+    <figure id="&tp;overview.fig.runtime">
+      <title>CPE Runtime Overview</title>
+      <mediaobject>
+        <imageobject>
+          <imagedata width="5.8in" format="PNG"
+            fileref="&imgroot;image002.png"/>
+        </imageobject>
+        <textobject><phrase>CPE Runtime Overview diagram</phrase></textobject>
+      </mediaobject>
+    </figure>
+    
+    <para>An illustration of the CPE runtime is shown in <xref
+        linkend="&tp;overview.fig.runtime"/>. Some of the CPE components, such as the
+      <emphasis>queues</emphasis> and <emphasis>processing pipelines</emphasis>, are
+      internal to the CPE, but their behavior and deployment may be configured using the CPE
+      Descriptor. Other CPE components, such as the <emphasis>Collection
+      Reader</emphasis> and <emphasis>CAS Processors</emphasis>, are defined and
+      configured externally from the CPE and then plugged in to the CPE to create the overall
+      engine. The parts of a CPE are:
+      
+      <variablelist>
+        <varlistentry>
+          <term>Collection Reader</term>
+          <listitem><para>understands the native data collection format and iterates
+            over the collection producing subjects of analysis</para></listitem>
+        </varlistentry>
+        
+        <varlistentry>
+          <term>CAS Initializer<footnote><para>Deprecated</para></footnote>
+            </term>
+          <listitem><para>initializes a CAS with a subject of analysis</para>
+            </listitem>
+        </varlistentry>
+        
+        <varlistentry>
+          <term>Artifact Producer</term>
+          <listitem><para>asynchronously pulls CASes from the Collection Reader,
+            creates batches of CASes and puts them into the work queue</para></listitem>
+        </varlistentry>
+        
+        <varlistentry>
+          <term>Work Queue</term>
+          <listitem><para>shared queue containing batches of CASes queued by the Artifact
+            Producer for analysis by Analysis Engines</para>
+          </listitem>
+        </varlistentry>
+        
+        <varlistentry>
+          <term>B1-Bn</term>
+          <listitem><para>individual batches containing 1 or more CASes</para>
+            </listitem>
+        </varlistentry>
+        
+        <varlistentry>
+          <term>AE1-AEn</term>
+          <listitem><para>Analysis Engines arranged by a CPE descriptor</para>
+            </listitem>
+        </varlistentry>
+        
+        <varlistentry>
+          <term>Processing Pipelines</term>
+          <listitem><para>each pipeline runs in a separate thread and contains a
+            replicated set of the Analysis Engines running in the defined sequence</para>
+            </listitem>
+        </varlistentry>
+        
+        <varlistentry>
+          <term>Output Queue</term>
+          <listitem><para>holds batches of CASes with analysis results intended for CAS
+            Consumers</para></listitem>
+        </varlistentry>
+        
+        <varlistentry>
+          <term>CAS Consumers</term>
+          <listitem><para>perform collection level analysis over the CASes and extract
+            analysis results, e.g., creating indexes or databases</para></listitem>
+        </varlistentry>
+      </variablelist>
+      </para>
+  </section>
+  
+  <section id="&tp;notation">
+    <title>Notation</title>
+    
+    <para>CPE Descriptors are XML files. This chapter uses an informal notation to specify
+      the syntax of CPE Descriptors.</para>
+    
+    <para>The notation used in this chapter is:
+      
+      <itemizedlist><listitem><para>An ellipsis (...) inside an element body indicates
+        that the substructure of that element has been omitted (to be described in another
+        section of this chapter). An example of this would be:
+        
+        
+        <programlisting>&lt;collectionReader&gt;
+...
+&lt;/collectionReader&gt;</programlisting></para>
+        </listitem>
+        
+        <listitem><para>An ellipsis immediately after an element indicates that the
+          element type may be repeated arbitrarily many times. For example:
+          
+          
+          <programlisting>&lt;parameter&gt;[String]&lt;/parameter&gt;
+&lt;parameter&gt;[String]&lt;/parameter&gt;
+...</programlisting>
+          indicates that there may be arbitrarily many parameter elements in this
+          context.</para></listitem>
+        
+        <listitem><para>An ellipsis inside an element means details of the attributes
+          associated with that element are defined later, e.g.:
+          
+          <programlisting>&lt;casProcessor ...&gt;</programlisting></para>
+          </listitem>
+        
+        <listitem><para>Bracketed expressions (e.g. <literal>[String]</literal>)
+          indicate the type of value that may be used at that location.</para></listitem>
+        
+        <listitem><para>A vertical bar, as in <literal>true|false</literal>, indicates
+          alternatives. This can be applied to literal values, bracketed type names, and
+          elements. </para></listitem></itemizedlist></para>
+    
+    <para>Which elements are optional and which are required is specified in prose, not in the
+      syntax definition.</para>
+    
+  </section>
+  
+  <section id="&tp;imports">
+    <title>Imports</title>
+    
+    <para>As of version 2.2, a CPE Descriptor can use the same <literal>import</literal> mechanism
+      as other component descriptors.  This allows referring to component
+      descriptors using either relative paths (resolved relative to the location of the CPE descriptor)
+      or the classpath/datapath.  For details see <olink targetdoc="&uima_docs_ref;"
+      targetptr="ugr.ref.xml.component_descriptor"/>.</para>
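+
+    <para>As a minimal sketch of the two import styles (the descriptor path and the dotted
+      name below are hypothetical, chosen only for illustration), a
+      <literal>location</literal> import would be resolved relative to the CPE descriptor,
+      while a <literal>name</literal> import would be looked up through the classpath or
+      datapath:
+
+      <programlisting><![CDATA[<!-- relative path; "desc/MyCollectionReader.xml" is an assumed file -->
+<descriptor>
+    <import location="desc/MyCollectionReader.xml"/>
+</descriptor>
+
+<!-- by name; "org.example.MyCollectionReader" is an assumed name -->
+<descriptor>
+    <import name="org.example.MyCollectionReader"/>
+</descriptor>]]></programlisting></para>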
+     
+    <para>The following older syntax is still supported, but <emphasis>not recommended</emphasis>:
+      
+      <programlisting><![CDATA[<descriptor>
+    <include href="[URL or File]"/>
+</descriptor>]]></programlisting></para>
+    
+    <para>The <literal>[URL or File]</literal> value of the <literal>href</literal> attribute
+      is a URL or a filename for the descriptor of the incorporated component. An attempt is
+      first made to resolve the value as a URL.</para>
+    
+    <para>
+      Relative paths in an <literal>include</literal> are resolved relative to the current working directory 
+      (NOT the CPE descriptor location as is the case for <literal>import</literal>). 
+      A filename relative to another directory can be specified using the <literal>CPM_HOME</literal>
+      variable, e.g.,    
+    <programlisting>&lt;descriptor&gt;
+    &lt;include href="${CPM_HOME}/desc_dir/descriptor.xml"/&gt;
+&lt;/descriptor&gt;</programlisting>
+    
+      In this case, the value for the <literal>CPM_HOME</literal> variable must be
+      provided to the CPE by specifying it on the Java command line, e.g.,
+        
+    <programlisting>java -DCPM_HOME="C:/Program Files/apache/uima/cpm" ...</programlisting>
+    
+  </para>
+    
+  </section>
+  
+  <section id="&tp;descriptor">
+    <title>CPE Descriptor Overview</title>
+    
+    <para>A CPE Descriptor consists of information describing the following four main
+      elements.</para>
+    
+    <orderedlist><listitem><para>The <emphasis>Collection Reader</emphasis>, which
+      is responsible for gathering artifacts and initializing the Common Analysis
+      Structure (CAS) used to support processing in the UIMA collection processing
+      engine.</para></listitem>
+      
+      <listitem><para>The <emphasis>CAS Processors</emphasis>, responsible for
+        analyzing individual artifacts, analyzing across artifacts, and extracting
+        analysis results. CAS Processors include <emphasis>Analysis Engines</emphasis>
+        and <emphasis>CAS Consumers</emphasis>.</para></listitem>
+      
+      <listitem><para>Operational parameters of the <emphasis>Collection Processing
+        Manager</emphasis> (CPM), such as checkpoint frequency and deployment
+        mode.</para></listitem>
+      
+      <listitem><para>Resource Manager Configuration (optional). </para></listitem>
+      </orderedlist>
+    
+    <para>The CPE Descriptor has the following high level skeleton:
+      
+      
+      <programlisting><![CDATA[<?xml version="1.0"?>
+<cpeDescription>
+   <collectionReader>
+...
+   </collectionReader>
+   <casProcessors>
+...
+   </casProcessors>
+   <cpeConfig>
+...
+   </cpeConfig>
+   <resourceManagerConfiguration>
+...
+   </resourceManagerConfiguration>
+</cpeDescription>]]></programlisting></para>
+    
+    <para>Details of each of the four main elements are described in the sections that
+      follow.</para>
+ </section>   
+    <section id="&tp;descriptor.collection_reader">
+      <title>Collection Reader</title>
+      
+      <para>The <literal>&lt;collectionReader&gt;</literal> section identifies the
+        Collection Reader and optional CAS Initializer that are to be used in the CPE. The
+        Collection Reader is responsible for retrieval of artifacts from a collection
+        outside of the CPE, and the optional CAS Initializer (deprecated as of UIMA Version 2)
+        is responsible for initializing the CAS with the artifact.</para>
+      
+      <para>A Collection Reader may initialize the CAS itself, in which case it does not
+        require a CAS Initializer. This should be clearly specified in the documentation for
+        the Collection Reader. Specifying a CAS Initializer for a Collection Reader that
+        does not make use of a CAS Initializer will not cause an error, but the specified CAS
+        Initializer will not be used.</para>
+      
+      <para>The complete structure of the <literal>&lt;collectionReader&gt;</literal>
+        section is:
+        
+        
+        <programlisting><![CDATA[<collectionReader>
+  <collectionIterator>
+    <descriptor>
+      <import ...> | <include .../>
+    </descriptor>
+    <configurationParameterSettings>...</configurationParameterSettings>
+    <sofaNameMappings>...</sofaNameMappings>
+  </collectionIterator>
+  <casInitializer>
+    <descriptor>
+      <import ...> | <include .../>
+    </descriptor>
+    <configurationParameterSettings>...</configurationParameterSettings>
+    <sofaNameMappings>...</sofaNameMappings>
+  </casInitializer>
+</collectionReader>]]></programlisting></para>
+      
+      <para>The <literal>&lt;collectionIterator&gt;</literal> identifies the
+        descriptor for the Collection Reader, and the
+        <literal>&lt;casInitializer&gt;</literal> identifies the descriptor for the CAS
+        Initializer. The format and
+        details of the Collection Reader and CAS Initializer descriptors are described in
+          <olink targetdoc="&uima_docs_ref;"
+          targetptr="ugr.ref.xml.component_descriptor.collection_processing_parts.collection_reader"/>.
+        The <literal>&lt;configurationParameterSettings&gt;</literal> and the
+        <literal>&lt;sofaNameMappings&gt;</literal> elements are described in the next
+        section.</para>
+      
+      <section id="&tp;descriptor.collection_reader.error_handling">
+        <title>Error handling for Collection Readers</title>
+        
+        <para>The CPM will abort if the Collection Reader throws a large number of
+          consecutive exceptions (default = 100). This default can be changed by using the
+          Java initialization parameter <literal>-DMaxCRErrorThreshold
+          xxx.</literal></para>
+      </section>
+    </section>
+    
+    <section id="&tp;descriptor.cas_processors">
+      <title>CAS Processors</title>
+      

[... 1053 lines stripped ...]