Posted to commits@uima.apache.org by sc...@apache.org on 2010/05/06 16:06:04 UTC

svn commit: r941744 [7/7] - in /uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides: ./ src/ src/docbook/ src/docbook/images/ src/docbook/images/tutorials_and_users_guides/ src/docbook/images/tutorials_and_users_guides/tug.aae/ src/d...

Added: uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.fc.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.fc.xml?rev=941744&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.fc.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.fc.xml Thu May  6 14:06:02 2010
@@ -0,0 +1,394 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+<!ENTITY imgroot "images/tutorials_and_users_guides/tug.fc/">
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">  
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.tug.fc">
+  <title>Flow Controller Developer&apos;s Guide</title>
+  
+  <para>A Flow Controller is a component that plugs into an Aggregate Analysis Engine. When a CAS is input to the
+    Aggregate, the Flow Controller determines the order in which the components of that aggregate are invoked on that
+    CAS. The ability to provide your own Flow Controller implementation is new as of release 2.0 of UIMA.</para>
+  
+  <para>Flow Controllers may decide the flow dynamically, based on the contents of the CAS. So, as just one example,
+    you could develop a Flow Controller that first sends each CAS to a Language Identification Annotator and then,
+    based on the output of the Language Identification Annotator, routes that CAS to an Annotator that is specialized
+    for that particular language.</para>
+  
+  <section id="ugr.tug.fc.developing_fc_code">
+    <title>Developing the Flow Controller Code</title>
+    
+    <section id="ugr.tug.fc.fc_interface_overview">
+      <title>Flow Controller Interface Overview</title>
+      
+      <para>Flow Controller implementations should extend from the
+        <literal>JCasFlowController_ImplBase</literal> or
+        <literal>CasFlowController_ImplBase</literal> classes, depending on which CAS interface they prefer
+        to use. As with other types of components, the Flow Controller ImplBase classes define optional
+        <literal>initialize</literal>, <literal>destroy</literal>, and <literal>reconfigure</literal>
+        methods. They also define the required method <literal>computeFlow</literal>.</para>
+      
+      <para>The <literal>computeFlow</literal> method is called by the framework whenever a new CAS enters the
+        Aggregate Analysis Engine. It is given the CAS as an argument and must return an object which implements the
+        <literal>Flow</literal> interface (the Flow object). The Flow Controller developer must define this
+        object. It is the object that is responsible for routing this particular CAS through the components of the
+        Aggregate Analysis Engine. For convenience, the framework provides basic implementations of flow objects
+        in the classes CasFlow_ImplBase and JCasFlow_ImplBase; use the JCas one if you are using the JCas interface
+        to the CAS.</para>
+      
+      <para>The framework then uses the Flow object and calls its <literal>next()</literal> method, which returns
+        a <literal>Step</literal> object (implemented by the UIMA Framework) that indicates what to do next with
+        this CAS. There are three types of steps currently supported:</para>
+      
+      <itemizedlist>
+        <listitem>
+          <para><literal>SimpleStep</literal>, which specifies a single Analysis Engine that should receive
+            the CAS next.</para>
+        </listitem>
+        
+        <listitem>
+          <para><literal>ParallelStep</literal>, which specifies that multiple Analysis Engines should
+            receive the CAS next, and that the relative order in which these Analysis Engines execute does not
+            matter. Logically, they can run in parallel. The runtime is not obligated to actually execute them in
+            parallel, however, and the current implementation will execute them serially in an arbitrary
+            order.</para>
+        </listitem>
+        
+        <listitem>
+          <para><literal>FinalStep</literal>, which indicates that the flow is completed. </para>
+        </listitem>
+      </itemizedlist>
+      
+      <para>After executing the step, the framework will call the Flow object&apos;s <literal>next()</literal>
+        method again to determine the next destination, and this will be repeated until the Flow Object indicates
+        that processing is complete by returning a <literal>FinalStep</literal>.</para>
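+      
+      <para>The calling sequence just described can be pictured with a small, self-contained sketch. It does not use
+        the UIMA classes: <literal>DemoStep</literal>, <literal>DemoSimpleStep</literal>,
+        <literal>DemoFinalStep</literal>, <literal>DemoFixedFlow</literal> and <literal>DemoFlowDriver</literal>
+        are hypothetical stand-ins invented for this illustration, and the real framework does considerably more
+        work at each step.</para>
+      
+      <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for the UIMA Step types; these are not the real classes.
interface DemoStep {}

final class DemoSimpleStep implements DemoStep {
  final String aeKey;
  DemoSimpleStep(String aeKey) { this.aeKey = aeKey; }
}

final class DemoFinalStep implements DemoStep {}

// Stand-in for a Flow object: hands out one step per call to next().
final class DemoFixedFlow {
  private final Iterator<String> remaining;

  DemoFixedFlow(List<String> aeKeys) { this.remaining = aeKeys.iterator(); }

  DemoStep next() {
    return remaining.hasNext() ? new DemoSimpleStep(remaining.next()) : new DemoFinalStep();
  }
}

final class DemoFlowDriver {
  // Mimics the framework loop: call next() until a final step is returned,
  // recording which delegate would have been invoked at each step.
  static List<String> run(DemoFixedFlow flow) {
    List<String> invoked = new ArrayList<>();
    DemoStep step = flow.next();
    while (!(step instanceof DemoFinalStep)) {
      invoked.add(((DemoSimpleStep) step).aeKey);
      step = flow.next();
    }
    return invoked;
  }
}
```
+]]></programlisting>
+      
+      <para>Running <literal>DemoFlowDriver.run</literal> on a flow built from two keys yields those two keys in
+        order; an empty flow returns the final step immediately.</para>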
+      
+      <para>The Flow Controller has access to a <literal>FlowControllerContext</literal>, which is a subtype of
+        <literal>UimaContext</literal>. In addition to the configuration parameter and resource access
+        provided by a <literal>UimaContext</literal>, the <literal>FlowControllerContext</literal> also
+        gives access to the metadata for all of the Analysis Engines that the Flow Controller can route CASes to. Most
+        Flow Controllers will need to use this information to make routing decisions. You can get a handle to the
+        <literal>FlowControllerContext</literal> by calling the <literal>getContext()</literal> method
+        defined in <literal>JCasFlowController_ImplBase</literal> and
+        <literal>CasFlowController_ImplBase</literal>. Then, the
+        <literal>FlowControllerContext.getAnalysisEngineMetaDataMap</literal> method can be called to get a
+        map containing an entry for each of the Analysis Engines in the Aggregate. The keys in this map are the same as
+        the delegate analysis engine keys specified in the aggregate descriptor, and the values are the
+        corresponding <literal>AnalysisEngineMetaData</literal> objects.</para>
+      
+      <para>Finally, the Flow Controller has optional methods <literal>addAnalysisEngines</literal> and
+        <literal>removeAnalysisEngines</literal>. These methods are intended to notify the Flow Controller if
+        new Analysis Engines are available to route CASes to, or if previously available Analysis Engines are no
+        longer available. However, the current version of the Apache UIMA framework does not support dynamically
+        adding or removing Analysis Engines to/from an aggregate, so these methods are not currently called. Future
+        versions may support this feature. </para>
+    </section>
+    
+    <section id="ugr.tug.fc.example_code">
+      <title>Example Code</title>
+      
+      <para>This section walks through the source code of an example Flow Controller that simulates a simple version
+        of the <quote>Whiteboard</quote> flow model. At each step of the flow, the Flow Controller looks at all of the
+        available Analysis Engines that have not yet run on this CAS, and picks one whose input requirements are
+        satisfied.</para>
+      
+      <para>The Java class for the example is
+        <literal>org.apache.uima.examples.flow.WhiteboardFlowController</literal> and the source code is
+        included in the UIMA SDK under the <literal>examples/src</literal> directory.</para>
+      
+      <section id="ugr.tug.fc.whiteboard">
+        <title>The WhiteboardFlowController Class</title>
+        
+        
+        <programlisting>public class WhiteboardFlowController 
+          extends CasFlowController_ImplBase {
+  public Flow computeFlow(CAS aCAS) 
+          throws AnalysisEngineProcessException {
+    WhiteboardFlow flow = new WhiteboardFlow();
+    // As of release 2.3.0, the following is not needed,
+    //   because the framework does this automatically
+    // flow.setCas(aCAS); 
+                        
+    return flow;
+  }
+
+  class WhiteboardFlow extends CasFlow_ImplBase {
+     // Discussed Later
+  }
+}</programlisting>
+        
+        <para>The <literal>WhiteboardFlowController</literal> extends from
+          <literal>CasFlowController_ImplBase</literal> and implements the
+          <literal>computeFlow</literal> method. The implementation of the <literal>computeFlow</literal>
+          method is very simple; it just constructs a new <literal>WhiteboardFlow</literal> object that will be
+          responsible for routing this CAS. The framework gives the Flow object a handle to that CAS,
+          which the Flow object will later use to make its routing decisions.</para>
+        
+        <para>Note that we will have one instance of <literal>WhiteboardFlow</literal> per CAS, so if there are
+          multiple CASes being simultaneously processed there will not be any confusion.</para>
+        
+      </section>
+      <section id="ugr.tug.fc.whiteboardflow">
+        <title>The WhiteboardFlow Class</title>
+        
+        
+        <programlisting>class WhiteboardFlow extends CasFlow_ImplBase {
+  private Set mAlreadyCalled = new HashSet();
+
+  public Step next() throws AnalysisEngineProcessException {
+    // Get the CAS that this Flow object is responsible for routing.
+    // Each Flow instance is responsible for a single CAS.
+    CAS cas = getCas();
+
+    // iterate over available AEs
+    Iterator aeIter = getContext().getAnalysisEngineMetaDataMap().
+        entrySet().iterator();
+    while (aeIter.hasNext()) {
+      Map.Entry entry = (Map.Entry) aeIter.next();
+      // skip AEs that were already called on this CAS
+      String aeKey = (String) entry.getKey();
+      if (!mAlreadyCalled.contains(aeKey)) {
+        // check for satisfied input capabilities
+        // (i.e. the CAS contains at least one instance
+        // of each required input)
+        AnalysisEngineMetaData md = 
+            (AnalysisEngineMetaData) entry.getValue();
+        Capability[] caps = md.getCapabilities();
+        boolean satisfied = true;
+        for (int i = 0; i &lt; caps.length; i++) {
+          satisfied = inputsSatisfied(caps[i].getInputs(), cas);
+          if (satisfied)
+            break;
+        }
+        if (satisfied) {
+          mAlreadyCalled.add(aeKey);
+          if (getContext().getLogger().isLoggable(Level.FINEST)) {
+            getContext().getLogger().log(Level.FINEST, 
+                "Next AE is: " + aeKey);
+          }
+          return new SimpleStep(aeKey);
+        }
+      }
+    }
+    // no appropriate AEs to call - end of flow
+    getContext().getLogger().log(Level.FINEST, "Flow Complete.");
+    return new FinalStep();
+  }
+
+  private boolean inputsSatisfied(TypeOrFeature[] aInputs, CAS aCAS) {
+      //implementation detail; see the actual source code
+  }
+}</programlisting>
+        
+        <para>Each instance of <literal>WhiteboardFlow</literal> is responsible for routing a
+          single CAS. A handle to the CAS instance is available by calling the <literal>getCas()</literal> method,
+          which is a standard method defined on the <literal>CasFlow_ImplBase</literal> superclass.</para>
+        
+        <para>Each time the <literal>next</literal> method is called, the Flow object iterates over the metadata
+          of all of the available Analysis Engines (obtained via the call to
+          <literal>getContext().getAnalysisEngineMetaDataMap()</literal>) and sees if the input types declared in an
+          AnalysisEngineMetaData object are satisfied by the CAS (that is, the CAS contains at least one instance of
+          each declared input type). The exact details of checking for instances of types in the CAS are not discussed
+          here &ndash; see the WhiteboardFlowController.java file for the complete source.</para>
+        
+        <para>When the Flow object decides which AnalysisEngine should be called next, it indicates this by
+          creating a SimpleStep object with the key for that AnalysisEngine and returning it:</para>
+        
+        <programlisting>return new SimpleStep(aeKey);</programlisting>
+        
+        <para>The Flow object records which Analysis Engines it has invoked in the
+          <literal>mAlreadyCalled</literal> field, and never invokes the same Analysis Engine twice. Note this
+          is not a hard requirement. It is acceptable to design a FlowController that invokes the same Analysis
+          Engine more than once. However, if you do this you must make sure that the flow will eventually
+          terminate.</para>
+        
+        <para>If there are no Analysis Engines left whose input requirements are satisfied, the Flow object signals
+          the end of the flow by returning a FinalStep object:</para>
+        
+        <programlisting>return new FinalStep();</programlisting>
+        
+        <para>Also, note the use of the logger to write tracing messages indicating the decisions made by the Flow
+          Controller. This is a good practice that helps with debugging if the Flow Controller is behaving in an
+          unexpected way.</para>
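+        
+        <para>The selection rule used by the whiteboard flow can also be tried out in isolation. The following
+          sketch is not the example shipped with the SDK: <literal>WhiteboardSketch</literal> is a hypothetical
+          class that models each delegate as a key mapped to the names of the types it requires, and models the
+          CAS as the set of type names that currently have instances. Both are simplifying assumptions standing
+          in for <literal>AnalysisEngineMetaData</literal> and the real CAS.</para>
+        
+        <programlisting><![CDATA[
```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Simplified model: each delegate key maps to the type names it requires as input.
// Hypothetical stand-in for AnalysisEngineMetaData and the CAS contents.
final class WhiteboardSketch {
  private final Map<String, Set<String>> requiredInputs;
  private final Set<String> alreadyCalled = new HashSet<>();

  WhiteboardSketch(Map<String, Set<String>> requiredInputs) {
    this.requiredInputs = requiredInputs;
  }

  // Returns the key of the next delegate to run, or null when the flow is complete.
  String next(Set<String> typesInCas) {
    for (Map.Entry<String, Set<String>> entry : requiredInputs.entrySet()) {
      String aeKey = entry.getKey();
      // skip delegates already called; pick one whose required inputs are all present
      if (!alreadyCalled.contains(aeKey) && typesInCas.containsAll(entry.getValue())) {
        alreadyCalled.add(aeKey);
        return aeKey;
      }
    }
    return null; // corresponds to returning a FinalStep
  }

  // Two made-up delegates: the tagger needs tokens, the tokenizer needs nothing.
  static Map<String, Set<String>> exampleDelegates() {
    Map<String, Set<String>> delegates = new LinkedHashMap<>();
    delegates.put("Tagger", new HashSet<String>(Arrays.asList("Token")));
    delegates.put("Tokenizer", new HashSet<String>());
    return delegates;
  }
}
```
+]]></programlisting>
+        
+        <para>With the two made-up delegates above, the tokenizer is chosen first even though the tagger is listed
+          first, because the tagger's required input is not yet present. Because every delegate is recorded once
+          it has been called, the sketch terminates after at most one step per delegate.</para>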
+      </section>
+    </section>
+  </section>
+  
+  <section id="ugr.tug.fc.creating_fc_descriptor">
+    <title>Creating the Flow Controller Descriptor</title>
+    
+    <para>To create a Flow Controller Descriptor in the CDE, use File &rarr; New &rarr; Other
+      &rarr; UIMA &rarr; Flow Controller Descriptor File:
+      
+      
+      <screenshot>
+    <mediaobject>
+      <imageobject>
+        <imagedata width="5.5in" format="JPG" fileref="&imgroot;image002.jpg"/>
+      </imageobject>
+      <textobject><phrase>Screenshot of Eclipse new object wizard showing Flow Controller</phrase></textobject>
+    </mediaobject>
+  </screenshot></para>
+    
+    <para>This will bring up the Overview page for the Flow Controller Descriptor:
+      
+      
+      <screenshot>
+    <mediaobject>
+      <imageobject>
+        <imagedata width="5.5in" format="JPG" fileref="&imgroot;image004.jpg"/>
+      </imageobject>
+      <textobject><phrase>Screenshot of Component Descriptor Editor Overview page for new Flow Controller</phrase></textobject>
+    </mediaobject>
+  </screenshot></para>
+    
+    <para>Type in the Java class name that implements the Flow Controller, or use the <quote>Browse</quote> button
+      to select it. You must select a Java class that implements the <literal>FlowController</literal>
+      interface.</para>
+    
+    <para>Flow Controller Descriptors are very similar to Primitive Analysis Engine Descriptors &ndash; for
+      example you can specify configuration parameters and external resources if you wish.</para>
+    
+    <para>If you wish to edit a Flow Controller Descriptor by hand, see section <olink targetdoc="&uima_docs_ref;"
+        targetptr="ugr.ref.xml.component_descriptor.flow_controller"/> for the syntax.</para>
+  </section>
+  
+  <section id="ugr.tug.fc.adding_fc_to_aggregate">
+    <title>Adding a Flow Controller to an Aggregate Analysis Engine</title>
+    <titleabbrev>Adding Flow Controller to an Aggregate</titleabbrev>
+    
+    <para>To use a Flow Controller you must add it to an Aggregate Analysis Engine. You can only have one Flow
+      Controller per Aggregate Analysis Engine. In the Component Descriptor Editor, the Flow Controller is
+      specified on the Aggregate page, as a choice in the flow control kind - pick <quote>User-defined Flow</quote>.
+      When you do, the Browse and Search buttons underneath become active, and allow you to specify an existing Flow
+      Controller Descriptor, which when you select it, will be imported into the aggregate descriptor.
+      
+      
+      <screenshot>
+    <mediaobject>
+      <imageobject>
+        <imagedata width="4.5in" format="JPG" fileref="&imgroot;image006.jpg"/>
+      </imageobject>
+      <textobject><phrase>Screenshot of Component Descriptor Editor Aggregate page showing selecting user-defined flow</phrase></textobject>
+    </mediaobject>
+  </screenshot></para>
+    
+    <para>The key name is created automatically from the name element in the Flow Controller Descriptor being
+      imported. If you need to change this name, you can do so by switching to the <quote>Source</quote> view using the
+      bottom tabs, and editing the name in the XML source.</para>
+    
+    <para>If you edit your Aggregate Analysis Engine Descriptor by hand, the syntax for adding a Flow Controller is:
+      
+      
+      <programlisting>  &lt;delegateAnalysisEngineSpecifiers&gt;
+    ...
+  &lt;/delegateAnalysisEngineSpecifiers&gt;  
+  <emphasis role="bold">&lt;flowController key=<quote>[String]</quote>&gt;
+    &lt;import .../&gt; 
+  &lt;/flowController&gt;</emphasis></programlisting></para>
+    
+    <para>As usual, you can use either an import by location or an import by name &ndash; see <olink
+        targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.component_descriptor.imports"/>.</para>
+    
+    <para>The key that you assign to the FlowController can be used elsewhere in the Aggregate Analysis Engine
+      Descriptor &ndash; in parameter overrides, resource bindings, and Sofa mappings.</para>
+  </section>
+  
+  <section id="ugr.tug.fc.adding_fc_to_cpe">
+    <title>Adding a Flow Controller to a Collection Processing Engine</title>
+    <titleabbrev>Adding Flow Controller to CPE</titleabbrev>
+    
+    <para>Flow Controllers cannot be added directly to Collection Processing Engines. To use a Flow Controller in a
+      CPE you first need to wrap the part of your CPE that requires complex flow control into an Aggregate Analysis
+      Engine, and then add the Aggregate Analysis Engine to your CPE. The CPE&apos;s deployment and error handling
+      options can then only be configured for the entire Aggregate Analysis Engine as a unit.</para>
+    
+  </section>
+  
+  <section id="ugr.tug.fc.using_fc_with_cas_multipliers">
+    <title>Using Flow Controllers with CAS Multipliers</title>
+    
+    <para>If you want your Flow Controller to work inside an Aggregate Analysis Engine that contains a CAS Multiplier
+      (see <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cm"/>), there are additional
+      things you must consider.</para>
+    
+    <para>When your Flow Controller routes a CAS to a CAS Multiplier, the CAS Multiplier may produce new CASes that
+      then will also need to be routed by the Flow Controller. When a new output CAS is produced, the framework will call
+      the <literal>newCasProduced</literal> method on the Flow object that was managing the flow of the parent CAS 
+      (the one that was input to the CAS Multiplier). The <literal>newCasProduced</literal> method must create a new Flow 
+      object that will be responsible for routing the new output CAS.</para>
+    
+    <para>In the <literal>CasFlow_ImplBase</literal> and <literal>JCasFlow_ImplBase</literal> classes, the
+      <literal>newCasProduced</literal> method is defined to throw an exception indicating that the Flow
+      Controller does not handle CAS Multipliers. If you want your Flow Controller to properly deal with CAS
+      Multipliers you must override this method.</para>
+        
+    <para>If your Flow class extends <literal>CasFlow_ImplBase</literal>, the method signature to override is:           
+      <programlisting>protected Flow newCasProduced(CAS newOutputCas, String producedBy)</programlisting>
+    </para>
+    
+    <para>If your Flow class extends <literal>JCasFlow_ImplBase</literal>, the method signature to override is:
+      <programlisting>protected Flow newCasProduced(JCas newOutputCas, String producedBy)</programlisting>
+    </para>  
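+    
+    <para>The shape of such an override can be sketched without the UIMA classes. In the following illustration,
+      <literal>SketchFlowBase</literal> and <literal>SegmentAwareFlow</literal> are hypothetical names, and a CAS
+      is represented by a plain string identifier; a real Flow class would extend
+      <literal>CasFlow_ImplBase</literal> or <literal>JCasFlow_ImplBase</literal> and use the signatures shown
+      above.</para>
+    
+    <programlisting><![CDATA[
```java
// Hypothetical stand-in for the base-class behaviour described above:
// by default, newCasProduced rejects output from CAS Multipliers.
abstract class SketchFlowBase {
  protected SketchFlowBase newCasProduced(String newOutputCas, String producedBy) {
    throw new UnsupportedOperationException("This flow does not handle CAS Multipliers");
  }
}

// A flow that accepts child CASes by creating a fresh flow object for each one.
final class SegmentAwareFlow extends SketchFlowBase {
  final String casId;
  final String producedBy; // null for a CAS that entered the aggregate from outside

  SegmentAwareFlow(String casId, String producedBy) {
    this.casId = casId;
    this.producedBy = producedBy;
  }

  @Override
  protected SketchFlowBase newCasProduced(String newOutputCas, String producedBy) {
    // One flow object per CAS: the child gets its own routing state,
    // and remembers which delegate produced it.
    return new SegmentAwareFlow(newOutputCas, producedBy);
  }
}
```
+]]></programlisting>
+    
+    <para>Keeping the producing delegate's key in the new flow object is one possible design; it lets the child
+      flow start routing from the point after the CAS Multiplier that created it.</para>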
+    
+    <para>Also, there is a variant of <literal>FinalStep</literal> which can only be specified for output CASes
+      produced by CAS Multipliers within the Aggregate Analysis Engine containing the Flow Controller. This
+      version of <literal>FinalStep</literal> is produced by calling the constructor with a
+      <literal>true</literal> argument, and it causes the CAS to be immediately released back to the pool. No
+      further processing will be done on it and it will not be output from the aggregate. This is the way that you can
+      build an Aggregate Analysis Engine that outputs some new CASes but not others. Note that if you never want any new
+      CASes to be output from the Aggregate Analysis Engine, you don&apos;t need to use this; instead just declare
+      <literal>&lt;outputsNewCASes&gt;false&lt;/outputsNewCASes&gt;</literal> in your Aggregate Analysis
+      Engine Descriptor as described in <olink targetdoc="&uima_docs_tutorial_guides;"
+        targetptr="ugr.tug.cm.aggregate_cms"/>.</para>
+    
+    <para>For more information on how CAS Multipliers interact with Flow Controllers, see 
+      <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cm.cm_and_fc"/>.
+    </para>
+  </section>
+  
+  <section id="ugr.tug.fc.continuing_when_exceptions_occur">
+    <title>Continuing the Flow When Exceptions Occur</title>
+    <para> If an exception occurs when processing a CAS, the framework may call the method     
+      <programlisting>boolean continueOnFailure(String failedAeKey, Exception failure)</programlisting>
+      on the Flow object that was managing the flow of that CAS. If this method returns <literal>true</literal>, then
+      the framework may continue to call the <literal>next()</literal> method to continue routing the CAS. If this
+      method returns <literal>false</literal> (the default), the framework will not make any more calls to the
+      <literal>next()</literal> method. </para>
+    <para>In the case where the last Step was a ParallelStep, if at least one of the destinations resulted in a failure,
+      then <literal>continueOnFailure</literal> will be called to report one of the failures. If this method
+      returns true, but one of the other destinations in the ParallelStep resulted in a failure, then the
+      <literal>continueOnFailure</literal> method will be called again to report the next failure. This
+      continues until either this method returns false or there are no more failures. </para>
+    <para>Note that it is possible for processing of a CAS to be aborted without this method being called. This method
+      is only called when an attempt is being made to continue processing of the CAS following an exception, which may
+      be an application configuration decision.</para>
+    <para>In any case, if processing is aborted by the framework for any reason, including because
+      <literal>continueOnFailure</literal> returned false, the framework will call the
+      <literal>Flow.aborted()</literal> method to allow the Flow object to clean up any resources.</para>   
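+    <para>The reporting sequence described above can be modelled in a few lines. This is only a sketch under stated
+      assumptions: <literal>FailureAwareFlow</literal>, <literal>TolerantFlow</literal> and
+      <literal>FailureReporter</literal> are hypothetical names, the policy of tolerating failures from a single
+      optional delegate is invented for the example, and the real framework may decide not to call
+      <literal>continueOnFailure</literal> at all.</para>
+    
+    <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the failure-related callbacks of a Flow object.
interface FailureAwareFlow {
  boolean continueOnFailure(String failedAeKey, Exception failure);
  void aborted();
}

final class TolerantFlow implements FailureAwareFlow {
  final List<String> log = new ArrayList<>();
  private final String optionalAeKey;

  TolerantFlow(String optionalAeKey) { this.optionalAeKey = optionalAeKey; }

  // Continue only when the delegate that failed is the optional one.
  public boolean continueOnFailure(String failedAeKey, Exception failure) {
    log.add("failed:" + failedAeKey);
    return optionalAeKey.equals(failedAeKey);
  }

  // Clean-up hook; here it only records that it was called.
  public void aborted() { log.add("aborted"); }
}

final class FailureReporter {
  // Models the sequence described for a ParallelStep: failures are reported one
  // at a time until the flow declines to continue or no failures remain.
  static boolean report(FailureAwareFlow flow, List<String> failedAeKeys) {
    for (String aeKey : failedAeKeys) {
      if (!flow.continueOnFailure(aeKey, new RuntimeException("failure in " + aeKey))) {
        flow.aborted();
        return false; // next() will not be called again
      }
    }
    return true; // routing may resume with next()
  }
}
```
+]]></programlisting>
+    
+    <para>In this model, once <literal>continueOnFailure</literal> returns false the remaining failures are not
+      reported and <literal>aborted</literal> is called exactly once.</para>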
+    <para>For an example of how to continue after an exception, see the example
+      code <literal>org.apache.uima.examples.flow.AdvancedFixedFlowController</literal>, in
+      the <literal>examples/src</literal> directory of the UIMA SDK.  This example also demonstrates the use of
+      <literal>ParallelStep</literal>.</para>
+  </section>
+</chapter>
\ No newline at end of file

Added: uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.multi_views.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.multi_views.xml?rev=941744&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.multi_views.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.multi_views.xml Thu May  6 14:06:02 2010
@@ -0,0 +1,696 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">  
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.tug.mvs">
+  <title>Multiple CAS Views of an Artifact</title>
+  <titleabbrev>Multiple CAS Views</titleabbrev>
+  
+  <para>UIMA provides an extension to the basic model of the CAS which supports analysis of
+    multiple views of the same artifact, all contained within the CAS. This chapter describes
+    the concepts, terminology, and the API and XML extensions that enable this.</para>
+  
+  <para>Multiple CAS Views can simplify things when different versions of the artifact are
+    needed at different stages of the analysis. They are also key to enabling multimodal
+    analysis where the initial artifact is transformed from one modality to another, or where
+    the artifact itself is multimodal, such as the audio, video and closed-captioned text
+    associated with an MPEG object. Each representation of the artifact can be analyzed
+    independently with the standard UIMA programming model; in addition, multi-view
+    components and applications can be constructed.</para>
+  
+  <para>UIMA supports this by augmenting the CAS with additional light-weight CAS objects,
+    one for each view, where these objects share most of the same underlying CAS, except for two
+    things: each view has its own set of indexed Feature Structures, and each view has its own
+    subject of analysis (Sofa) - its own version of the artifact being analyzed. The Feature
+    Structure instances themselves are in the shared part of the CAS; only the entries in the
+    indexes are unique for each CAS view.</para>
+  
+  <para>All of these CAS view objects are kept together with the CAS, and passed as a unit
+    between components in a UIMA application. APIs exist which allow components and
+    applications to switch among the various view objects, as needed.</para>
+  
+  <para>Feature Structures may be indexed in multiple views, if necessary. New methods on CAS
+    Views facilitate adding or removing Feature Structures to or from their index
+    repositories:</para>
+  
+  
+  <programlisting>aView.addFsToIndexes(aFeatureStructure) 
+aView.removeFsFromIndexes(aFeatureStructure)</programlisting>
+  
+  <para>specify the view in which this Feature Structure should be added to or removed from the
+    indexes.</para>
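+  
+  <para>The split between shared Feature Structures and per-view indexes can be pictured with a toy model.
+    The classes <literal>SharedCasModel</literal> and <literal>ViewModel</literal> below are hypothetical and
+    are not part of the UIMA API; only the two method names are borrowed from the calls shown above, and
+    Feature Structures are represented by plain objects.</para>
+  
+  <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not the UIMA API: feature structures live in one shared
// store, while each view keeps only its own index entries.
final class SharedCasModel {
  final List<Object> featureStructures = new ArrayList<>(); // shared by all views

  Object create(String description) {
    featureStructures.add(description);
    return description;
  }
}

final class ViewModel {
  final String sofaName;
  private final Set<Object> index = new LinkedHashSet<>(); // per-view index entries

  ViewModel(String sofaName) { this.sofaName = sofaName; }

  void addFsToIndexes(Object fs) { index.add(fs); }

  void removeFsFromIndexes(Object fs) { index.remove(fs); }

  boolean isIndexed(Object fs) { return index.contains(fs); }
}
```
+]]></programlisting>
+  
+  <para>In this model, removing a Feature Structure from the indexes of one view leaves it indexed in any
+    other view, and leaves the single shared instance in place.</para>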
+  
+  <section id="ugr.tug.mvs.cas_views_and_sofas">
+    <title>CAS Views and Sofas</title>
+    
+    <para>Sofas (see <olink targetdoc="&uima_docs_tutorial_guides;"
+        targetptr="ugr.tug.aas.sofa"/>) and CAS Views are linked. In this implementation,
+      every CAS view has one associated Sofa, and every Sofa has one associated CAS
+      View.</para>
+    
+    <section id="ugr.tug.mvs.naming_views_sofas">
+      <title>Naming CAS Views and Sofas</title>
+      
+      <para>The developer assigns a name to the View / Sofa, which is a simple string
+        (following the rules for Java identifiers, usually without periods, but see special
+        exception below). These names are declared in the component XML metadata, and are
+        used during assembly and by the runtime to enable switching among multiple Views of
+        the CAS at the same time.</para>
+      <note><para>The name is called the Sofa name, for historical reasons, but it applies
+      equally to the View. In the rest of this chapter, we&apos;ll refer to it as the Sofa
+      name.</para></note>
+      
+      <para>Some applications contain components that expect a variable number of Sofas as
+        input or output. An example of a component that takes a variable number of input Sofas
+        could be one that takes several translations of a document and merges them, where each
+        translation was in a separate Sofa. </para>
+      
+      <para> You can specify a variable number of input or output sofa names, where each name
+        has the same base part, by writing the base part of the name (with no periods), followed
+        by a period character and an asterisk character (.*). These denote sofas that have
+        names matching the base part up to the period; for example, names such as
+        <literal>base_name_part.TTX_3d</literal> would match a specification of
+        <literal>base_name_part.*</literal>.</para>
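+      
+      <para>The matching rule can be expressed as a short predicate. This is a sketch, not the framework code:
+        <literal>SofaNameSpec</literal> is a hypothetical helper, and it assumes that a specification ending in
+        <literal>.*</literal> matches only names that contain the base part followed by a period. Whether the
+        bare base part on its own should also match is not stated here, so the sketch treats it as a
+        non-match.</para>
+      
+      <programlisting><![CDATA[
```java
// Hypothetical helper illustrating the ".*" convention for Sofa names.
final class SofaNameSpec {
  // Returns true if sofaName is covered by spec. A spec ending in ".*" is read as
  // "base part, a period, then anything"; any other spec must match exactly.
  // Assumption: the bare base part without a period does not match.
  static boolean matches(String spec, String sofaName) {
    if (spec.endsWith(".*")) {
      // drop only the asterisk, keeping the trailing period
      String prefix = spec.substring(0, spec.length() - 1);
      return sofaName.startsWith(prefix);
    }
    return spec.equals(sofaName);
  }
}
```
+]]></programlisting>
+      
+      <para>For example, <literal>matches("base_name_part.*", "base_name_part.TTX_3d")</literal> is true, while
+        a name with a different base part is rejected.</para>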
+      
+    </section>
+    
+    <section id="ugr.tug.mvs.multi_view_and_single_view">
+      <title>Multi-View, Single-View components &amp; applications</title>
+      <titleabbrev>Multi/Single View parts in Applications</titleabbrev>
+      
+      <para>Components and applications can be written to be Multi-View or Single-View.
+        Most components used as primitive building blocks are expected to be Single-View.
+        UIMA provides capabilities to combine these kinds of components with Multi-View
+        components when assembling analysis aggregates or applications.</para>
+      
+      <para>Single-View components and applications use only one subject of analysis, and
+        one CAS View. The code and descriptors for these components do not use the facilities
+        described in this chapter.</para>
+      
+      <para>Conversely, Multi-View components and applications are aware of the
+        possibility of multiple Views and Sofas, and have code and XML descriptors that
+        create and manipulate them.</para>
+      
+    </section>
+  </section>
+  
+  <section id="ugr.tug.mvs.multi_view_components">
+    <title>Multi-View Components</title>
+    <section id="ugr.tug.mvs.deciding_multi_view">
+      <title>How UIMA decides if a component is Multi-View</title>
+      <titleabbrev>Deciding: Multi-View</titleabbrev>
+      
+      <para>Every UIMA component has an associated XML Component Descriptor. Multi-View
+        components are identified simply as those whose descriptors declare one or more Sofa
+        names in their Capability sections, as inputs or outputs. If a Component Descriptor
+        does not mention any input or output Sofa names, the framework treats that component
+        as a Single-View component.</para>
+      
+      <para>A Multi-View component is passed a special kind of CAS object, called a base CAS,
+        which it must use to switch to the particular view it wishes to process. The base CAS
+        object itself has no Sofa and no ability to use Indexes; only the views have that
+        capability.</para>
+      
+    </section>
+    
+    <section id="ugr.tug.mvs.additional_capabilities">
+      <title>Multi-View: additional capabilities</title>
+      
+      <para>Additional capabilities provided for components and applications aware of the
+        possibilities of multiple Views and Sofas include:</para>
+      
+      <itemizedlist spacing="compact"><listitem><para>Creating new Views, and for
+        each, setting up the associated Sofa data</para></listitem>
+        
+        <listitem><para>Getting a reference to an existing View and its associated Sofa, by
+          name </para></listitem>
+        
+        <listitem><para>Specifying a view in which to index a particular Feature Structure
+          instance </para></listitem></itemizedlist>
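+      
+      <para>As a rough illustration of these three capabilities, the following sketch
+        assumes an annotator has been handed a CAS named <literal>aCas</literal>. The
+        class name, the method name, the view name <quote>PlainText</quote>, and the
+        sample text are hypothetical, chosen only for this example:</para>
+      
+      <programlisting>import org.apache.uima.cas.CAS;
+import org.apache.uima.cas.text.AnnotationFS;
+
+// Sketch only: class, method and view names are illustrative
+public class ViewCapabilitiesSketch {
+  static void useViews(CAS aCas) {
+    // 1. create a new View and set up its Sofa data
+    CAS textView = aCas.createView("PlainText");
+    textView.setDocumentText("some plain text");
+
+    // 2. get a reference to an existing View (and its Sofa) by name
+    CAS sameView = aCas.getView("PlainText");
+
+    // 3. index a Feature Structure instance in a particular View
+    AnnotationFS annot = sameView.createAnnotation(
+            sameView.getAnnotationType(), 0, 4);
+    sameView.addFsToIndexes(annot);
+  }
+}</programlisting>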
+      
+    </section>
+    
+    <section id="ugr.tug.mvs.component_xml_metadata">
+      <title>Component XML metadata</title>
+      
+      <para>Each Multi-View component that creates a Sofa or wants to switch to a specific
+        previously created Sofa must declare the name for the Sofa in the capabilities
+        section. For example, a component expecting as input a web document in html format and
+        creating a plain text document for further processing might declare:</para>
+      
+      
+      <programlisting>&lt;capabilities&gt;
+  &lt;capability&gt;
+    &lt;inputs/&gt;
+    &lt;outputs/&gt;
+    &lt;inputSofas&gt;
+<emphasis role="bold">      &lt;sofaName&gt;rawContent&lt;/sofaName&gt;</emphasis>
+    &lt;/inputSofas&gt;
+    &lt;outputSofas&gt;
+<emphasis role="bold">      &lt;sofaName&gt;detagContent&lt;/sofaName&gt;</emphasis>
+    &lt;/outputSofas&gt;
+  &lt;/capability&gt;
+&lt;/capabilities&gt;</programlisting>
+      
+      <para>Details on this specification are found in <olink
+          targetdoc="&uima_docs_ref;"
+          targetptr="ugr.ref.xml.component_descriptor"/>. The Component Descriptor
+        Editor supports Sofa declarations on the <olink targetdoc="&uima_docs_tools;"
+          targetptr="ugr.tools.cde.capabilities"/>.</para>
+      
+    </section>
+  </section>
+  
+  <section id="ugr.tug.mvs.sofa_capabilities_and_apis_for_apps">
+    <title>Sofa Capabilities and APIs for Applications</title>
+    <titleabbrev>Sofa Capabilities &amp; APIs for Apps</titleabbrev>
+    
+    <para>In addition to components, applications can make use of these capabilities. When
+      an application creates a new CAS, it also creates the initial view of that CAS - and this
+      view is the object that is returned from the create call. Additional views beyond this
+      first one can be dynamically created at any time. The application can use the Sofa APIs
+      described in <olink targetdoc="&uima_docs_tutorial_guides;"
+        targetptr="ugr.tug.aas"/> to specify the data to be analyzed.</para>
+    
+    <para>If an Application creates a new CAS, the initial CAS that is created will be a view
+      named <quote>_InitialView</quote>. This name can be used in the application and in
+      Sofa Mapping (see the next section) to refer to this otherwise unnamed view.</para>
+    
+  </section>
+  
+  <section id="ugr.tug.mvs.sofa_name_mapping">
+    <title>Sofa Name Mapping</title>
+    
+    <para>Sofa Name mapping is the mechanism which enables UIMA component developers to
+      choose locally meaningful Sofa names in their source code, and lets aggregate,
+      collection processing engine, and application developers connect output
+      Sofas created in one component to input Sofas required in another.</para>
+    
+    <para>At a given aggregation level, the assembler or application developer defines
+      names for all the Sofas, and then specifies how these names map to the contained
+      components, using the Sofa Map.</para>
+    
+    <para>Consider annotator code to create a new CAS view:</para>
+    
+    
+    <programlisting>CAS viewX = cas.createView("X");</programlisting>
+    
+    <para>Or code to get an existing CAS view:</para>
+    
+    <programlisting>CAS viewX = cas.getView("X");</programlisting>
+    
+    <para>Without Sofa name mapping the SofaID for the new Sofa will be <quote>X</quote>.
+      However, if a name mapping for <quote>X</quote> has been specified by the aggregate or
+      CPE calling this annotator, the actual SofaID in the CAS can be different.</para>
+    
+    <para>All Sofas in a CAS must have unique names. This is accomplished by mapping all
+      declared Sofas as described in the following sections. An attempt to create a Sofa with a
+      SofaID already in use will throw an exception.</para>
+    
+    <para>Sofa name mapping must not use the <quote>.</quote> (period) character. Runtime Sofa
+      mapping maps names up to the <quote>.</quote> and appends the period and the following
+      characters to the mapped name.</para>
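+    
+    <para>The effect of this rule can be sketched in plain Java. This is not the
+      framework's own code: the class and method names are hypothetical, and the
+      sketch assumes that a name is split at its first period and that an unmapped
+      name is used unchanged:</para>
+    
+    <programlisting><![CDATA[import java.util.HashMap;
+import java.util.Map;
+
+// Hypothetical illustration of the period rule; not part of the UIMA API
+public class SofaNameRuleSketch {
+
+  // Maps the part of localName before the first period and
+  // re-appends the period and the characters that follow it
+  static String mapName(Map<String, String> sofaMap, String localName) {
+    int dot = localName.indexOf('.');
+    String base = (dot < 0) ? localName : localName.substring(0, dot);
+    String rest = (dot < 0) ? "" : localName.substring(dot);
+    String mapped = sofaMap.get(base);
+    // assumption: a name with no mapping is used unchanged
+    return ((mapped != null) ? mapped : base) + rest;
+  }
+
+  public static void main(String[] args) {
+    Map<String, String> sofaMap = new HashMap<String, String>();
+    sofaMap.put("X", "EnglishTranscript");
+    System.out.println(mapName(sofaMap, "X"));       // EnglishTranscript
+    System.out.println(mapName(sofaMap, "X.part1")); // EnglishTranscript.part1
+    System.out.println(mapName(sofaMap, "Y.part1")); // Y.part1
+  }
+}]]></programlisting>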
+    
+    <para>To get a Java Iterator for all the views in a CAS:</para>
+    
+    <programlisting>Iterator allViews = cas.getViewIterator();</programlisting>
+    
+    <para>To get a Java Iterator for selected views in a CAS, for example, views whose name 
+      is either exactly equal to namePrefix or is of the form namePrefix.suffix, where suffix 
+      can be any String:</para>
+    
+    <programlisting>Iterator someViews = cas.getViewIterator(String namePrefix);</programlisting>
+
+      <note><para>Sofa name mapping is applied to namePrefix.</para></note>
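+    
+    <para>A minimal sketch of walking the selected views follows. The class name and
+      the prefix <quote>Segment</quote> are hypothetical, and the cast is written on
+      the assumption that the iterator is declared without a type parameter, as in
+      the listings above:</para>
+    
+    <programlisting>import java.util.Iterator;
+import org.apache.uima.cas.CAS;
+
+// Sketch only: "Segment" is an illustrative view name prefix
+public class ViewIteratorSketch {
+  static void printSegmentViews(CAS cas) {
+    // selects a view named "Segment" and any view named "Segment.suffix"
+    Iterator someViews = cas.getViewIterator("Segment");
+    while (someViews.hasNext()) {
+      CAS view = (CAS) someViews.next();
+      System.out.println(view.getViewName());
+    }
+  }
+}</programlisting>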
+    
+    <para>Sofa name mappings are not currently supported for remote Analysis Engines.
+      See <xref linkend="ugr.tug.mvs.name_mapping_remote_services"/>.</para>
+               
+    <section id="ugr.tug.mvs.name_mapping_aggregate">
+      <title>Name Mapping in an Aggregate Descriptor</title>
+      
+      <para>For each component of an Aggregate, name mapping specifies the conversion
+        between component Sofa names and names at the aggregate level.</para>
+      
+      <para>Here&apos;s an example. Consider two Multi-View annotators to be assembled
+        into an aggregate which takes an audio segment consisting of spoken English and
+        produces a German text translation.</para>
+      
+      <para>The first annotator takes an audio segment as input Sofa and produces a text
+        transcript as output Sofa. The annotator designer might choose these Sofa names to be
+        <quote>AudioInput</quote> and <quote>TranscribedText</quote>.</para>
+      
+      <para>The second annotator is designed to translate text from English to German. This
+        developer might choose the input and output Sofa names to be
+        <quote>EnglishDocument</quote> and <quote>GermanDocument</quote>,
+        respectively.</para>
+      
+      <para>In order to hook these two annotators together, the following section would be
+        added to the top level of the aggregate descriptor:</para>
+      
+      
+      <programlisting><![CDATA[<sofaMappings>
+  <sofaMapping>
+    <componentKey>SpeechToText</componentKey>
+    <componentSofaName>AudioInput</componentSofaName>
+    <aggregateSofaName>SegementedAudio</aggregateSofaName>
+  </sofaMapping>
+  <sofaMapping>
+    <componentKey>SpeechToText</componentKey>
+    <componentSofaName>TranscribedText</componentSofaName>
+    <aggregateSofaName>EnglishTranscript</aggregateSofaName>
+  </sofaMapping>
+  <sofaMapping>
+    <componentKey>EnglishToGermanTranslator</componentKey>
+    <componentSofaName>EnglishDocument</componentSofaName>
+    <aggregateSofaName>EnglishTranscript</aggregateSofaName>
+  </sofaMapping>
+  <sofaMapping>
+    <componentKey>EnglishToGermanTranslator</componentKey>
+    <componentSofaName>GermanDocument</componentSofaName>
+    <aggregateSofaName>GermanTranslation</aggregateSofaName>
+  </sofaMapping>
+</sofaMappings>]]></programlisting>
+      
+      <para>The Component Descriptor Editor supports Sofa name mapping in aggregates and
+        simplifies the task. See <olink targetdoc="&uima_docs_tools;"
+          targetptr="ugr.tools.cde.capabilities.sofa_name_mapping"/> for details.</para> 
+    </section>
+    
+    <section id="ugr.tug.mvs.name_mapping_cpe"><title>Name Mapping in a CPE
+      Descriptor</title>
+      
+      <para>The CPE descriptor aggregates together a Collection Reader and CAS Processors
+        (Annotators and CAS Consumers). Sofa mappings can be added to the following elements
+        of CPE descriptors: <literal>&lt;collectionIterator&gt;</literal>,
+        <literal>&lt;casInitializer&gt;</literal> and the
+        <literal>&lt;casProcessor&gt;</literal>. To be consistent with the
+        organization of CPE descriptors, the maps for the CPE descriptor are distributed
+        among the XML markup for each of the parts (collectionIterator, casInitializer,
+        casProcessor). Because of this, the
+        <literal>&lt;componentKey&gt;</literal> element is not needed. Finally, rather than
+        sub-elements for the parts, the XML markup for these uses attributes. See <olink
+          targetdoc="&uima_docs_ref;"
+          targetptr="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.sofa_name_mappings"/>.</para>
+      
+      <para>Here&apos;s an example. Let&apos;s use the aggregate from the previous section
+        in a collection processing engine. Here we will add a Collection Reader that outputs
+        audio segments in an output Sofa named <quote>nextSegment</quote>. Remember to
+        declare an output Sofa nextSegment in the collection reader description.
+        We&apos;ll add a CAS Consumer in the next section.</para>
+      
+      
+      <programlisting>&lt;collectionReader&gt;
+  &lt;collectionIterator&gt;
+    &lt;descriptor&gt;
+    . . .
+    &lt;/descriptor&gt;
+    &lt;configurationParameterSettings&gt;...&lt;/configurationParameterSettings&gt;
+<emphasis role="bold">    &lt;sofaNameMappings&gt;
+      &lt;sofaNameMapping componentSofaName="nextSegment"
+                       cpeSofaName="SegementedAudio"/&gt;
+    &lt;/sofaNameMappings&gt;
+</emphasis>  &lt;/collectionIterator&gt;
+  &lt;casInitializer/&gt;
+&lt;/collectionReader&gt;</programlisting>
+      
+      <para>At this point the CAS Processor section for the aggregate does not need any Sofa
+        mapping because the aggregate input Sofa has the same name,
+        <quote>SegementedAudio</quote>, as is being produced by the Collection
+        Reader.</para>
+      
+    </section>
+    
+    <section id="ugr.tug.mvs.specifying_cas_view_for_single_view">
+      <title>Specifying the CAS View for a Single-View Component</title>
+      <titleabbrev>CAS View for Single-View Parts</titleabbrev>
+      
+      <para>Single-View components receive a Sofa named <quote>_InitialView</quote>, or
+        a Sofa that is mapped to this name.</para>
+      
+      <para>For example, assume that the CAS Consumer to be used in our CPE is a Single-View
+        component that expects the analysis results associated with the input CAS, and that
+        we want it to use the results from the translated German text Sofa. The following
+        mapping added to the CAS Processor section for the CPE will instruct the CPE to get the
+        CAS view for the German text Sofa and pass it to the CAS Consumer:</para>
+      
+      
+      <programlisting>&lt;casProcessor&gt;
+  . . .
+  <emphasis role="bold">&lt;sofaNameMappings&gt;
+    &lt;sofaNameMapping componentSofaName="_InitialView"
+                           cpeSofaName="GermanTranslation"/&gt;
+  &lt;/sofaNameMappings&gt;
+</emphasis>&lt;/casProcessor&gt;</programlisting>
+      
+      <para id="ugr.tug.mvs.sofa_mapping_leav_out_name">An alternative syntax for
+        this kind of mapping is to simply leave out the component sofa name in this
+        case.</para>
+      
+    </section>
+    
+    <section id="ugr.tug.mvs.name_mapping_application">
+      <title>Name Mapping in a UIMA Application</title>
+      
+      <para>Applications which instantiate UIMA components directly using the
+        UIMAFramework methods can also create a top level Sofa mapping using the
+        <quote>additional parameters</quote> capability.</para>
+      
+      
+      <programlisting>//create a "root" UIMA context for your whole application
+
+UimaContextAdmin rootContext =
+   UIMAFramework.newUimaContext(UIMAFramework.getLogger(),
+      UIMAFramework.newDefaultResourceManager(),
+      UIMAFramework.newConfigurationManager());
+
+XMLInputSource input = new XMLInputSource("test.xml");
+AnalysisEngineDescription desc =
+   UIMAFramework.getXMLParser().parseAnalysisEngineDescription(input);
+
+//setup sofa name mappings using the api
+
+HashMap sofamappings = new HashMap();
+sofamappings.put("localName1", "globalName1");
+sofamappings.put("localName2", "globalName2");
+  
+//create a UIMA Context for the new AE we are about to create
+
+//first argument is unique key among all AEs used in the application
+UimaContextAdmin childContext = rootContext.createChild("myAE", sofamappings);
+
+//instantiate AE, passing the UIMA Context through the additional
+//parameters map
+
+Map additionalParams = new HashMap();
+additionalParams.put(Resource.PARAM_UIMA_CONTEXT, childContext);
+
+AnalysisEngine ae = 
+        UIMAFramework.produceAnalysisEngine(desc,additionalParams);</programlisting>
+      
+      <para>Sofa mappings are applied from the inside out, i.e., local to global. First, any
+        aggregate mappings are applied, then any CPE mappings, and finally, any specified
+        using this <quote>additional parameters</quote> capability.</para>
+      
+    </section>
+    
+    <section id="ugr.tug.mvs.name_mapping_remote_services">
+      <title>Name Mapping for Remote Services</title>
+      
+      <para>Currently, no client-side Sofa mapping information is passed from a UIMA client
+        to a remote service. This can cause complications for UIMA services in a Multi-View
+        application.</para>
+      
+      <para>Remote Multi-View services will work only if the service is Single-View, or if the 
+        Sofa names expected by the service exactly match the Sofa names produced by the client.</para>
+      
+      <para>If your application requires Sofa mappings for a remote Analysis Engine, you
+        can wrap your remotely deployed AE in an aggregate (on the remote side), and specify
+        the necessary Sofa mappings in the descriptor for that aggregate.</para>
+    </section>
+  </section>
+  
+  <section id="ugr.tug.mvs.jcas_extensions_for_multi_views">
+    <title>JCas extensions for Multiple Views</title>
+    
+    <para>The JCas interface to the CAS can be used with any or all views, as well as the base CAS
+      sent to Multi-View components. You can always get a JCas object from an existing CAS
+      object by using the method getJCas(); this call will create the JCas if it doesn&apos;t
+      already exist. If it does exist, it just returns the existing JCas that corresponds to
+      the CAS.</para>
+    
+    <para>JCas implements the getView(...) method, enabling switching to other named
+      views, just like the corresponding method on the CAS. The JCas version, however,
+      returns JCas objects, instead of CAS objects, corresponding to the view.</para>
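+    
+    <para>A minimal sketch of these two calls follows, assuming a CAS named
+      <literal>aCas</literal> that already contains a view named
+      <quote>GermanDocument</quote> (the name used in the sample application in this
+      chapter). The class and method names are hypothetical:</para>
+    
+    <programlisting>import org.apache.uima.cas.CAS;
+import org.apache.uima.cas.CASException;
+import org.apache.uima.jcas.JCas;
+
+// Sketch only: class and method names are illustrative
+public class JCasViewSketch {
+  static JCas germanView(CAS aCas) throws CASException {
+    // creates the JCas on the first call, returns the existing one afterwards
+    JCas jcas = aCas.getJCas();
+    // switches views; the result is a JCas, not a CAS
+    return jcas.getView("GermanDocument");
+  }
+}</programlisting>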
+  </section>
+  
+  <section id="ugr.tug.mvs.sample_application">
+    <title>Sample Multi-View Application</title>
+    
+    <para>The UIMA SDK contains a simple Sofa example application which demonstrates many
+      Sofa specific concepts and methods. The source code for the application driver is in
+      <literal>examples/src/org/apache/uima/examples/SofaExampleApplication.java</literal>
+      and the Multi-View annotator is given in
+      <literal>SofaExampleAnnotator.java</literal> in the same directory.</para>
+    
+    <para>This sample application demonstrates a language translator annotator which
+      expects an input text Sofa with an English document and creates an output text Sofa
+      containing a German translation. Some of the key Sofa concepts illustrated here
+      include:</para>
+    
+    <itemizedlist spacing="compact"><listitem><para>Sofa creation.</para>
+      </listitem>
+      
+      <listitem><para>Access of multiple CAS views.</para></listitem>
+      
+      <listitem><para>Unique feature structure index space for each view.</para>
+        </listitem>
+      
+      <listitem><para>Feature structures containing cross references between
+        annotations in different CAS views.</para></listitem>
+      
+      <listitem><para>The strong affinity of annotations with a specific Sofa. </para>
+        </listitem></itemizedlist>
+    
+    <section id="ugr.tug.mvs.sample_application.descriptor">
+      <title>Annotator Descriptor</title>
+      
+      <para>The annotator descriptor in
+        <literal>examples/descriptors/analysis_engine/SofaExampleAnnotator.xml</literal>
+        declares an input Sofa named <quote>EnglishDocument</quote> and an output Sofa
+        named <quote>GermanDocument</quote>. A custom type
+        <quote>CrossAnnotation</quote> is also defined:</para>
+      
+      
+      <programlisting><![CDATA[<typeDescription>
+  <name>sofa.test.CrossAnnotation</name>
+  <description/>
+  <supertypeName>uima.tcas.Annotation</supertypeName>
+  <features>
+    <featureDescription>
+      <name>otherAnnotation</name>
+      <description/>
+      <rangeTypeName>uima.tcas.Annotation</rangeTypeName>
+    </featureDescription>
+  </features>
+</typeDescription>]]></programlisting>
+      
+      <para>The <literal>CrossAnnotation</literal> type is derived from
+        <literal>uima.tcas.Annotation</literal> and includes one new feature: a
+        reference to another annotation.</para>
+      
+    </section>
+    
+    <section id="ugr.tug.mvs.sample_application.setup">
+      <title>Application Setup</title>
+      
+      <para>The application driver instantiates an analysis engine,
+        <literal>seAnnotator</literal>, from the annotator descriptor, obtains a new
+        base CAS using that engine&apos;s CAS definition, and creates the expected input
+        Sofa using:</para>
+      
+      
+      <programlisting>CAS cas = seAnnotator.newCAS();
+CAS aView = cas.createView("EnglishDocument");</programlisting>
+      
+      <para>Since <literal>seAnnotator</literal> is a primitive component, and no Sofa
+        mapping has been defined, the SofaID will be <quote>EnglishDocument</quote>.
+        Local Sofa data is set using:</para>
+      
+      
+      <programlisting>aView.setDocumentText("this beer is good");</programlisting>
+      
+      <para>At this point the CAS contains all necessary inputs for the translation
+        annotator and its process method is called.</para>
+      
+    </section>
+    
+    <section id="ugr.tug.mvs.sample_application.annotator_processing">
+      <title>Annotator Processing</title>
+      
+      <para>Annotator processing consists of parsing the English document into individual
+        words, doing word-by-word translation and concatenating the translations into a
+        German translation. Analysis metadata on the English Sofa will be an annotation for
+        each English word. Analysis metadata on the German Sofa will be a
+        <literal>CrossAnnotation</literal> for each German word, where the
+        <literal>otherAnnotation</literal> feature will be a reference to the associated
+        English annotation.</para>
+      
+      <para>Code of interest includes two CAS views:</para>
+      
+      
+      <programlisting>// get View of the English text Sofa
+englishView = aCas.getView("EnglishDocument");
+
+// Create the output German text Sofa
+germanView = aCas.createView("GermanDocument");</programlisting>
+      
+      <para>the indexing of annotations with the appropriate view:</para>
+      
+      
+      <programlisting>englishView.addFsToIndexes(engAnnot);
+. . .
+germanView.addFsToIndexes(germAnnot);</programlisting>
+      
+      <para>and the combining of metadata belonging to different Sofas in the same feature
+        structure:</para>
+      
+      
+      <programlisting>// add link to English text
+germAnnot.setFeatureValue(other, engAnnot);</programlisting>
+      
+    </section>
+    
+    <section id="ugr.tug.mvs.sample_application.accessing_results">
+      <title>Accessing the results of analysis</title>
+      
+      <para>The application needs to get the results of analysis, which may be in different
+        views. Analysis results for each Sofa are dumped independently by iterating over all
+        annotations for each associated CAS view. For the English Sofa:</para>
+      
+      
+      <programlisting>//get annotation iterator for this CAS
+FSIndex anIndex = aView.getAnnotationIndex();
+FSIterator anIter = anIndex.iterator();
+while (anIter.isValid()) {
+  AnnotationFS annot = (AnnotationFS) anIter.get();
+  System.out.println(" " + annot.getType().getName()
+                         + ": " + annot.getCoveredText());
+  anIter.moveToNext();
+}</programlisting>
+      
+      <para>Iterating over all German annotations looks the same, except for the
+        following:</para>
+      
+      
+      <programlisting>if (annot.getType() == cross) {
+  AnnotationFS crossAnnot =
+          (AnnotationFS) annot.getFeatureValue(other);
+  System.out.println("   other annotation feature: "
+          + crossAnnot.getCoveredText());
+}</programlisting>
+      
+      <para>Of particular interest here is the built-in Annotation type method
+        <literal>getCoveredText()</literal>. This method uses the
+        <quote>begin</quote> and <quote>end</quote> features of the annotation to create
+        a substring from the CAS document. The SofaRef feature of the annotation is used to
+        identify the correct Sofa&apos;s data from which to create the substring.</para>
+      
+      <para>The example program output is:</para>
+      
+      
+      <programlisting>---Printing all annotations for English Sofa---
+uima.tcas.DocumentAnnotation: this beer is good
+uima.tcas.Annotation: this
+uima.tcas.Annotation: beer
+uima.tcas.Annotation: is
+uima.tcas.Annotation: good
+      
+---Printing all annotations for German Sofa---
+uima.tcas.DocumentAnnotation: das bier ist gut
+sofa.test.CrossAnnotation: das
+ other annotation feature: this
+sofa.test.CrossAnnotation: bier
+ other annotation feature: beer
+sofa.test.CrossAnnotation: ist
+ other annotation feature: is
+sofa.test.CrossAnnotation: gut
+ other annotation feature: good</programlisting>
+      
+    </section>
+  </section>
+  
+  <section id="ugr.tug.mvs.views_api_summary">
+    <title>Views API Summary</title>
+    
+    <para>The recommended way to deliver a particular CAS view to a <emphasis role="bold-italic">Single-View</emphasis> component is to use Sofa mapping in
+      the CPE and/or aggregate descriptors.</para>
+    
+    <para>For <emphasis role="bold-italic">Multi-View</emphasis> components or
+      applications, the following methods are used to create or get a reference to a CAS view
+      for a particular Sofa:</para>
+    
+    <para>Creating a new View:</para>
+    
+    
+    <programlisting>JCas newView = aJCas.createView(String localNameOfTheViewBeforeMapping);
+CAS  newView = aCAS .createView(String localNameOfTheViewBeforeMapping);</programlisting>
+    
+    <para>Getting a View from a CAS or JCas:</para>
+    
+    
+    <programlisting><?db-font-size 80% ?>JCas myView = aJCas.getView(String localNameOfTheViewBeforeMapping);
+CAS  myView = aCAS .getView(String localNameOfTheViewBeforeMapping);
+Iterator allViews = aCasOrJCas.getViewIterator();
+Iterator someViews = aCasOrJCas.getViewIterator(String localViewNamePrefix);</programlisting>
+    
+    <para>The following methods are useful for all annotators and applications:</para>
+    
+    <para>Setting Sofa data for a CAS or JCas:</para>
+    
+    
+    <programlisting>aCasOrJCas.setDocumentText(String docText);
+aCasOrJCas.setSofaDataString(String docText, String mimeType);
+aCasOrJCas.setSofaDataArray(FeatureStructure array, String mimeType);
+aCasOrJCas.setSofaDataURI(String uri, String mimeType);</programlisting>
+    
+    <para>Getting Sofa data for a particular CAS or JCas:</para>
+    
+    
+    <programlisting>String doc = aCasOrJCas.getDocumentText();
+String doc = aCasOrJCas.getSofaDataString();
+FeatureStructure array = aCasOrJCas.getSofaDataArray();
+String uri = aCasOrJCas.getSofaDataURI();
+InputStream is = aCasOrJCas.getSofaDataStream();</programlisting>
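+    
+    <para>As a hedged illustration of the set and get calls together, the sketch below
+      creates a view, stores an HTML string as its Sofa data, and reads it back both as
+      a String and as a stream. The class name, the method name, the view name
+      <quote>RawPage</quote>, and the sample content are hypothetical:</para>
+    
+    <programlisting><![CDATA[import java.io.InputStream;
+import org.apache.uima.cas.CAS;
+
+// Sketch only: class, method and view names are illustrative
+public class SofaDataSketch {
+  static InputStream rawPageStream(CAS aCas) {
+    CAS rawView = aCas.createView("RawPage");
+    // the Sofa data of a view is set once, together with its MIME type
+    rawView.setSofaDataString(
+        "<html><body>this beer is good</body></html>", "text/html");
+    // the same data read back as a String
+    String page = rawView.getSofaDataString();
+    System.out.println(page.length());
+    // and as a stream
+    return rawView.getSofaDataStream();
+  }
+}]]></programlisting>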
+    
+  </section>
+  
+  <section id="ugr.tug.mvs.sofa_incompatibilities_v1_v2">
+    <title>Sofa Incompatibilities between UIMA version 1 and version 2</title>
+    <titleabbrev>Sofa Incompatibilities: V1 and V2</titleabbrev>
+    
+    <para>A major change in version 2 is related to the support of Single-View components
+      and applications. Given an analysis engine, <literal>ae</literal>, the API
+      
+      <programlisting>CAS cas = ae.newCAS();</programlisting>
+      used to return the base CAS. Now it returns a view of the Sofa named
+      <quote>_InitialView</quote>. This Sofa will actually only be created if any Sofa data
+      is set for this view. The initial view is used for Single-View applications and
+      Multi-View annotators with no Sofa mapping.</para>
+    
+    <para>The process method of Multi-View annotators receives the base CAS; however, the base
+      CAS no longer has an index repository to hold <quote>global</quote> data. Global data
+      needs to be put in a specific named CAS view of your choice.</para>
+    
+    <para>Because of these changes, the following scenarios will break with v2.0 clients:
+      
+      <itemizedlist spacing="compact"><listitem><para>Any version 1.x services (you
+        must migrate the services to version 2).</para></listitem>
+        
+        <listitem><para>Applications or components explicitly referencing
+          <quote>_DefaultTextSofaName</quote> in code or descriptors.</para>
+          </listitem>
+        
+        <listitem><para>Multi-View applications using the Base CAS index repository.
+          </para></listitem></itemizedlist></para>
+  </section>
+</chapter>
\ No newline at end of file

Added: uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.xmi_emf.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.xmi_emf.xml?rev=941744&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.xmi_emf.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.xmi_emf.xml Thu May  6 14:06:02 2010
@@ -0,0 +1,186 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">  
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.tug.xmi_emf">
+  <title>XMI and EMF Interoperability</title>
+  <titleabbrev>XMI &amp; EMF</titleabbrev>
+  
+  <section id="ugr.tug.xmi_emf.overview">
+    <title>Overview</title>
+    
+    <para>In traditional object-oriented terms, a UIMA Type System is a class model and a UIMA CAS is an object graph.
+      There are established standards in this area
+      &ndash; specifically, <trademark class="registered">UML</trademark> is an <trademark class="trade">
+      OMG</trademark> standard for class models and XMI (XML Metadata Interchange) is an OMG standard for the XML
+      representation of object graphs.</para>
+    
+    <para>Furthermore, the Eclipse Modeling Framework (EMF) is an open-source framework for model-based
+      application development, and it is based on UML and XMI. In EMF, you define class models using a metamodel called
+      Ecore, which is similar to UML. EMF provides tools for converting a UML model to Ecore. EMF can then generate Java
+      classes from your model, and supports persistence of those classes in the XMI format.</para>
+    
+    <para>The UIMA SDK provides tools for interoperability with XMI and EMF. These tools allow conversions of UIMA
+      Type Systems to and from Ecore models, as well as conversions of UIMA CASes to and from XMI format. This provides a
+      number of advantages, including:</para>
+    
+    <blockquote>
+      <para>You can define a model using a UML Editor, such as Rational Rose or EclipseUML, and then automatically
+        convert it to a UIMA Type System.</para>
+      
+      <para>You can take an existing UIMA application, convert its type system to Ecore, and save the CASes it
+        produces to XMI. This data is now in a form where it can easily be ingested by an EMF-based application.</para>
+    </blockquote>
+    
+    <para>More generally, we are adopting the well-documented, open standard XMI as the standard way to represent
+      UIMA-compliant analysis results (replacing the UIMA-specific XCAS format). This use of an open standard
+      enables other applications to more easily produce or consume these UIMA analysis results.</para>
+    
+    <para>For more information on XMI, see Grose et al. <emphasis>Mastering XMI. Java Programming with XMI, XML, and
+      UML.</emphasis> John Wiley &amp; Sons, Inc. 2002.</para>
+    
+    <para>For more information on EMF, see Budinsky et al. <emphasis>Eclipse Modeling Framework 2.0.</emphasis>
+      Addison-Wesley. 2006.</para>
+    
+    <para>For details of how the UIMA CAS is represented in XMI format, see <olink targetdoc="&uima_docs_ref;"
+        targetptr="ugr.ref.xmi"/>.</para>
+    
+  </section>
+  
+  <section id="ugr.tug.xmi_emf.converting_ecore_to_from_uima_type_system">
+    <title>Converting an Ecore Model to or from a UIMA Type System</title>
+    
+    <para>The UIMA SDK provides the following two classes:</para>
+    
+    <para><emphasis role="bold"><literal>Ecore2UimaTypeSystem:</literal>
+      </emphasis> converts from an .ecore model developed using EMF to a UIMA-compliant
+      TypeSystem descriptor. This is a Java class that can be run as a standalone program or
+      invoked from another Java application. To run as a standalone program,
+      execute:</para>
+    
+    <para><command>java org.apache.uima.ecore.Ecore2UimaTypeSystem &lt;ecore
+      file&gt; &lt;output file&gt;</command></para>
+    
+    <para>The input .ecore file will be converted to a UIMA TypeSystem descriptor and written
+      to the specified output file. You can then use the resulting TypeSystem descriptor in
+      your UIMA application.</para>
+    
+    <para><emphasis role="bold"><literal>UimaTypeSystem2Ecore:</literal>
+      </emphasis> converts from a UIMA TypeSystem descriptor to an .ecore model. This is a
+      Java class that can be run as a standalone program or invoked from another Java
+      application. To run as a standalone program, execute:</para>
+    
+    <para><command>java org.apache.uima.ecore.UimaTypeSystem2Ecore
+      &lt;TypeSystem descriptor&gt; &lt;output file&gt;</command></para>
+    
+    <para>The input UIMA TypeSystem descriptor will be converted to an Ecore model file and
+      written to the specified output file. You can then use the resulting Ecore model in EMF
+      applications. The converted model also includes the types from any TypeSystems that the
+      descriptor brings in through <literal>&lt;import...&gt;</literal> elements; the fact that
+      they were imported is currently not preserved.</para>
+    
+    <para>To run either of these converters, your classpath will need to include the UIMA jar
+      files as well as the following jar files from the EMF distribution: common.jar,
+      ecore.jar, and ecore.xmi.jar.</para>
+    
+    <para>Also, note that the uima-core.jar file contains the Ecore model file uima.ecore,
+      which defines the built-in UIMA types. You may need to use this file from your EMF
+      applications.</para>
+    
+  </section>
+  
+  <section id="ugr.tug.xmi_emf.using_xmi_cas_serialization">
+    <title>Using XMI CAS Serialization</title>
+    
+    <para>The UIMA SDK provides XMI support through the following two classes:</para>
+    
+    <para><emphasis role="bold"><literal>XmiCasSerializer:</literal></emphasis>
+      can be run from within a UIMA application to write out a CAS to the standard XMI format. The
+      XMI that is generated will be compliant with the Ecore model generated by
+      <literal>UimaTypeSystem2Ecore</literal>. An EMF application could use this Ecore
+      model to ingest and process the XMI produced by the XmiCasSerializer.</para>
+    
+    <para><emphasis role="bold"><literal>XmiCasDeserializer:</literal></emphasis>
+      can be run from within a UIMA application to read in an XMI document and populate a CAS. The
+      XMI must conform to the Ecore model generated by
+      <literal>UimaTypeSystem2Ecore</literal>.</para>
+    
+    <para>Also, the uimaj-examples Eclipse project contains some example code that shows
+      how to use the serializer and deserializer:
+
+    <blockquote>
+    <para><literal>org.apache.uima.examples.xmi.XmiWriterCasConsumer:</literal>
+      This is a CAS Consumer that writes each CAS to an output file in XMI format. It is analogous
+      to the XCasWriter CAS Consumer that has existed in prior UIMA versions, except that it
+      uses the XMI serialization format.</para>
+    
+    <para><literal>org.apache.uima.examples.xmi.XmiCollectionReader:</literal>
+      This is a Collection Reader that reads a directory of XMI files and deserializes each of
+      them into a CAS. For example, this would allow you to build a Collection Processing
+      Engine that reads XMI files, which could contain some previous analysis results, and
+      then do further analysis.</para>
+    </blockquote></para>
+    
+    <para>Finally, under the folder <literal>uimaj-examples/ecore_src</literal> is
+      the class
+      <literal>org.apache.uima.examples.xmi.XmiEcoreCasConsumer</literal>, which
+      writes each CAS to XMI format and also saves the Type System as an Ecore file. Since this
+      uses the <literal>UimaTypeSystem2Ecore</literal> converter, to compile it you must
+      add to your classpath the EMF jars common.jar, ecore.jar, and ecore.xmi.jar &ndash;
+      see ecore_src/readme.txt for instructions.</para>
+
+    <section id="ugr.tug.xmi_emf.xml_character_issues">
+    <title>Character Encoding Issues with XML Serialization</title>
+    
+    <para>Note that not all valid Unicode characters are valid XML characters, at least not in XML
+      1.0.  Moreover, it is possible to create characters in Java that are not even valid Unicode
+      characters, let alone XML characters.  As UIMA character data is translated directly into XML
+      character data on serialization, this may lead to issues.  UIMA will therefore check that the
+      character data that is being serialized is valid for the version of XML being used.  If 
+      non-serializable character data is encountered during serialization, an exception is thrown
+      and serialization fails (to avoid creating invalid XML data).  UIMA does not simply replace
+      the offending characters with some valid replacement character; the assumption being that
+      most applications would not like to have their data modified automatically.
+    </para>
+    
+    <para>If you know you are going to use XML serialization, and you would like to avoid such issues
+      on serialization, you should check any character data you create in UIMA ahead of time.  Issues
+      most often arise with the document text, as documents may originate at various sources, and
+      may be of varying quality.  So it's a particularly good idea to check the document text for
+      characters that will cause issues for serialization.
+    </para>
+    
+    <para>UIMA provides a handful of methods to assist you in checking Java character data: the
+      overloads of <literal>checkForNonXmlCharacters()</literal> in the class
+      <literal>org.apache.uima.internal.util.XMLUtils</literal>.  Please check the Javadocs for
+      further information.
+    </para>
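+    
+    <para>As a rough illustration of what such a check involves, the following self-contained sketch
+      tests a string against the <literal>Char</literal> production of XML 1.0.  It does not call
+      <literal>XMLUtils</literal>; the class name <literal>XmlCharCheck</literal> and its two methods
+      are hypothetical names chosen for this example, and only XML 1.0 is covered.
+    </para>
+    
+    <programlisting><![CDATA[
```java
/**
 * Illustrative sketch only: a stand-alone XML 1.0 character check.
 * The class and method names are assumptions made for this example;
 * they are not part of the UIMA API.
 */
final class XmlCharCheck {

    private XmlCharCheck() {
        // static helpers only
    }

    /** Returns true if the code point matches the XML 1.0 "Char" production. */
    public static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /**
     * Returns the index (in UTF-16 code units) of the first character that
     * cannot be written as XML 1.0 character data, or -1 if there is none.
     * An unpaired surrogate is reported as invalid, because codePointAt()
     * returns the surrogate value itself, which lies outside the valid ranges.
     */
    public static int indexOfFirstNonXml10Char(String text) {
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (!isXml10Char(cp)) {
                return i;
            }
            i += Character.charCount(cp); // 2 for a valid surrogate pair, else 1
        }
        return -1;
    }

    public static void main(String[] args) {
        String clean = "plain text\n";
        String verticalTab = "ab" + (char) 0x0B + "cd";     // valid Unicode, not valid XML 1.0
        String loneSurrogate = "x" + (char) 0xD800;         // not even valid Unicode text
        String supplementary = "ok " + new String(Character.toChars(0x1F600));

        System.out.println(indexOfFirstNonXml10Char(clean));          // -1
        System.out.println(indexOfFirstNonXml10Char(verticalTab));    // 2
        System.out.println(indexOfFirstNonXml10Char(loneSurrogate));  // 1
        System.out.println(indexOfFirstNonXml10Char(supplementary));  // -1
    }
}
```
+]]></programlisting>
+    
+    <para>Running a check of this kind on the document text before it is stored in the CAS lets the
+      application decide for itself how to treat offending characters, which is in keeping with
+      UIMA's choice not to replace them automatically.
+    </para>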
+    
+    <para>Please note that these issues are not specific to XMI serialization; they apply to the
+      older XCAS format in the same way.
+    </para>
+  
+    </section>
+  </section>
+  
+</chapter>
\ No newline at end of file

Added: uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml?rev=941744&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml Thu May  6 14:06:02 2010
@@ -0,0 +1,36 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<book lang="en">
+  <title>UIMA Tutorial and Developers&apos; Guides</title>
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../target/docbook-shared/common_book_info_ibm_c.xml"/>
+
+  <toc/>
+  
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="annotator_analysis_engine_guide.xml"/>
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.cpe.xml"/>
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.application.xml"/>
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.fc.xml"/>
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.aas.xml"/>
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.multi_views.xml"/>
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.cas_multiplier.xml"/>
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.xmi_emf.xml"/>
+</book>