Posted to commits@uima.apache.org by sc...@apache.org on 2010/05/06 16:06:04 UTC
svn commit: r941744 [7/7] - in
/uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides: ./
src/ src/docbook/ src/docbook/images/
src/docbook/images/tutorials_and_users_guides/
src/docbook/images/tutorials_and_users_guides/tug.aae/ src/d...
Added: uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.fc.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.fc.xml?rev=941744&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.fc.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.fc.xml Thu May 6 14:06:02 2010
@@ -0,0 +1,394 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+<!ENTITY imgroot "images/tutorials_and_users_guides/tug.fc/">
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.tug.fc">
+ <title>Flow Controller Developer's Guide</title>
+
+ <para>A Flow Controller is a component that plugs into an Aggregate Analysis Engine. When a CAS is input to the
+ Aggregate, the Flow Controller determines the order in which the components of that aggregate are invoked on that
+ CAS. The ability to provide your own Flow Controller implementation is new as of release 2.0 of UIMA.</para>
+
+ <para>Flow Controllers may decide the flow dynamically, based on the contents of the CAS. So, as just one example,
+ you could develop a Flow Controller that first sends each CAS to a Language Identification Annotator and then,
+ based on the output of the Language Identification Annotator, routes that CAS to an Annotator that is specialized
+ for that particular language.</para>
+
+ <section id="ugr.tug.fc.developing_fc_code">
+ <title>Developing the Flow Controller Code</title>
+
+ <section id="ugr.tug.fc.fc_interface_overview">
+ <title>Flow Controller Interface Overview</title>
+
+ <para>Flow Controller implementations should extend from the
+ <literal>JCasFlowController_ImplBase</literal> or
+ <literal>CasFlowController_ImplBase</literal> classes, depending on which CAS interface they prefer
+ to use. As with other types of components, the Flow Controller ImplBase classes define optional
+ <literal>initialize</literal>, <literal>destroy</literal>, and <literal>reconfigure</literal>
+ methods. They also define the required method <literal>computeFlow</literal>.</para>
+
+ <para>The <literal>computeFlow</literal> method is called by the framework whenever a new CAS enters the
+ Aggregate Analysis Engine. It is given the CAS as an argument and must return an object which implements the
+ <literal>Flow</literal> interface (the Flow object). The Flow Controller developer must define this
+ object. It is the object that is responsible for routing this particular CAS through the components of the
+ Aggregate Analysis Engine. For convenience, the framework provides basic implementations of Flow objects
+ in the classes <literal>CasFlow_ImplBase</literal> and <literal>JCasFlow_ImplBase</literal>; use the JCas
+ one if you are using the JCas interface to the CAS.</para>
+
+ <para>The framework then uses the Flow object and calls its <literal>next()</literal> method, which returns
+ a <literal>Step</literal> object (implemented by the UIMA Framework) that indicates what to do with
+ this CAS next. There are three types of steps currently supported:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para><literal>SimpleStep</literal>, which specifies a single Analysis Engine that should receive
+ the CAS next.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>ParallelStep</literal>, which specifies that multiple Analysis Engines should
+ receive the CAS next, and that the relative order in which these Analysis Engines execute does not
+ matter. Logically, they can run in parallel. The runtime is not obligated to actually execute them in
+ parallel, however, and the current implementation will execute them serially in an arbitrary
+ order.</para>
+ </listitem>
+
+ <listitem>
+ <para><literal>FinalStep</literal>, which indicates that the flow is completed. </para>
+ </listitem>
+ </itemizedlist>
+
+ <para>After executing the step, the framework will call the Flow object's <literal>next()</literal>
+ method again to determine the next destination, and this will be repeated until the Flow Object indicates
+ that processing is complete by returning a <literal>FinalStep</literal>.</para>
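+
+ <para>The calling sequence described above can be illustrated with a minimal sketch of the language-routing
+ idea mentioned at the start of this chapter. It is an illustration only, not code shipped with UIMA: the
+ delegate keys <literal>LanguageId</literal>, <literal>EnglishAnnotator</literal>, and
+ <literal>DefaultAnnotator</literal> are hypothetical, and the sketch assumes that the language identifier
+ records its result as the document language of the CAS.</para>
+
+ <programlisting>import org.apache.uima.flow.CasFlow_ImplBase;
+import org.apache.uima.flow.FinalStep;
+import org.apache.uima.flow.SimpleStep;
+import org.apache.uima.flow.Step;
+
+// Hypothetical Flow: all three delegate keys are assumptions.
+class LanguageRoutingFlow extends CasFlow_ImplBase {
+  private int mCalls = 0;
+
+  public Step next() {
+    mCalls++;
+    if (mCalls == 1) {
+      // First call: send the CAS to the language identifier.
+      return new SimpleStep("LanguageId");
+    }
+    if (mCalls == 2) {
+      // Second call: route on the language now recorded in the CAS.
+      String lang = getCas().getDocumentLanguage();
+      boolean english = (lang != null) ? lang.startsWith("en") : false;
+      return new SimpleStep(english ? "EnglishAnnotator" : "DefaultAnnotator");
+    }
+    // Third call: nothing is left to do for this CAS.
+    return new FinalStep();
+  }
+}</programlisting>
+
+ <para>Because the framework creates one Flow object per CAS, the call counter can safely be kept in an
+ instance field.</para>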
+
+ <para>The Flow Controller has access to a <literal>FlowControllerContext</literal>, which is a subtype of
+ <literal>UimaContext</literal>. In addition to the configuration parameter and resource access
+ provided by a <literal>UimaContext</literal>, the <literal>FlowControllerContext</literal> also
+ gives access to the metadata for all of the Analysis Engines that the Flow Controller can route CASes to. Most
+ Flow Controllers will need to use this information to make routing decisions. You can get a handle to the
+ <literal>FlowControllerContext</literal> by calling the <literal>getContext()</literal> method
+ defined in <literal>JCasFlowController_ImplBase</literal> and
+ <literal>CasFlowController_ImplBase</literal>. Then, the
+ <literal>FlowControllerContext.getAnalysisEngineMetaDataMap</literal> method can be called to get a
+ map containing an entry for each of the Analysis Engines in the Aggregate. The keys in this map are the same as
+ the delegate analysis engine keys specified in the aggregate descriptor, and the values are the
+ corresponding <literal>AnalysisEngineMetaData</literal> objects.</para>
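+
+ <para>As a rough sketch of how this map might be inspected, the following hypothetical controller logs
+ each delegate key and the name from its metadata during initialization. The class name is invented for
+ illustration, the class is left abstract so that <literal>computeFlow</literal> can be omitted, and the
+ loop assumes a UIMA version in which the map is declared with generic types (keys of type
+ <literal>String</literal>).</para>
+
+ <programlisting>import org.apache.uima.flow.CasFlowController_ImplBase;
+import org.apache.uima.flow.FlowControllerContext;
+import org.apache.uima.resource.ResourceInitializationException;
+import org.apache.uima.util.Level;
+
+// Hypothetical class; subclasses still have to implement computeFlow.
+public abstract class DelegateListingFlowController
+        extends CasFlowController_ImplBase {
+  public void initialize(FlowControllerContext aContext)
+          throws ResourceInitializationException {
+    super.initialize(aContext);
+    // One entry per delegate Analysis Engine of the aggregate.
+    for (String aeKey : aContext.getAnalysisEngineMetaDataMap().keySet()) {
+      String aeName =
+          aContext.getAnalysisEngineMetaDataMap().get(aeKey).getName();
+      aContext.getLogger().log(Level.INFO,
+          "Delegate key " + aeKey + " has name " + aeName);
+    }
+  }
+}</programlisting>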
+
+ <para>Finally, the Flow Controller has optional methods <literal>addAnalysisEngines</literal> and
+ <literal>removeAnalysisEngines</literal>. These methods are intended to notify the Flow Controller if
+ new Analysis Engines are available to route CASes to, or if previously available Analysis Engines are no
+ longer available. However, the current version of the Apache UIMA framework does not support dynamically
+ adding or removing Analysis Engines to/from an aggregate, so these methods are not currently called. Future
+ versions may support this feature. </para>
+ </section>
+
+ <section id="ugr.tug.fc.example_code">
+ <title>Example Code</title>
+
+ <para>This section walks through the source code of an example Flow Controller that simulates a simple version
+ of the <quote>Whiteboard</quote> flow model. At each step of the flow, the Flow Controller looks at all of the
+ available Analysis Engines that have not yet run on this CAS, and picks one whose input requirements are
+ satisfied.</para>
+
+ <para>The Java class for the example is
+ <literal>org.apache.uima.examples.flow.WhiteboardFlowController</literal> and the source code is
+ included in the UIMA SDK under the <literal>examples/src</literal> directory.</para>
+
+ <section id="ugr.tug.fc.whiteboard">
+ <title>The WhiteboardFlowController Class</title>
+
+
+ <programlisting>public class WhiteboardFlowController
+ extends CasFlowController_ImplBase {
+ public Flow computeFlow(CAS aCAS)
+ throws AnalysisEngineProcessException {
+ WhiteboardFlow flow = new WhiteboardFlow();
+ // As of release 2.3.0, the following is not needed,
+ // because the framework does this automatically
+ // flow.setCas(aCAS);
+
+ return flow;
+ }
+
+ class WhiteboardFlow extends CasFlow_ImplBase {
+ // Discussed Later
+ }
+}</programlisting>
+
+ <para>The <literal>WhiteboardFlowController</literal> extends from
+ <literal>CasFlowController_ImplBase</literal> and implements the
+ <literal>computeFlow</literal> method. The implementation of the <literal>computeFlow</literal>
+ method is very simple; it just constructs a new <literal>WhiteboardFlow</literal> object that will be
+ responsible for routing this CAS. The framework gives the Flow object a handle to that CAS,
+ which the Flow object later uses to make its routing decisions.</para>
+
+ <para>Note that we will have one instance of <literal>WhiteboardFlow</literal> per CAS, so if there are
+ multiple CASes being simultaneously processed there will not be any confusion.</para>
+
+ </section>
+ <section id="ugr.tug.fc.whiteboardflow">
+ <title>The WhiteboardFlow Class</title>
+
+
+ <programlisting>class WhiteboardFlow extends CasFlow_ImplBase {
+ private Set mAlreadyCalled = new HashSet();
+
+ public Step next() throws AnalysisEngineProcessException {
+ // Get the CAS that this Flow object is responsible for routing.
+ // Each Flow instance is responsible for a single CAS.
+ CAS cas = getCas();
+
+ // iterate over available AEs
+ Iterator aeIter = getContext().getAnalysisEngineMetaDataMap().
+ entrySet().iterator();
+ while (aeIter.hasNext()) {
+ Map.Entry entry = (Map.Entry) aeIter.next();
+ // skip AEs that were already called on this CAS
+ String aeKey = (String) entry.getKey();
+ if (!mAlreadyCalled.contains(aeKey)) {
+ // check for satisfied input capabilities
+ // (i.e. the CAS contains at least one instance
+ // of each required input)
+ AnalysisEngineMetaData md =
+ (AnalysisEngineMetaData) entry.getValue();
+ Capability[] caps = md.getCapabilities();
+ boolean satisfied = true;
+ for (int i = 0; i < caps.length; i++) {
+ satisfied = inputsSatisfied(caps[i].getInputs(), cas);
+ if (satisfied)
+ break;
+ }
+ if (satisfied) {
+ mAlreadyCalled.add(aeKey);
+ if (getContext().getLogger().isLoggable(Level.FINEST)) {
+ getContext().getLogger().log(Level.FINEST,
+ "Next AE is: " + aeKey);
+ }
+ return new SimpleStep(aeKey);
+ }
+ }
+ }
+ // no appropriate AEs to call - end of flow
+ getContext().getLogger().log(Level.FINEST, "Flow Complete.");
+ return new FinalStep();
+ }
+
+ private boolean inputsSatisfied(TypeOrFeature[] aInputs, CAS aCAS) {
+ //implementation detail; see the actual source code
+ }
+}</programlisting>
+
+ <para>Each instance of the <literal>WhiteboardFlow</literal> class is responsible for routing a
+ single CAS. A handle to the CAS instance is available by calling the <literal>getCas()</literal> method,
+ which is a standard method defined on the <literal>CasFlow_ImplBase</literal> superclass.</para>
+
+ <para>Each time the <literal>next</literal> method is called, the Flow object iterates over the metadata
+ of all of the available Analysis Engines (obtained via the call to
+ <literal>getContext().getAnalysisEngineMetaDataMap()</literal>) and sees if the input types declared in an
+ AnalysisEngineMetaData object are satisfied by the CAS (that is, the CAS contains at least one instance of
+ each declared input type). The exact details of checking for instances of types in the CAS are not discussed
+ here – see the WhiteboardFlowController.java file for the complete source.</para>
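+
+ <para>For orientation only, a simplified check of this kind might look like the sketch below. It is not the
+ implementation shipped with the example: the class name is hypothetical, feature-level inputs are ignored,
+ and it assumes that <literal>FSIndexRepository.getAllIndexedFS</literal> is available in the UIMA version
+ in use.</para>
+
+ <programlisting>import org.apache.uima.analysis_engine.TypeOrFeature;
+import org.apache.uima.cas.CAS;
+import org.apache.uima.cas.Type;
+
+// Hypothetical helper class, not part of the UIMA examples.
+class InputCheckSketch {
+  static boolean inputsSatisfied(TypeOrFeature[] aInputs, CAS aCAS) {
+    for (TypeOrFeature input : aInputs) {
+      if (!input.isType()) {
+        continue; // simplification: feature inputs are not checked
+      }
+      Type type = aCAS.getTypeSystem().getType(input.getName());
+      // Unsatisfied if the type is unknown or no instance is indexed.
+      if (type == null
+          || !aCAS.getIndexRepository().getAllIndexedFS(type).hasNext()) {
+        return false;
+      }
+    }
+    return true;
+  }
+}</programlisting>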
+
+ <para>When the Flow object decides which AnalysisEngine should be called next, it indicates this by
+ creating a SimpleStep object with the key for that AnalysisEngine and returning it:</para>
+
+ <programlisting>return new SimpleStep(aeKey);</programlisting>
+
+ <para>The Flow object records which Analysis Engines it has invoked in the
+ <literal>mAlreadyCalled</literal> field, and never invokes the same Analysis Engine twice. Note this
+ is not a hard requirement. It is acceptable to design a FlowController that invokes the same Analysis
+ Engine more than once. However, if you do this you must make sure that the flow will eventually
+ terminate.</para>
+
+ <para>If there are no Analysis Engines left whose input requirements are satisfied, the Flow object signals
+ the end of the flow by returning a FinalStep object:</para>
+
+ <programlisting>return new FinalStep();</programlisting>
+
+ <para>Also, note the use of the logger to write tracing messages indicating the decisions made by the Flow
+ Controller. This is a good practice that helps with debugging if the Flow Controller is behaving in an
+ unexpected way.</para>
+ </section>
+ </section>
+ </section>
+
+ <section id="ugr.tug.fc.creating_fc_descriptor">
+ <title>Creating the Flow Controller Descriptor</title>
+
+ <para>To create a Flow Controller Descriptor in the CDE, use File → New → Other
+ → UIMA → Flow Controller Descriptor File:
+
+
+ <screenshot>
+ <mediaobject>
+ <imageobject>
+ <imagedata width="5.5in" format="JPG" fileref="&imgroot;image002.jpg"/>
+ </imageobject>
+ <textobject><phrase>Screenshot of Eclipse new object wizard showing Flow Controller</phrase></textobject>
+ </mediaobject>
+ </screenshot></para>
+
+ <para>This will bring up the Overview page for the Flow Controller Descriptor:
+
+
+ <screenshot>
+ <mediaobject>
+ <imageobject>
+ <imagedata width="5.5in" format="JPG" fileref="&imgroot;image004.jpg"/>
+ </imageobject>
+ <textobject><phrase>Screenshot of Component Descriptor Editor Overview page for new Flow Controller</phrase></textobject>
+ </mediaobject>
+ </screenshot></para>
+
+ <para>Type in the Java class name that implements the Flow Controller, or use the <quote>Browse</quote> button
+ to select it. You must select a Java class that implements the <literal>FlowController</literal>
+ interface.</para>
+
+ <para>Flow Controller Descriptors are very similar to Primitive Analysis Engine Descriptors – for
+ example you can specify configuration parameters and external resources if you wish.</para>
+
+ <para>If you wish to edit a Flow Controller Descriptor by hand, see section <olink targetdoc="&uima_docs_ref;"
+ targetptr="ugr.ref.xml.component_descriptor.flow_controller"/> for the syntax.</para>
+ </section>
+
+ <section id="ugr.tug.fc.adding_fc_to_aggregate">
+ <title>Adding a Flow Controller to an Aggregate Analysis Engine</title>
+ <titleabbrev>Adding Flow Controller to an Aggregate</titleabbrev>
+
+ <para>To use a Flow Controller you must add it to an Aggregate Analysis Engine. You can only have one Flow
+ Controller per Aggregate Analysis Engine. In the Component Descriptor Editor, the Flow Controller is
+ specified on the Aggregate page, as a choice in the flow control kind - pick <quote>User-defined Flow</quote>.
+ When you do, the Browse and Search buttons underneath become active and allow you to specify an existing Flow
+ Controller Descriptor, which is imported into the aggregate descriptor when you select it.
+
+
+ <screenshot>
+ <mediaobject>
+ <imageobject>
+ <imagedata width="4.5in" format="JPG" fileref="&imgroot;image006.jpg"/>
+ </imageobject>
+ <textobject><phrase>Screenshot of Component Descriptor Editor Aggregate page showing selecting user-defined flow</phrase></textobject>
+ </mediaobject>
+ </screenshot></para>
+
+ <para>The key name is created automatically from the name element in the Flow Controller Descriptor being
+ imported. If you need to change this name, you can do so by switching to the <quote>Source</quote> view using the
+ bottom tabs, and editing the name in the XML source.</para>
+
+ <para>If you edit your Aggregate Analysis Engine Descriptor by hand, the syntax for adding a Flow Controller is:
+
+
+ <programlisting> <delegateAnalysisEngineSpecifiers>
+ ...
+ </delegateAnalysisEngineSpecifiers>
+ <emphasis role="bold"><flowController key=<quote>[String]</quote>>
+ <import .../>
+ </flowController></emphasis></programlisting></para>
+
+ <para>As usual, you can use either an import by location or an import by name – see <olink
+ targetdoc="&uima_docs_ref;" targetptr="ugr.ref.xml.component_descriptor.imports"/>.</para>
+
+ <para>The key that you assign to the FlowController can be used elsewhere in the Aggregate Analysis Engine
+ Descriptor – in parameter overrides, resource bindings, and Sofa mappings.</para>
+ </section>
+
+ <section id="ugr.tug.fc.adding_fc_to_cpe">
+ <title>Adding a Flow Controller to a Collection Processing Engine</title>
+ <titleabbrev>Adding Flow Controller to CPE</titleabbrev>
+
+ <para>Flow Controllers cannot be added directly to Collection Processing Engines. To use a Flow Controller in a
+ CPE you first need to wrap the part of your CPE that requires complex flow control into an Aggregate Analysis
+ Engine, and then add the Aggregate Analysis Engine to your CPE. The CPE's deployment and error handling
+ options can then only be configured for the entire Aggregate Analysis Engine as a unit.</para>
+
+ </section>
+
+ <section id="ugr.tug.fc.using_fc_with_cas_multipliers">
+ <title>Using Flow Controllers with CAS Multipliers</title>
+
+ <para>If you want your Flow Controller to work inside an Aggregate Analysis Engine that contains a CAS Multiplier
+ (see <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cm"/>), there are additional
+ things you must consider.</para>
+
+ <para>When your Flow Controller routes a CAS to a CAS Multiplier, the CAS Multiplier may produce new CASes that
+ then will also need to be routed by the Flow Controller. When a new output CAS is produced, the framework will call
+ the <literal>newCasProduced</literal> method on the Flow object that was managing the flow of the parent CAS
+ (the one that was input to the CAS Multiplier). The <literal>newCasProduced</literal> method must create a new Flow
+ object that will be responsible for routing the new output CAS.</para>
+
+ <para>In the <literal>CasFlow_ImplBase</literal> and <literal>JCasFlow_ImplBase</literal> classes, the
+ <literal>newCasProduced</literal> method is defined to throw an exception indicating that the Flow
+ Controller does not handle CAS Multipliers. If you want your Flow Controller to properly deal with CAS
+ Multipliers you must override this method.</para>
+
+ <para>If your Flow class extends <literal>CasFlow_ImplBase</literal>, the method signature to override is:
+ <programlisting>protected Flow newCasProduced(CAS newOutputCas, String producedBy)</programlisting>
+ </para>
+
+ <para>If your Flow class extends <literal>JCasFlow_ImplBase</literal>, the method signature to override is:
+ <programlisting>protected Flow newCasProduced(JCas newOutputCas, String producedBy)</programlisting>
+ </para>
+
+ <para>Also, there is a variant of <literal>FinalStep</literal> which can only be specified for output CASes
+ produced by CAS Multipliers within the Aggregate Analysis Engine containing the Flow Controller. This
+ version of <literal>FinalStep</literal> is produced by calling the constructor with a
+ <literal>true</literal> argument, and it causes the CAS to be immediately released back to the pool. No
+ further processing will be done on it and it will not be output from the aggregate. This is the way that you can
+ build an Aggregate Analysis Engine that outputs some new CASes but not others. Note that if you never want any new
+ CASes to be output from the Aggregate Analysis Engine, you don't need to use this; instead just declare
+ <literal><outputsNewCASes>false</outputsNewCASes></literal> in your Aggregate Analysis
+ Engine Descriptor as described in <olink targetdoc="&uima_docs_tutorial_guides;"
+ targetptr="ugr.tug.cm.aggregate_cms"/>.</para>
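+
+ <para>The following sketch shows one way these pieces might fit together. It is a minimal illustration
+ under stated assumptions, not code from the UIMA examples: the class name and the delegate keys
+ <literal>Segmenter</literal> (assumed to be a CAS Multiplier) and <literal>Annotator</literal> are
+ hypothetical. CASes that enter the aggregate are sent to the CAS Multiplier, and each CAS it produces is
+ annotated and then dropped.</para>
+
+ <programlisting>import org.apache.uima.cas.CAS;
+import org.apache.uima.flow.CasFlow_ImplBase;
+import org.apache.uima.flow.FinalStep;
+import org.apache.uima.flow.Flow;
+import org.apache.uima.flow.SimpleStep;
+import org.apache.uima.flow.Step;
+
+// Hypothetical Flow; "Segmenter" and "Annotator" are assumed keys.
+class SegmentingFlow extends CasFlow_ImplBase {
+  // true if this Flow routes a CAS produced inside the aggregate
+  private final boolean mIsSegment;
+  private boolean mFirstCall = true;
+
+  SegmentingFlow(boolean aIsSegment) {
+    mIsSegment = aIsSegment;
+  }
+
+  public Step next() {
+    if (mFirstCall) {
+      mFirstCall = false;
+      // Input CASes go to the CAS Multiplier, segments to the Annotator.
+      return new SimpleStep(mIsSegment ? "Annotator" : "Segmenter");
+    }
+    // Segments are dropped instead of being output. FinalStep(true)
+    // must not be used for a CAS that was input to the aggregate.
+    return mIsSegment ? new FinalStep(true) : new FinalStep();
+  }
+
+  protected Flow newCasProduced(CAS newOutputCas, String producedBy) {
+    // Called on the parent CAS's Flow; each new CAS gets its own Flow.
+    return new SegmentingFlow(true);
+  }
+}</programlisting>
+
+ <para>In this sketch the Flow Controller's <literal>computeFlow</literal> method would return
+ <literal>new SegmentingFlow(false)</literal> for each CAS that enters the aggregate.</para>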
+
+ <para>For more information on how CAS Multipliers interact with Flow Controllers, see
+ <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.cm.cm_and_fc"/>.
+ </para>
+ </section>
+
+ <section id="ugr.tug.fc.continuing_when_exceptions_occur">
+ <title>Continuing the Flow When Exceptions Occur</title>
+ <para> If an exception occurs when processing a CAS, the framework may call the method
+ <programlisting>boolean continueOnFailure(String failedAeKey, Exception failure)</programlisting>
+ on the Flow object that was managing the flow of that CAS. If this method returns <literal>true</literal>, then
+ the framework may continue to call the <literal>next()</literal> method to continue routing the CAS. If this
+ method returns <literal>false</literal> (the default), the framework will not make any more calls to the
+ <literal>next()</literal> method. </para>
+ <para>In the case where the last Step was a ParallelStep, if at least one of the destinations resulted in a failure,
+ then <literal>continueOnFailure</literal> will be called to report one of the failures. If this method
+ returns true, but one of the other destinations in the ParallelStep resulted in a failure, then the
+ <literal>continueOnFailure</literal> method will be called again to report the next failure. This
+ continues until either this method returns false or there are no more failures. </para>
+ <para>Note that it is possible for processing of a CAS to be aborted without this method being called. This method
+ is only called when an attempt is being made to continue processing of the CAS following an exception, which may
+ be an application configuration decision.</para>
+ <para>In any case, if processing is aborted by the framework for any reason, including because
+ <literal>continueOnFailure</literal> returned false, the framework will call the
+ <literal>Flow.aborted()</literal> method to allow the Flow object to clean up any resources.</para>
+ <para>For an example of how to continue after an exception, see the example
+ code <literal>org.apache.uima.examples.flow.AdvancedFixedFlowController</literal>, in
+ the <literal>examples/src</literal> directory of the UIMA SDK. This example also demonstrates the use of
+ <literal>ParallelStep</literal>.</para>
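+
+ <para>A much smaller sketch of the same idea is given here for orientation. It is not taken from the UIMA
+ examples: the class name and the delegate keys <literal>OptionalAnnotator</literal> and
+ <literal>RequiredAnnotator</literal> are hypothetical. The Flow continues only when the failure came from
+ the delegate it regards as optional.</para>
+
+ <programlisting>import org.apache.uima.flow.CasFlow_ImplBase;
+import org.apache.uima.flow.FinalStep;
+import org.apache.uima.flow.SimpleStep;
+import org.apache.uima.flow.Step;
+
+// Hypothetical Flow; both delegate keys are assumptions.
+class TolerantFlow extends CasFlow_ImplBase {
+  private int mCalls = 0;
+
+  public Step next() {
+    mCalls++;
+    if (mCalls == 1) {
+      return new SimpleStep("OptionalAnnotator");
+    }
+    if (mCalls == 2) {
+      return new SimpleStep("RequiredAnnotator");
+    }
+    return new FinalStep();
+  }
+
+  public boolean continueOnFailure(String failedAeKey, Exception failure) {
+    // Keep routing only if the optional delegate was the one that failed.
+    return "OptionalAnnotator".equals(failedAeKey);
+  }
+
+  public void aborted() {
+    // Per-CAS resources would be released here; this sketch holds none.
+  }
+}</programlisting>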
+ </section>
+</chapter>
\ No newline at end of file
Added: uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.multi_views.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.multi_views.xml?rev=941744&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.multi_views.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.multi_views.xml Thu May 6 14:06:02 2010
@@ -0,0 +1,696 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.tug.mvs">
+ <title>Multiple CAS Views of an Artifact</title>
+ <titleabbrev>Multiple CAS Views</titleabbrev>
+
+ <para>UIMA provides an extension to the basic model of the CAS which supports analysis of
+ multiple views of the same artifact, all contained within the CAS. This chapter describes
+ the concepts, terminology, and the API and XML extensions that enable this.</para>
+
+ <para>Multiple CAS Views can simplify things when different versions of the artifact are
+ needed at different stages of the analysis. They are also key to enabling multimodal
+ analysis where the initial artifact is transformed from one modality to another, or where
+ the artifact itself is multimodal, such as the audio, video and closed-captioned text
+ associated with an MPEG object. Each representation of the artifact can be analyzed
+ independently with the standard UIMA programming model; in addition, multi-view
+ components and applications can be constructed.</para>
+
+ <para>UIMA supports this by augmenting the CAS with additional light-weight CAS objects,
+ one for each view, where these objects share most of the same underlying CAS, except for two
+ things: each view has its own set of indexed Feature Structures, and each view has its own
+ subject of analysis (Sofa) - its own version of the artifact being analyzed. The Feature
+ Structure instances themselves are in the shared part of the CAS; only the entries in the
+ indexes are unique for each CAS view.</para>
+
+ <para>All of these CAS view objects are kept together with the CAS, and passed as a unit
+ between components in a UIMA application. APIs exist which allow components and
+ applications to switch among the various view objects, as needed.</para>
+
+ <para>Feature Structures may be indexed in multiple views, if necessary. New methods on CAS
+ Views facilitate adding or removing Feature Structures to or from their index
+ repositories:</para>
+
+
+ <programlisting>aView.addFsToIndexes(aFeatureStructure)
+aView.removeFsFromIndexes(aFeatureStructure)</programlisting>
+
+ <para>These calls specify the view in which the Feature Structure is added to or removed from the
+ indexes: it is the view on which the method is invoked.</para>
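+
+ <para>As a small sketch of indexing one Feature Structure in more than one view, consider the helper
+ below. The class name and the type name <literal>example.DocumentInfo</literal> are hypothetical; the
+ type is assumed to be defined in the type system and not to be an annotation type, and both views are
+ assumed to exist already.</para>
+
+ <programlisting>import org.apache.uima.cas.CAS;
+import org.apache.uima.cas.FeatureStructure;
+import org.apache.uima.cas.Type;
+
+// Hypothetical helper; "example.DocumentInfo" is an assumed type name.
+class ViewIndexingSketch {
+  static void indexInBothViews(CAS viewA, CAS viewB) {
+    Type infoType = viewA.getTypeSystem().getType("example.DocumentInfo");
+    // The instance itself lives in the part of the CAS shared by all views.
+    FeatureStructure info = viewA.createFS(infoType);
+    viewA.addFsToIndexes(info);       // indexed in view A
+    viewB.addFsToIndexes(info);       // same instance, also in view B
+    viewA.removeFsFromIndexes(info);  // now indexed in view B only
+  }
+}</programlisting>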
+
+ <section id="ugr.tug.mvs.cas_views_and_sofas">
+ <title>CAS Views and Sofas</title>
+
+ <para>Sofas (see <olink targetdoc="&uima_docs_tutorial_guides;"
+ targetptr="ugr.tug.aas.sofa"/>) and CAS Views are linked. In this implementation,
+ every CAS view has one associated Sofa, and every Sofa has one associated CAS
+ View.</para>
+
+ <section id="ugr.tug.mvs.naming_views_sofas">
+ <title>Naming CAS Views and Sofas</title>
+
+ <para>The developer assigns a name to the View / Sofa, which is a simple string
+ (following the rules for Java identifiers, usually without periods, but see special
+ exception below). These names are declared in the component XML metadata, and are
+ used during assembly and by the runtime to enable switching among multiple Views of
+ the CAS at the same time.</para>
+ <note><para>The name is called the Sofa name, for historical reasons, but it applies
+ equally to the View. In the rest of this chapter, we'll refer to it as the Sofa
+ name.</para></note>
+
+ <para>Some applications contain components that expect a variable number of Sofas as
+ input or output. An example of a component that takes a variable number of input Sofas
+ could be one that takes several translations of a document and merges them, where each
+ translation was in a separate Sofa. </para>
+
+ <para> You can specify a variable number of input or output sofa names, where each name
+ has the same base part, by writing the base part of the name (with no periods), followed
+ by a period character and an asterisk character (.*). These denote sofas that have
+ names matching the base part up to the period; for example, names such as
+ <literal>base_name_part.TTX_3d</literal> would match a specification of
+ <literal>base_name_part.*</literal>.</para>
+
+ </section>
+
+ <section id="ugr.tug.mvs.multi_view_and_single_view">
+ <title>Multi-View, Single-View components & applications</title>
+ <titleabbrev>Multi/Single View parts in Applications</titleabbrev>
+
+ <para>Components and applications can be written to be Multi-View or Single-View.
+ Most components used as primitive building blocks are expected to be Single-View.
+ UIMA provides capabilities to combine these kinds of components with Multi-View
+ components when assembling analysis aggregates or applications.</para>
+
+ <para>Single-View components and applications use only one subject of analysis, and
+ one CAS View. The code and descriptors for these components do not use the facilities
+ described in this chapter.</para>
+
+ <para>Conversely, Multi-View components and applications are aware of the
+ possibility of multiple Views and Sofas, and have code and XML descriptors that
+ create and manipulate them.</para>
+
+ </section>
+ </section>
+
+ <section id="ugr.tug.mvs.multi_view_components">
+ <title>Multi-View Components</title>
+ <section id="ugr.tug.mvs.deciding_multi_view">
+ <title>How UIMA decides if a component is Multi-View</title>
+ <titleabbrev>Deciding: Multi-View</titleabbrev>
+
+ <para>Every UIMA component has an associated XML Component Descriptor. Multi-View
+ components are identified simply as those whose descriptors declare one or more Sofa
+ names in their Capability sections, as inputs or outputs. If a Component Descriptor
+ does not mention any input or output Sofa names, the framework treats that component
+ as a Single-View component.</para>
+
+ <para>A Multi-View component is passed a special kind of a CAS object, called a base CAS,
+ which it must use to switch to the particular view it wishes to process. The base CAS
+ object itself has no Sofa and no ability to use Indexes; only the views have that
+ capability.</para>
+
+ </section>
+
+ <section id="ugr.tug.mvs.additional_capabilities">
+ <title>Multi-View: additional capabilities</title>
+
+ <para>Additional capabilities provided for components and applications aware of the
+ possibilities of multiple Views and Sofas include:</para>
+
+ <itemizedlist spacing="compact"><listitem><para>Creating new Views, and for
+ each, setting up the associated Sofa data</para></listitem>
+
+ <listitem><para>Getting a reference to an existing View and its associated Sofa, by
+ name </para></listitem>
+
+ <listitem><para>Specifying a view in which to index a particular Feature Structure
+ instance </para></listitem></itemizedlist>
+
+ </section>
+
+ <section id="ugr.tug.mvs.component_xml_metadata">
+ <title>Component XML metadata</title>
+
+ <para>Each Multi-View component that creates a Sofa or wants to switch to a specific
+ previously created Sofa must declare the name for the Sofa in the capabilities
+ section. For example, a component expecting as input a web document in html format and
+ creating a plain text document for further processing might declare:</para>
+
+
+ <programlisting><capabilities>
+ <capability>
+ <inputs/>
+ <outputs/>
+ <inputSofas>
+<emphasis role="bold"> <sofaName>rawContent</sofaName></emphasis>
+ </inputSofas>
+ <outputSofas>
+<emphasis role="bold"> <sofaName>detagContent</sofaName></emphasis>
+ </outputSofas>
+ </capability>
+</capabilities></programlisting>
+
+ <para>Details on this specification are found in <olink
+ targetdoc="&uima_docs_ref;"
+ targetptr="ugr.ref.xml.component_descriptor"/>. The Component Descriptor
+ Editor supports Sofa declarations on the <olink targetdoc="&uima_docs_tools;"
+ targetptr="ugr.tools.cde.capabilities"/>.</para>
+
+ </section>
+ </section>
+
+ <section id="ugr.tug.mvs.sofa_capabilities_and_apis_for_apps">
+ <title>Sofa Capabilities and APIs for Applications</title>
+ <titleabbrev>Sofa Capabilities & APIs for Apps</titleabbrev>
+
+ <para>In addition to components, applications can make use of these capabilities. When
+ an application creates a new CAS, it also creates the initial view of that CAS - and this
+ view is the object that is returned from the create call. Additional views beyond this
+ first one can be dynamically created at any time. The application can use the Sofa APIs
+ described in <olink targetdoc="&uima_docs_tutorial_guides;"
+ targetptr="ugr.tug.aas"/> to specify the data to be analyzed.</para>
+
+ <para>When an application creates a new CAS, the CAS object it receives is a view
+ named <quote>_InitialView</quote>. This name can be used in the application and in
+ Sofa Mapping (see the next section) to refer to this otherwise unnamed view.</para>
+
+ </section>
+
+ <section id="ugr.tug.mvs.sofa_name_mapping">
+ <title>Sofa Name Mapping</title>
+
+ <para>Sofa Name mapping is the mechanism which enables UIMA component developers to
+ choose locally meaningful Sofa names in their source code and let aggregate,
+ collection processing engine developers, and application developers connect output
+ Sofas created in one component to input Sofas required in another.</para>
+
+ <para>At a given aggregation level, the assembler or application developer defines
+ names for all the Sofas, and then specifies how these names map to the contained
+ components, using the Sofa Map.</para>
+
+ <para>Consider annotator code to create a new CAS view:</para>
+
+
+ <programlisting>CAS viewX = cas.createView("X");</programlisting>
+
+ <para>Or code to get an existing CAS view:</para>
+
+ <programlisting>CAS viewX = cas.getView("X");</programlisting>
+
+ <para>Without Sofa name mapping the SofaID for the new Sofa will be <quote>X</quote>.
+ However, if a name mapping for <quote>X</quote> has been specified by the aggregate or
+ CPE calling this annotator, the actual SofaID in the CAS can be different.</para>
+
+ <para>All Sofas in a CAS must have unique names. This is accomplished by mapping all
+ declared Sofas as described in the following sections. An attempt to create a Sofa with a
+ SofaID already in use will throw an exception.</para>
+
+ <para>Sofa name mapping must not use the <quote>.</quote> (period) character. Runtime Sofa
+ mapping maps names up to the <quote>.</quote> and appends the period and the following
+ characters to the mapped name.</para>
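+
+ <para>The period rule can be illustrated with a small standalone sketch. It is not the
+ UIMA implementation: the class <literal>SofaNameResolver</literal> and its
+ <literal>resolve</literal> method are hypothetical, and the sketch assumes that the split
+ happens at the first period and that an unmapped name is used unchanged.</para>
+

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model of the period rule; hypothetical, not UIMA code. */
final class SofaNameResolver {

  /** Maps the part of localName before the first '.', keeping any suffix. */
  static String resolve(String localName, Map<String, String> sofaMap) {
    int dot = localName.indexOf('.');
    String base = (dot < 0) ? localName : localName.substring(0, dot);
    String suffix = (dot < 0) ? "" : localName.substring(dot);
    // Assumption: a base name without a mapping is used unchanged.
    String mapped = sofaMap.containsKey(base) ? sofaMap.get(base) : base;
    return mapped + suffix;
  }

  public static void main(String[] args) {
    Map<String, String> sofaMap = new HashMap<>();
    sofaMap.put("EnglishDocument", "EnglishTranscript");
    System.out.println(resolve("EnglishDocument", sofaMap));
    System.out.println(resolve("EnglishDocument.part1", sofaMap));
    System.out.println(resolve("Other", sofaMap));
  }
}
```

+ <para>Under these assumptions, a mapped name that itself contained a period would be
+ ambiguous, which is one way to see why periods are excluded from mappings.</para>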
+
+ <para>To get a Java Iterator for all the views in a CAS:</para>
+
+ <programlisting>Iterator allViews = cas.getViewIterator();</programlisting>
+
+ <para>To get a Java Iterator for selected views in a CAS, for example, views whose name
+ is either exactly equal to namePrefix or is of the form namePrefix.suffix, where suffix
+ can be any String:</para>
+
+ <programlisting>Iterator someViews = cas.getViewIterator(String namePrefix);</programlisting>
+
+ <note><para>Sofa name mapping is applied to namePrefix.</para></note>
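+
+ <para>The selection rule for the prefix form of the iterator can be sketched without the
+ framework. The class <literal>ViewPrefixFilter</literal> below is hypothetical and works on
+ plain view names; it leaves out the Sofa name mapping that the framework applies to
+ namePrefix first.</para>
+

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative model of view selection by name prefix; hypothetical, not UIMA code. */
final class ViewPrefixFilter {

  /** True if viewName equals namePrefix or has the form namePrefix.suffix. */
  static boolean matches(String viewName, String namePrefix) {
    return viewName.equals(namePrefix) || viewName.startsWith(namePrefix + ".");
  }

  /** Returns the names that match, in their original order. */
  static List<String> select(List<String> viewNames, String namePrefix) {
    List<String> selected = new ArrayList<>();
    for (String name : viewNames) {
      if (matches(name, namePrefix)) {
        selected.add(name);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    List<String> names = Arrays.asList("Doc", "Doc.en", "Document", "_InitialView");
    // "Document" is not selected: it starts with "Doc" but has no separating period.
    System.out.println(select(names, "Doc"));
  }
}
```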
+
+ <para>Sofa name mappings are not currently supported for remote Analysis Engines.
+ See <xref linkend="ugr.tug.mvs.name_mapping_remote_services"/>.</para>
+
+ <section id="ugr.tug.mvs.name_mapping_aggregate">
+ <title>Name Mapping in an Aggregate Descriptor</title>
+
+ <para>For each component of an Aggregate, name mapping specifies the conversion
+ between component Sofa names and names at the aggregate level.</para>
+
+ <para>Here's an example. Consider two Multi-View annotators to be assembled
+ into an aggregate which takes an audio segment consisting of spoken English and
+ produces a German text translation.</para>
+
+ <para>The first annotator takes an audio segment as input Sofa and produces a text
+ transcript as output Sofa. The annotator designer might choose these Sofa names to be
+ <quote>AudioInput</quote> and <quote>TranscribedText</quote>.</para>
+
+ <para>The second annotator is designed to translate text from English to German. This
+ developer might choose the input and output Sofa names to be
+ <quote>EnglishDocument</quote> and <quote>GermanDocument</quote>,
+ respectively.</para>
+
+ <para>In order to hook these two annotators together, the following section would be
+ added to the top level of the aggregate descriptor:</para>
+
+
+ <programlisting><![CDATA[<sofaMappings>
+ <sofaMapping>
+ <componentKey>SpeechToText</componentKey>
+ <componentSofaName>AudioInput</componentSofaName>
+ <aggregateSofaName>SegementedAudio</aggregateSofaName>
+ </sofaMapping>
+ <sofaMapping>
+ <componentKey>SpeechToText</componentKey>
+ <componentSofaName>TranscribedText</componentSofaName>
+ <aggregateSofaName>EnglishTranscript</aggregateSofaName>
+ </sofaMapping>
+ <sofaMapping>
+ <componentKey>EnglishToGermanTranslator</componentKey>
+ <componentSofaName>EnglishDocument</componentSofaName>
+ <aggregateSofaName>EnglishTranscript</aggregateSofaName>
+ </sofaMapping>
+ <sofaMapping>
+ <componentKey>EnglishToGermanTranslator</componentKey>
+ <componentSofaName>GermanDocument</componentSofaName>
+ <aggregateSofaName>GermanTranslation</aggregateSofaName>
+ </sofaMapping>
+</sofaMappings>]]></programlisting>
+
+ <para>The Component Descriptor Editor supports Sofa name mapping in aggregates and
+ simplifies the task. See <olink targetdoc="&uima_docs_tools;"
+ targetptr="ugr.tools.cde.capabilities.sofa_name_mapping"/> for details.</para>
+ </section>
+
+ <section id="ugr.tug.mvs.name_mapping_cpe"><title>Name Mapping in a CPE
+ Descriptor</title>
+
+ <para>The CPE descriptor aggregates together a Collection Reader and CAS Processors
+ (Annotators and CAS Consumers). Sofa mappings can be added to the following elements
+ of CPE descriptors: <literal><collectionIterator></literal>,
+ <literal><casInitializer></literal> and the
+ <literal><casProcessor></literal>. To be consistent with the
+ organization of CPE descriptors, the maps for the CPE descriptor are distributed
+ among the XML markup for each of the parts (collectionIterator, casInitializer,
+ casProcessor). Because of this, the
+ <literal><componentKey></literal> element is not needed. Finally, rather than
+ sub-elements for the parts, the XML markup for these uses attributes. See <olink
+ targetdoc="&uima_docs_ref;"
+ targetptr="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.sofa_name_mappings"/>.</para>
+
+ <para>Here's an example. Let's use the aggregate from the previous section
+ in a collection processing engine. Here we will add a Collection Reader that outputs
+ audio segments in an output Sofa named <quote>nextSegment</quote>. Remember to
+ declare an output Sofa nextSegment in the collection reader description.
+ We'll add a CAS Consumer in the next section.</para>
+
+
+ <programlisting><collectionReader>
+ <collectionIterator>
+ <descriptor>
+ . . .
+ </descriptor>
+ <configurationParameterSettings>...</configurationParameterSettings>
+<emphasis role="bold"> <sofaNameMappings>
+ <sofaNameMapping componentSofaName="nextSegment"
+ cpeSofaName="SegementedAudio"/>
+ </sofaNameMappings>
+</emphasis> </collectionIterator>
+ <casInitializer/>
+</collectionReader></programlisting>
+
+ <para>At this point the CAS Processor section for the aggregate does not need any Sofa
+ mapping because the aggregate input Sofa has the same name,
+ <quote>SegementedAudio</quote>, as is being produced by the Collection
+ Reader.</para>
+
+ </section>
+
+ <section id="ugr.tug.mvs.specifying_cas_view_for_single_view">
+ <title>Specifying the CAS View for a Single-View Component</title>
+ <titleabbrev>CAS View for Single-View Parts</titleabbrev>
+
+ <para>Single-View components receive a Sofa named <quote>_InitialView</quote>, or
+ a Sofa that is mapped to this name.</para>
+
+ <para>For example, assume that the CAS Consumer to be used in our CPE is a Single-View
+ component that expects the analysis results associated with the input CAS, and that
+ we want it to use the results from the translated German text Sofa. The following
+ mapping added to the CAS Processor section for the CPE will instruct the CPE to get the
+ CAS view for the German text Sofa and pass it to the CAS Consumer:</para>
+
+
+ <programlisting><casProcessor>
+ . . .
+ <emphasis role="bold"><sofaNameMappings>
+ <sofaNameMapping componentSofaName="_InitialView"
+ cpeSofaName="GermanTranslation"/>
+ </sofaNameMappings>
+</emphasis></casProcessor></programlisting>
+
+ <para id="ugr.tug.mvs.sofa_mapping_leav_out_name">An alternative syntax for
+ this kind of mapping is to simply leave out the component sofa name in this
+ case.</para>
+
+ </section>
+
+ <section id="ugr.tug.mvs.name_mapping_application">
+ <title>Name Mapping in a UIMA Application</title>
+
+ <para>Applications which instantiate UIMA components directly using the
+ UIMAFramework methods can also create a top level Sofa mapping using the
+ <quote>additional parameters</quote> capability.</para>
+
+
+ <programlisting>//create a "root" UIMA context for your whole application
+
+UimaContextAdmin rootContext =
+ UIMAFramework.newUimaContext(UIMAFramework.getLogger(),
+ UIMAFramework.newDefaultResourceManager(),
+ UIMAFramework.newConfigurationManager());
+
+XMLInputSource input = new XMLInputSource("test.xml");
+AnalysisEngineDescription desc =
+ UIMAFramework.getXMLParser().parseAnalysisEngineDescription(input);
+
+//setup sofa name mappings using the api
+
+HashMap sofamappings = new HashMap();
+sofamappings.put("localName1", "globalName1");
+sofamappings.put("localName2", "globalName2");
+
+//create a UIMA Context for the new AE we are about to create
+
+//first argument is unique key among all AEs used in the application
+UimaContextAdmin childContext = rootContext.createChild("myAE", sofamappings);
+
+//instantiate AE, passing the UIMA Context through the additional
+//parameters map
+
+Map additionalParams = new HashMap();
+additionalParams.put(Resource.PARAM_UIMA_CONTEXT, childContext);
+
+AnalysisEngine ae =
+ UIMAFramework.produceAnalysisEngine(desc,additionalParams);</programlisting>
+
+ <para>Sofa mappings are applied from the inside out, i.e., local to global. First, any
+ aggregate mappings are applied, then any CPE mappings, and finally, any specified
+ using this <quote>additional parameters</quote> capability.</para>
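+
+ <para>The inside-out order can be pictured as a chain of lookups, one per level. The
+ following standalone sketch is an illustration only: the class
+ <literal>LayeredSofaMapping</literal> is hypothetical, the application-level name
+ <literal>AppGermanText</literal> is invented for the example, and the sketch assumes a
+ level without an entry for a name leaves that name unchanged.</para>
+

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Illustrative model of layered Sofa name mapping; hypothetical, not UIMA code. */
final class LayeredSofaMapping {

  /** Applies each level's map in order, innermost (aggregate) level first. */
  static String resolve(String componentName, List<Map<String, String>> levelsInnerToOuter) {
    String name = componentName;
    for (Map<String, String> level : levelsInnerToOuter) {
      String mapped = level.get(name);
      if (mapped != null) {
        name = mapped; // assumption: a missing entry leaves the name as it is
      }
    }
    return name;
  }

  public static void main(String[] args) {
    List<Map<String, String>> levels = new ArrayList<>();
    // Aggregate level, taken from the aggregate example in this chapter.
    levels.add(Collections.singletonMap("GermanDocument", "GermanTranslation"));
    // CPE level: no mapping for this Sofa.
    levels.add(Collections.<String, String>emptyMap());
    // Application level: "AppGermanText" is an invented name.
    levels.add(Collections.singletonMap("GermanTranslation", "AppGermanText"));
    System.out.println(resolve("GermanDocument", levels));
  }
}
```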
+
+ </section>
+
+ <section id="ugr.tug.mvs.name_mapping_remote_services">
+ <title>Name Mapping for Remote Services</title>
+
+ <para>Currently, no client-side Sofa mapping information is passed from a UIMA client
+ to a remote service. This can cause complications for UIMA services in a Multi-View
+ application.</para>
+
+ <para>Remote Multi-View services will work only if the service is Single-View, or if the
+ Sofa names expected by the service exactly match the Sofa names produced by the client.</para>
+
+ <para>If your application requires Sofa mappings for a remote Analysis Engine, you
+ can wrap your remotely deployed AE in an aggregate (on the remote side), and specify
+ the necessary Sofa mappings in the descriptor for that aggregate.</para>
+ </section>
+ </section>
+
+ <section id="ugr.tug.mvs.jcas_extensions_for_multi_views">
+ <title>JCas extensions for Multiple Views</title>
+
+ <para>The JCas interface to the CAS can be used with any / all views, as well as the base CAS
+ sent to Multi-View components. You can always get a JCas object from an existing CAS
+ object by using the method getJCas(); this call will create the JCas if it doesn't
+ already exist. If it does exist, it just returns the existing JCas that corresponds to
+ the CAS.</para>
+
+ <para>JCas implements the getView(...) method, enabling switching to other named
+ views, just like the corresponding method on the CAS. The JCas version, however,
+ returns JCas objects, instead of CAS objects, corresponding to the view.</para>
+ </section>
+
+ <section id="ugr.tug.mvs.sample_application">
+ <title>Sample Multi-View Application</title>
+
+ <para>The UIMA SDK contains a simple Sofa example application which demonstrates many
+ Sofa specific concepts and methods. The source code for the application driver is in
+ <literal>examples/src/org/apache/uima/examples/SofaExampleApplication.java</literal>
+ and the Multi-View annotator is given in
+ <literal>SofaExampleAnnotator.java</literal> in the same directory.</para>
+
+ <para>This sample application demonstrates a language translator annotator which
+ expects an input text Sofa with an English document and creates an output text Sofa
+ containing a German translation. Some of the key Sofa concepts illustrated here
+ include:</para>
+
+ <itemizedlist spacing="compact"><listitem><para>Sofa creation.</para>
+ </listitem>
+
+ <listitem><para>Access of multiple CAS views.</para></listitem>
+
+ <listitem><para>Unique feature structure index space for each view.</para>
+ </listitem>
+
+ <listitem><para>Feature structures containing cross references between
+ annotations in different CAS views.</para></listitem>
+
+ <listitem><para>The strong affinity of annotations with a specific Sofa. </para>
+ </listitem></itemizedlist>
+
+ <section id="ugr.tug.mvs.sample_application.descriptor">
+ <title>Annotator Descriptor</title>
+
+ <para>The annotator descriptor in
+ <literal>examples/descriptors/analysis_engine/SofaExampleAnnotator.xml</literal>
+ declares an input Sofa named <quote>EnglishDocument</quote> and an output Sofa
+ named <quote>GermanDocument</quote>. A custom type
+ <quote>CrossAnnotation</quote> is also defined:</para>
+
+
+ <programlisting><![CDATA[<typeDescription>
+ <name>sofa.test.CrossAnnotation</name>
+ <description/>
+ <supertypeName>uima.tcas.Annotation</supertypeName>
+ <features>
+ <featureDescription>
+ <name>otherAnnotation</name>
+ <description/>
+ <rangeTypeName>uima.tcas.Annotation</rangeTypeName>
+ </featureDescription>
+ </features>
+</typeDescription>]]></programlisting>
+
+ <para>The <literal>CrossAnnotation</literal> type is derived from
+ <literal>uima.tcas.Annotation</literal> and includes one new feature: a
+ reference to another annotation.</para>
+
+ </section>
+
+ <section id="ugr.tug.mvs.sample_application.setup">
+ <title>Application Setup</title>
+
+ <para>The application driver instantiates an analysis engine,
+ <literal>seAnnotator</literal>, from the annotator descriptor, obtains a new
+ base CAS using that engine's CAS definition, and creates the expected input
+ Sofa using:</para>
+
+
+ <programlisting>CAS cas = seAnnotator.newCAS();
+CAS aView = cas.createView("EnglishDocument");</programlisting>
+
+ <para>Since <literal>seAnnotator</literal> is a primitive component, and no Sofa
+ mapping has been defined, the SofaID will be <quote>EnglishDocument</quote>.
+ Local Sofa data is set using:</para>
+
+
+ <programlisting>aView.setDocumentText("this beer is good");</programlisting>
+
+ <para>At this point the CAS contains all necessary inputs for the translation
+ annotator and its process method is called.</para>
+
+ </section>
+
+ <section id="ugr.tug.mvs.sample_application.annotator_processing">
+ <title>Annotator Processing</title>
+
+ <para>Annotator processing consists of parsing the English document into individual
+ words, doing word-by-word translation and concatenating the translations into a
+ German translation. Analysis metadata on the English Sofa will be an annotation for
+ each English word. Analysis metadata on the German Sofa will be a
+ <literal>CrossAnnotation</literal> for each German word, where the
+ <literal>otherAnnotation</literal> feature will be a reference to the associated
+ English annotation.</para>
+
+ <para>Code of interest includes two CAS views:</para>
+
+
+ <programlisting>// get View of the English text Sofa
+englishView = aCas.getView("EnglishDocument");
+
+// Create the output German text Sofa
+germanView = aCas.createView("GermanDocument");</programlisting>
+
+ <para>the indexing of annotations with the appropriate view:</para>
+
+
+ <programlisting>englishView.addFsToIndexes(engAnnot);
+. . .
+germanView.addFsToIndexes(germAnnot);</programlisting>
+
+ <para>and the combining of metadata belonging to different Sofas in the same feature
+ structure:</para>
+
+
+ <programlisting>// add link to English text
+germAnnot.setFeatureValue(other, engAnnot);</programlisting>
+
+ </section>
+
+ <section id="ugr.tug.mvs.sample_application.accessing_results">
+ <title>Accessing the results of analysis</title>
+
+ <para>The application needs to get the results of analysis, which may be in different
+ views. Analysis results for each Sofa are dumped independently by iterating over all
+ annotations for each associated CAS view. For the English Sofa:</para>
+
+
+ <programlisting>//get annotation iterator for this CAS
+FSIndex anIndex = aView.getAnnotationIndex();
+FSIterator anIter = anIndex.iterator();
+while (anIter.isValid()) {
+ AnnotationFS annot = (AnnotationFS) anIter.get();
+ System.out.println(" " + annot.getType().getName()
+ + ": " + annot.getCoveredText());
+ anIter.moveToNext();
+}</programlisting>
+
+ <para>Iterating over all German annotations looks the same, except for the
+ following:</para>
+
+
+ <programlisting>if (annot.getType() == cross) {
+ AnnotationFS crossAnnot =
+ (AnnotationFS) annot.getFeatureValue(other);
+ System.out.println(" other annotation feature: "
+ + crossAnnot.getCoveredText());
+}</programlisting>
+
+ <para>Of particular interest here is the built-in Annotation type method
+ <literal>getCoveredText()</literal>. This method uses the
+ <quote>begin</quote> and <quote>end</quote> features of the annotation to create
+ a substring from the CAS document. The SofaRef feature of the annotation is used to
+ identify the correct Sofa's data from which to create the substring.</para>
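+
+ <para>The effect of tying an annotation to its own Sofa can be shown with a small standalone
+ sketch. The class <literal>CoveredTextSketch.Span</literal> is a hypothetical stand-in for an
+ annotation, not a UIMA type; it holds the text of its Sofa directly, where a real annotation
+ would reach it through the Sofa reference.</para>
+

```java
/** Illustrative model of covered-text lookup; hypothetical, not UIMA code. */
final class CoveredTextSketch {

  /** Minimal stand-in for an annotation: offsets plus the text of its own Sofa. */
  static final class Span {
    final String sofaText;
    final int begin;
    final int end;

    Span(String sofaText, int begin, int end) {
      this.sofaText = sofaText;
      this.begin = begin;
      this.end = end;
    }

    /** The substring of this span's own Sofa text between begin and end. */
    String coveredText() {
      return sofaText.substring(begin, end);
    }
  }

  public static void main(String[] args) {
    Span english = new Span("this beer is good", 5, 9);
    Span german = new Span("das bier ist gut", 4, 8);
    // Each span resolves its offsets against its own text, never the other one.
    System.out.println(english.coveredText() + " / " + german.coveredText());
  }
}
```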
+
+ <para>The example program output is:</para>
+
+
+ <programlisting>---Printing all annotations for English Sofa---
+uima.tcas.DocumentAnnotation: this beer is good
+uima.tcas.Annotation: this
+uima.tcas.Annotation: beer
+uima.tcas.Annotation: is
+uima.tcas.Annotation: good
+
+---Printing all annotations for German Sofa---
+uima.tcas.DocumentAnnotation: das bier ist gut
+sofa.test.CrossAnnotation: das
+ other annotation feature: this
+sofa.test.CrossAnnotation: bier
+ other annotation feature: beer
+sofa.test.CrossAnnotation: ist
+ other annotation feature: is
+sofa.test.CrossAnnotation: gut
+ other annotation feature: good</programlisting>
+
+ </section>
+ </section>
+
+ <section id="ugr.tug.mvs.views_api_summary">
+ <title>Views API Summary</title>
+
+ <para>The recommended way to deliver a particular CAS view to a <emphasis role="bold-italic">Single-View</emphasis> component is to use Sofa mapping in
+ the CPE and/or aggregate descriptors.</para>
+
+ <para>For <emphasis role="bold-italic">Multi-View</emphasis> components or
+ applications, the following methods are used to create or get a reference to a CAS view
+ for a particular Sofa:</para>
+
+ <para>Creating a new View:</para>
+
+
+ <programlisting>JCas newView = aJCas.createView(String localNameOfTheViewBeforeMapping);
+CAS newView = aCAS .createView(String localNameOfTheViewBeforeMapping);</programlisting>
+
+ <para>Getting a View from a CAS or JCas:</para>
+
+
+ <programlisting><?db-font-size 80% ?>JCas myView = aJCas.getView(String localNameOfTheViewBeforeMapping);
+CAS myView = aCAS .getView(String localNameOfTheViewBeforeMapping);
+Iterator allViews = aCasOrJCas.getViewIterator();
+Iterator someViews = aCasOrJCas.getViewIterator(String localViewNamePrefix);</programlisting>
+
+ <para>The following methods are useful for all annotators and applications:</para>
+
+ <para>Setting Sofa data for a CAS or JCas:</para>
+
+
+ <programlisting>aCasOrJCas.setDocumentText(String docText);
+aCasOrJCas.setSofaDataString(String docText, String mimeType);
+aCasOrJCas.setSofaDataArray(FeatureStructure array, String mimeType);
+aCasOrJCas.setSofaDataURI(String uri, String mimeType);</programlisting>
+
+ <para>Getting Sofa data for a particular CAS or JCas:</para>
+
+
+ <programlisting>String doc = aCasOrJCas.getDocumentText();
+String doc = aCasOrJCas.getSofaDataString();
+FeatureStructure array = aCasOrJCas.getSofaDataArray();
+String uri = aCasOrJCas.getSofaDataURI();
+InputStream is = aCasOrJCas.getSofaDataStream();</programlisting>
+
+ </section>
+
+ <section id="ugr.tug.mvs.sofa_incompatibilities_v1_v2">
+ <title>Sofa Incompatibilities between UIMA version 1 and version 2</title>
+ <titleabbrev>Sofa Incompatibilities: V1 and V2</titleabbrev>
+
+ <para>A major change in version 2 is related to the support of Single-View components
+ and applications. Given an analysis engine, <literal>ae</literal>, the API
+
+ <programlisting>CAS cas = ae.newCAS();</programlisting>
+ used to return the base CAS. Now it returns a view of the Sofa named
+ <quote>_InitialView</quote>. This Sofa will actually only be created if any Sofa data
+ is set for this view. The initial view is used for Single-View applications and
+ Multi-View annotators with no Sofa mapping.</para>
+
+ <para>The process method of Multi-View annotators receives the base CAS; however, the base
+ CAS no longer has an index repository to hold <quote>global</quote> data. Global data
+ needs to be put in a specific named CAS view of your choice.</para>
+
+ <para>Because of these changes, the following scenarios will break with v2.0 clients:
+
+ <itemizedlist spacing="compact"><listitem><para>Any version 1.x services (you
+ must migrate the services to version 2).</para></listitem>
+
+ <listitem><para>Applications or components explicitly referencing
+ <quote>_DefaultTextSofaName</quote> in code or descriptors.</para>
+ </listitem>
+
+ <listitem><para>Multi-View applications using the Base CAS index repository.
+ </para></listitem></itemizedlist></para>
+ </section>
+</chapter>
\ No newline at end of file
Added: uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.xmi_emf.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.xmi_emf.xml?rev=941744&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.xmi_emf.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tug.xmi_emf.xml Thu May 6 14:06:02 2010
@@ -0,0 +1,186 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent">
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.tug.xmi_emf">
+ <title>XMI and EMF Interoperability</title>
+ <titleabbrev>XMI & EMF</titleabbrev>
+
+ <section id="ugr.tug.xmi_emf.overview">
+ <title>Overview</title>
+
+ <para>In traditional object-oriented terms, a UIMA Type System is a class model and a UIMA CAS is an object graph.
+ There are established standards in this area
+ – specifically, <trademark class="registered">UML</trademark> is an <trademark class="trade">
+ OMG</trademark> standard for class models and XMI (XML Metadata Interchange) is an OMG standard for the XML
+ representation of object graphs.</para>
+
+ <para>Furthermore, the Eclipse Modeling Framework (EMF) is an open-source framework for model-based
+ application development, and it is based on UML and XMI. In EMF, you define class models using a metamodel called
+ Ecore, which is similar to UML. EMF provides tools for converting a UML model to Ecore. EMF can then generate Java
+ classes from your model, and supports persistence of those classes in the XMI format.</para>
+
+ <para>The UIMA SDK provides tools for interoperability with XMI and EMF. These tools allow conversions of UIMA
+ Type Systems to and from Ecore models, as well as conversions of UIMA CASes to and from XMI format. This provides a
+ number of advantages, including:</para>
+
+ <blockquote>
+ <para>You can define a model using a UML Editor, such as Rational Rose or EclipseUML, and then automatically
+ convert it to a UIMA Type System.</para>
+
+ <para>You can take an existing UIMA application, convert its type system to Ecore, and save the CASes it
+ produces to XMI. This data is now in a form where it can easily be ingested by an EMF-based application.</para>
+ </blockquote>
+
+ <para>More generally, we are adopting the well-documented, open standard XMI as the standard way to represent
+ UIMA-compliant analysis results (replacing the UIMA-specific XCAS format). This use of an open standard
+ enables other applications to more easily produce or consume these UIMA analysis results.</para>
+
+ <para>For more information on XMI, see Grose et al. <emphasis>Mastering XMI. Java Programming with XMI, XML, and
+ UML.</emphasis> John Wiley & Sons, Inc. 2002.</para>
+
+ <para>For more information on EMF, see Budinsky et al. <emphasis>Eclipse Modeling Framework 2.0.</emphasis>
+ Addison-Wesley. 2006.</para>
+
+ <para>For details of how the UIMA CAS is represented in XMI format, see <olink targetdoc="&uima_docs_ref;"
+ targetptr="ugr.ref.xmi"/>.</para>
+
+ </section>
+
+ <section id="ugr.tug.xmi_emf.converting_ecore_to_from_uima_type_system">
+ <title>Converting an Ecore Model to or from a UIMA Type System</title>
+
+ <para>The UIMA SDK provides the following two classes:</para>
+
+ <para><emphasis role="bold"><literal>Ecore2UimaTypeSystem:</literal>
+ </emphasis> converts from an .ecore model developed using EMF to a UIMA-compliant
+ TypeSystem descriptor. This is a Java class that can be run as a standalone program or
+ invoked from another Java application. To run as a standalone program,
+ execute:</para>
+
+ <para><command>java org.apache.uima.ecore.Ecore2UimaTypeSystem <ecore
+ file> <output file></command></para>
+
+ <para>The input .ecore file will be converted to a UIMA TypeSystem descriptor and written
+ to the specified output file. You can then use the resulting TypeSystem descriptor in
+ your UIMA application.</para>
+
+ <para><emphasis role="bold"><literal>UimaTypeSystem2Ecore:</literal>
+ </emphasis> converts from a UIMA TypeSystem descriptor to an .ecore model. This is a
+ Java class that can be run as a standalone program or invoked from another Java
+ application. To run as a standalone program, execute:</para>
+
+ <para><command>java org.apache.uima.ecore.UimaTypeSystem2Ecore
+ <TypeSystem descriptor> <output file></command></para>
+
+ <para>The input UIMA TypeSystem descriptor will be converted to an Ecore model file and
+ written to the specified output file. You can then use the resulting Ecore model in EMF
+ applications. The converted type system will include any
+ <literal><import...></literal>ed TypeSystems; the fact that they were
+ imported is currently not preserved.</para>
+
+ <para>To run either of these converters, your classpath will need to include the UIMA jar
+ files as well as the following jar files from the EMF distribution: common.jar,
+ ecore.jar, and ecore.xmi.jar.</para>
+
+ <para>Also, note that the uima-core.jar file contains the Ecore model file uima.ecore,
+ which defines the built-in UIMA types. You may need to use this file from your EMF
+ applications.</para>
+
+ </section>
+
+ <section id="ugr.tug.xmi_emf.using_xmi_cas_serialization">
+ <title>Using XMI CAS Serialization</title>
+
+ <para>The UIMA SDK provides XMI support through the following two classes:</para>
+
+ <para><emphasis role="bold"><literal>XmiCasSerializer:</literal></emphasis>
+ can be run from within a UIMA application to write out a CAS to the standard XMI format. The
+ XMI that is generated will be compliant with the Ecore model generated by
+ <literal>UimaTypeSystem2Ecore</literal>. An EMF application could use this Ecore
+ model to ingest and process the XMI produced by the XmiCasSerializer.</para>
+
+ <para><emphasis role="bold"><literal>XmiCasDeserializer:</literal></emphasis>
+ can be run from within a UIMA application to read in an XMI document and populate a CAS. The
+ XMI must conform to the Ecore model generated by
+ <literal>UimaTypeSystem2Ecore</literal>.</para>
+
+ <para>Also, the uimaj-examples Eclipse project contains some example code that shows
+ how to use the serializer and deserializer:
+
+ <blockquote>
+ <para><literal>org.apache.uima.examples.xmi.XmiWriterCasConsumer:</literal>
+ This is a CAS Consumer that writes each CAS to an output file in XMI format. It is analogous
+ to the XCasWriter CAS Consumer that has existed in prior UIMA versions, except that it
+ uses the XMI serialization format.</para>
+
+ <para><literal>org.apache.uima.examples.xmi.XmiCollectionReader:</literal>
+ This is a Collection Reader that reads a directory of XMI files and deserializes each of
+ them into a CAS. For example, this would allow you to build a Collection Processing
+ Engine that reads XMI files, which could contain some previous analysis results, and
+ then do further analysis.</para>
+ </blockquote></para>
+
+ <para>Finally, under the folder <literal>uimaj-examples/ecore_src</literal> is
+ the class
+ <literal>org.apache.uima.examples.xmi.XmiEcoreCasConsumer</literal>, which
+ writes each CAS to XMI format and also saves the Type System as an Ecore file. Since this
+ uses the <literal>UimaTypeSystem2Ecore</literal> converter, to compile it you must
+ add to your classpath the EMF jars common.jar, ecore.jar, and ecore.xmi.jar –
+ see ecore_src/readme.txt for instructions.</para>
+
+ <section id="ugr.tug.xmi_emf.xml_character_issues">
+ <title>Character Encoding Issues with XML Serialization</title>
+
+ <para>Note that not all valid Unicode characters are valid XML characters, at least not in XML
+ 1.0. Moreover, it is possible to create characters in Java that are not even valid Unicode
+ characters, let alone XML characters. As UIMA character data is translated directly into XML
+ character data on serialization, this may lead to issues. UIMA will therefore check that the
+ character data that is being serialized is valid for the version of XML being used. If
+ non-serializable character data is encountered during serialization, an exception is thrown
+ and serialization fails (to avoid creating invalid XML data). UIMA does not simply replace
+ the offending characters with some valid replacement character; the assumption being that
+ most applications would not like to have their data modified automatically.
+ </para>
+
+ <para>If you know you are going to use XML serialization, and you would like to avoid such issues
+ on serialization, you should check any character data you create in UIMA ahead of time. Issues
+ most often arise with the document text, as documents may originate at various sources, and
+ may be of varying quality. So it's a particularly good idea to check the document text for
+ characters that will cause issues for serialization.
+ </para>
+
+ <para>UIMA provides a handful of functions to assist you in checking Java character data. Those
+ methods are located in
+ <literal>org.apache.uima.internal.util.XMLUtils.checkForNonXmlCharacters()</literal>, with
+ several overloads. Please check the javadocs for further information.
+ </para>
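+
+ <para>To make the kind of check involved concrete, here is a standalone sketch that tests
+ text against the character ranges allowed by XML 1.0. It does not reproduce the UIMA
+ <literal>XMLUtils</literal> code, and the class <literal>Xml10CharCheck</literal> and its
+ method names are hypothetical.</para>
+

```java
/** Illustrative XML 1.0 character check; hypothetical, not the UIMA XMLUtils code. */
final class Xml10CharCheck {

  /** True if the code point is in the XML 1.0 "Char" production. */
  static boolean isValidXml10(int cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD
        || (cp >= 0x20 && cp <= 0xD7FF)
        || (cp >= 0xE000 && cp <= 0xFFFD)
        || (cp >= 0x10000 && cp <= 0x10FFFF);
  }

  /** Returns the index of the first invalid character, or -1 if there is none. */
  static int firstInvalidIndex(String s) {
    int i = 0;
    while (i < s.length()) {
      // An unpaired surrogate comes back as its own value, which is out of range.
      int cp = s.codePointAt(i);
      if (!isValidXml10(cp)) {
        return i;
      }
      i += Character.charCount(cp);
    }
    return -1;
  }

  public static void main(String[] args) {
    String clean = "this beer is good";
    String dirty = "ab" + (char) 0x1 + "cd";
    System.out.println(firstInvalidIndex(clean) + " " + firstInvalidIndex(dirty));
  }
}
```

+ <para>A check of this kind could be run on document text before it is set as Sofa data, so
+ that problems surface before serialization rather than during it.</para>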
+
+ <para>Please note that these issues are not specific to XMI serialization; they apply to the
+ older XCAS format in the same way.
+ </para>
+
+ </section>
+ </section>
+
+</chapter>
\ No newline at end of file
Added: uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml?rev=941744&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-tutorials-and-users-guides/src/docbook/tutorials_and_users_guides.xml Thu May 6 14:06:02 2010
@@ -0,0 +1,36 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd">
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<book lang="en">
+ <title>UIMA Tutorial and Developers' Guides</title>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="../../target/docbook-shared/common_book_info_ibm_c.xml"/>
+
+ <toc/>
+
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="annotator_analysis_engine_guide.xml"/>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.cpe.xml"/>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.application.xml"/>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.fc.xml"/>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.aas.xml"/>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.multi_views.xml"/>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.cas_multiplier.xml"/>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="tug.xmi_emf.xml"/>
+</book>