Posted to commits@uima.apache.org by sc...@apache.org on 2008/08/28 23:28:16 UTC
svn commit: r689997 [29/32] - in /incubator/uima/uimaj/trunk/uima-docbooks:
./ src/ src/docbook/overview_and_setup/ src/docbook/references/
src/docbook/tools/ src/docbook/tutorials_and_users_guides/
src/docbook/uima/organization/ src/olink/references/
Modified: incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.multi_views.xml
URL: http://svn.apache.org/viewvc/incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.multi_views.xml?rev=689997&r1=689996&r2=689997&view=diff
==============================================================================
--- incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.multi_views.xml (original)
+++ incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.multi_views.xml Thu Aug 28 14:28:14 2008
@@ -1,696 +1,696 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
-"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[
-<!ENTITY % uimaents SYSTEM "../entities.ent">
-%uimaents;
-]>
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-<chapter id="ugr.tug.mvs">
- <title>Multiple CAS Views of an Artifact</title>
- <titleabbrev>Multiple CAS Views</titleabbrev>
-
- <para>UIMA provides an extension to the basic model of the CAS which supports analysis of
- multiple views of the same artifact, all contained within the CAS. This chapter describes
- the concepts, terminology, and the API and XML extensions that enable this.</para>
-
- <para>Multiple CAS Views can simplify things when different versions of the artifact are
- needed at different stages of the analysis. They are also key to enabling multimodal
- analysis where the initial artifact is transformed from one modality to another, or where
- the artifact itself is multimodal, such as the audio, video and closed-captioned text
- associated with an MPEG object. Each representation of the artifact can be analyzed
- independently with the standard UIMA programming model; in addition, multi-view
- components and applications can be constructed.</para>
-
- <para>UIMA supports this by augmenting the CAS with additional light-weight CAS objects,
- one for each view, where these objects share most of the same underlying CAS, except for two
- things: each view has its own set of indexed Feature Structures, and each view has its own
- subject of analysis (Sofa) - its own version of the artifact being analyzed. The Feature
- Structure instances themselves are in the shared part of the CAS; only the entries in the
- indexes are unique for each CAS view.</para>
-
- <para>All of these CAS view objects are kept together with the CAS, and passed as a unit
- between components in a UIMA application. APIs exist which allow components and
- applications to switch among the various view objects, as needed.</para>
-
- <para>Feature Structures may be indexed in multiple views, if necessary. New methods on CAS
- Views facilitate adding or removing Feature Structures to or from their index
- repositories:</para>
-
-
- <programlisting>aView.addFsToIndexes(aFeatureStructure)
-aView.removeFsFromIndexes(aFeatureStructure)</programlisting>
-
- <para>The view on which the method is called is the one whose indexes the Feature Structure
- is added to or removed from.</para>
-
- <section id="ugr.tug.mvs.cas_views_and_sofas">
- <title>CAS Views and Sofas</title>
-
- <para>Sofas (see <olink targetdoc="&uima_docs_tutorial_guides;"
- targetptr="ugr.tug.aas.sofa"/>) and CAS Views are linked. In this implementation,
- every CAS view has one associated Sofa, and every Sofa has one associated CAS
- View.</para>
-
- <section id="ugr.tug.mvs.naming_views_sofas">
- <title>Naming CAS Views and Sofas</title>
-
- <para>The developer assigns a name to the View / Sofa, which is a simple string
- (following the rules for Java identifiers, usually without periods, but see special
- exception below). These names are declared in the component XML metadata, and are
- used during assembly and by the runtime to enable switching among multiple Views of
- the CAS at the same time.</para>
- <note><para>The name is called the Sofa name, for historical reasons, but it applies
- equally to the View. In the rest of this chapter, we'll refer to it as the Sofa
- name.</para></note>
-
- <para>Some applications contain components that expect a variable number of Sofas as
- input or output. An example of a component that takes a variable number of input Sofas
- could be one that takes several translations of a document and merges them, where each
- translation was in a separate Sofa. </para>
-
- <para> You can specify a variable number of input or output sofa names, where each name
- has the same base part, by writing the base part of the name (with no periods), followed
- by a period character and an asterisk character (.*). These denote sofas that have
- names matching the base part up to the period; for example, names such as
- <literal>base_name_part.TTX_3d</literal> would match a specification of
- <literal>base_name_part.*</literal>.</para>
-
- </section>
-
- <section id="ugr.tug.mvs.multi_view_and_single_view">
- <title>Multi-View, Single-View components & applications</title>
- <titleabbrev>Multi/Single View parts in Applications</titleabbrev>
-
- <para>Components and applications can be written to be Multi-View or Single-View.
- Most components used as primitive building blocks are expected to be Single-View.
- UIMA provides capabilities to combine these kinds of components with Multi-View
- components when assembling analysis aggregates or applications.</para>
-
- <para>Single-View components and applications use only one subject of analysis, and
- one CAS View. The code and descriptors for these components do not use the facilities
- described in this chapter.</para>
-
- <para>Conversely, Multi-View components and applications are aware of the
- possibility of multiple Views and Sofas, and have code and XML descriptors that
- create and manipulate them.</para>
-
- </section>
- </section>
-
- <section id="ugr.tug.mvs.multi_view_components">
- <title>Multi-View Components</title>
- <section id="ugr.tug.mvs.deciding_multi_view">
- <title>How UIMA decides if a component is Multi-View</title>
- <titleabbrev>Deciding: Multi-View</titleabbrev>
-
- <para>Every UIMA component has an associated XML Component Descriptor. Multi-View
- components are identified simply as those whose descriptors declare one or more Sofa
- names in their Capability sections, as inputs or outputs. If a Component Descriptor
- does not mention any input or output Sofa names, the framework treats that component
- as a Single-View component.</para>
-
- <para>A Multi-View component is passed a special kind of a CAS object, called a base CAS,
- which it must use to switch to the particular view it wishes to process. The base CAS
- object itself has no Sofa and no ability to use Indexes; only the views have that
- capability.</para>
-
- </section>
-
- <section id="ugr.tug.mvs.additional_capabilities">
- <title>Multi-View: additional capabilities</title>
-
- <para>Additional capabilities provided for components and applications aware of the
- possibilities of multiple Views and Sofas include:</para>
-
- <itemizedlist spacing="compact"><listitem><para>Creating new Views, and for
- each, setting up the associated Sofa data</para></listitem>
-
- <listitem><para>Getting a reference to an existing View and its associated Sofa, by
- name </para></listitem>
-
- <listitem><para>Specifying a view in which to index a particular Feature Structure
- instance </para></listitem></itemizedlist>
-
- </section>
-
- <section id="ugr.tug.mvs.component_xml_metadata">
- <title>Component XML metadata</title>
-
- <para>Each Multi-View component that creates a Sofa or wants to switch to a specific
- previously created Sofa must declare the name for the Sofa in the capabilities
- section. For example, a component expecting as input a web document in html format and
- creating a plain text document for further processing might declare:</para>
-
-
- <programlisting><capabilities>
- <capability>
- <inputs/>
- <outputs/>
- <inputSofas>
-<emphasis role="bold"> <sofaName>rawContent</sofaName></emphasis>
- </inputSofas>
- <outputSofas>
-<emphasis role="bold"> <sofaName>detagContent</sofaName></emphasis>
- </outputSofas>
- </capability>
-</capabilities></programlisting>
-
- <para>Details on this specification are found in <olink
- targetdoc="&uima_docs_ref;"
- targetptr="ugr.ref.xml.component_descriptor"/>. The Component Descriptor
- Editor supports Sofa declarations on the <olink targetdoc="&uima_docs_tools;"
- targetptr="ugr.tools.cde.capabilities"/>.</para>
-
- </section>
- </section>
-
- <section id="ugr.tug.mvs.sofa_capabilities_and_apis_for_apps">
- <title>Sofa Capabilities and APIs for Applications</title>
- <titleabbrev>Sofa Capabilities & APIs for Apps</titleabbrev>
-
- <para>In addition to components, applications can make use of these capabilities. When
- an application creates a new CAS, it also creates the initial view of that CAS - and this
- view is the object that is returned from the create call. Additional views beyond this
- first one can be dynamically created at any time. The application can use the Sofa APIs
- described in <olink targetdoc="&uima_docs_tutorial_guides;"
- targetptr="ugr.tug.aas"/> to specify the data to be analyzed.</para>
-
- <para>If an Application creates a new CAS, the initial CAS that is created will be a view
- named <quote>_InitialView</quote>. This name can be used in the application and in
- Sofa Mapping (see the next section) to refer to this otherwise unnamed view.</para>
-
- </section>
-
- <section id="ugr.tug.mvs.sofa_name_mapping">
- <title>Sofa Name Mapping</title>
-
- <para>Sofa Name mapping is the mechanism which enables UIMA component developers to
- choose locally meaningful Sofa names in their source code and let aggregate developers,
- collection processing engine developers, and application developers connect output
- Sofas created in one component to input Sofas required in another.</para>
-
- <para>At a given aggregation level, the assembler or application developer defines
- names for all the Sofas, and then specifies how these names map to the contained
- components, using the Sofa Map.</para>
-
- <para>Consider annotator code to create a new CAS view:</para>
-
-
- <programlisting>CAS viewX = cas.createView("X");</programlisting>
-
- <para>Or code to get an existing CAS view:</para>
-
- <programlisting>CAS viewX = cas.getView("X");</programlisting>
-
- <para>Without Sofa name mapping the SofaID for the new Sofa will be <quote>X</quote>.
- However, if a name mapping for <quote>X</quote> has been specified by the aggregate or
- CPE calling this annotator, the actual SofaID in the CAS can be different.</para>
-
- <para>All Sofas in a CAS must have unique names. This is accomplished by mapping all
- declared Sofas as described in the following sections. An attempt to create a Sofa with a
- SofaID already in use will throw an exception.</para>
-
- <para>Sofa name mapping must not use the <quote>.</quote> (period) character. Runtime Sofa
- mapping maps names up to the <quote>.</quote> and appends the period and the following
- characters to the mapped name.</para>
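The period rule described above can be illustrated with a small, self-contained Java sketch. This is not the UIMA runtime's own code: the class `SofaNameMappingSketch` and the method `mapSofaName` are hypothetical names, and passing unmapped names through unchanged is an assumption based on the earlier statement that, without a mapping, the SofaID for "X" is "X".

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a hypothetical helper, not part of the UIMA API.
final class SofaNameMappingSketch {

  // Maps a component-local Sofa name to the name used one level up.
  // Only the part before the first period is looked up; the period and
  // everything after it are appended unchanged to the mapped name.
  static String mapSofaName(String localName, Map<String, String> mappings) {
    int dot = localName.indexOf('.');
    String base = (dot < 0) ? localName : localName.substring(0, dot);
    String suffix = (dot < 0) ? "" : localName.substring(dot);
    // Assumption: a name with no mapping is used as-is.
    String mappedBase = mappings.getOrDefault(base, base);
    return mappedBase + suffix;
  }

  public static void main(String[] args) {
    Map<String, String> mappings = new HashMap<>();
    mappings.put("EnglishDocument", "EnglishTranscript");

    System.out.println(mapSofaName("EnglishDocument", mappings));       // EnglishTranscript
    System.out.println(mapSofaName("EnglishDocument.part1", mappings)); // EnglishTranscript.part1
    System.out.println(mapSofaName("GermanDocument", mappings));        // GermanDocument
  }
}
```

The suffix `part1` is an invented example value; any characters after the period would be carried over in the same way.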
-
- <para>To get a Java Iterator for all the views in a CAS:</para>
-
- <programlisting>Iterator allViews = cas.getViewIterator();</programlisting>
-
- <para>To get a Java Iterator for selected views in a CAS, for example, views whose name
- is either exactly equal to namePrefix or is of the form namePrefix.suffix, where suffix
- can be any String:</para>
-
- <programlisting>Iterator someViews = cas.getViewIterator(String namePrefix);</programlisting>
-
- <note><para>Sofa name mapping is applied to namePrefix.</para></note>
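The selection rule for the prefixed view iterator (a name equal to namePrefix, or of the form namePrefix.suffix) can be sketched in plain Java as follows. The class `ViewPrefixFilterSketch` and the method `selectViews` are hypothetical and operate on view names only; the sketch assumes namePrefix has already had any Sofa name mapping applied and does not reproduce how the framework itself performs the lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: filters plain view names, not real CAS views.
final class ViewPrefixFilterSketch {

  // Keeps names that equal namePrefix exactly, or that start with
  // namePrefix followed by a period (namePrefix.suffix).
  static List<String> selectViews(List<String> viewNames, String namePrefix) {
    List<String> selected = new ArrayList<>();
    for (String name : viewNames) {
      if (name.equals(namePrefix) || name.startsWith(namePrefix + ".")) {
        selected.add(name);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    // Hypothetical view names, invented for this example.
    List<String> names = Arrays.asList(
        "Translation", "Translation.de", "Translation.fr", "TranslationNotes", "Audio");
    System.out.println(selectViews(names, "Translation"));
    // [Translation, Translation.de, Translation.fr]
  }
}
```

Note that `TranslationNotes` is not selected: sharing leading characters is not enough, because the prefix must be followed by a period.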
-
- <para>Sofa name mappings are not currently supported for remote Analysis Engines.
- See <xref linkend="ugr.tug.mvs.name_mapping_remote_services"/>.</para>
-
- <section id="ugr.tug.mvs.name_mapping_aggregate">
- <title>Name Mapping in an Aggregate Descriptor</title>
-
- <para>For each component of an Aggregate, name mapping specifies the conversion
- between component Sofa names and names at the aggregate level.</para>
-
- <para>Here's an example. Consider two Multi-View annotators to be assembled
- into an aggregate which takes an audio segment consisting of spoken English and
- produces a German text translation.</para>
-
- <para>The first annotator takes an audio segment as input Sofa and produces a text
- transcript as output Sofa. The annotator designer might choose these Sofa names to be
- <quote>AudioInput</quote> and <quote>TranscribedText</quote>.</para>
-
- <para>The second annotator is designed to translate text from English to German. This
- developer might choose the input and output Sofa names to be
- <quote>EnglishDocument</quote> and <quote>GermanDocument</quote>,
- respectively.</para>
-
- <para>In order to hook these two annotators together, the following section would be
- added to the top level of the aggregate descriptor:</para>
-
-
- <programlisting><![CDATA[<sofaMappings>
- <sofaMapping>
- <componentKey>SpeechToText</componentKey>
- <componentSofaName>AudioInput</componentSofaName>
- <aggregateSofaName>SegementedAudio</aggregateSofaName>
- </sofaMapping>
- <sofaMapping>
- <componentKey>SpeechToText</componentKey>
- <componentSofaName>TranscribedText</componentSofaName>
- <aggregateSofaName>EnglishTranscript</aggregateSofaName>
- </sofaMapping>
- <sofaMapping>
- <componentKey>EnglishToGermanTranslator</componentKey>
- <componentSofaName>EnglishDocument</componentSofaName>
- <aggregateSofaName>EnglishTranscript</aggregateSofaName>
- </sofaMapping>
- <sofaMapping>
- <componentKey>EnglishToGermanTranslator</componentKey>
- <componentSofaName>GermanDocument</componentSofaName>
- <aggregateSofaName>GermanTranslation</aggregateSofaName>
- </sofaMapping>
-</sofaMappings>]]></programlisting>
-
- <para>The Component Descriptor Editor supports Sofa name mapping in aggregates and
- simplifies the task. See <olink targetdoc="&uima_docs_tools;"
- targetptr="ugr.tools.cde.capabilities.sofa_name_mapping"/> for details.</para>
- </section>
-
- <section id="ugr.tug.mvs.name_mapping_cpe"><title>Name Mapping in a CPE
- Descriptor</title>
-
- <para>The CPE descriptor aggregates together a Collection Reader and CAS Processors
- (Annotators and CAS Consumers). Sofa mappings can be added to the following elements
- of CPE descriptors: <literal><collectionIterator></literal>,
- <literal><casInitializer></literal> and the
- <literal><casProcessor></literal>. To be consistent with the
- organization of CPE descriptors, the maps for the CPE descriptor are distributed
- among the XML markup for each of the parts (collectionIterator, casInitializer,
- casProcessor). Because of this, the
- <literal><componentKey></literal> element is not needed. Finally, rather than
- sub-elements for the parts, the XML markup for these uses attributes. See <olink
- targetdoc="&uima_docs_ref;"
- targetptr="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.sofa_name_mappings"/>.</para>
-
- <para>Here's an example. Let's use the aggregate from the previous section
- in a collection processing engine. Here we will add a Collection Reader that outputs
- audio segments in an output Sofa named <quote>nextSegment</quote>. Remember to
- declare an output Sofa nextSegment in the collection reader description.
- We'll add a CAS Consumer in the next section.</para>
-
-
- <programlisting><collectionReader>
- <collectionIterator>
- <descriptor>
- . . .
- </descriptor>
- <configurationParameterSettings>...</configurationParameterSettings>
-<emphasis role="bold"> <sofaNameMappings>
- <sofaNameMapping componentSofaName="nextSegment"
- cpeSofaName="SegementedAudio"/>
- </sofaNameMappings>
-</emphasis> </collectionIterator>
- <casInitializer/>
-</collectionReader></programlisting>
-
- <para>At this point the CAS Processor section for the aggregate does not need any Sofa
- mapping because the aggregate input Sofa has the same name,
- <quote>SegementedAudio</quote>, as is being produced by the Collection
- Reader.</para>
-
- </section>
-
- <section id="ugr.tug.mvs.specifying_cas_view_for_single_view">
- <title>Specifying the CAS View for a Single-View Component</title>
- <titleabbrev>CAS View for Single-View Parts</titleabbrev>
-
- <para>Single-View components receive a Sofa named <quote>_InitialView</quote>, or
- a Sofa that is mapped to this name.</para>
-
- <para>For example, assume that the CAS Consumer to be used in our CPE is a Single-View
- component that expects the analysis results associated with the input CAS, and that
- we want it to use the results from the translated German text Sofa. The following
- mapping added to the CAS Processor section for the CPE will instruct the CPE to get the
- CAS view for the German text Sofa and pass it to the CAS Consumer:</para>
-
-
- <programlisting><casProcessor>
- . . .
- <emphasis role="bold"><sofaNameMappings>
- <sofaNameMapping componentSofaName="_InitialView"
- cpeSofaName="GermanTranslation"/>
- </sofaNameMappings>
-</emphasis></casProcessor></programlisting>
-
- <para id="ugr.tug.mvs.sofa_mapping_leav_out_name">An alternative syntax for
- this kind of mapping is to simply leave out the component sofa name in this
- case.</para>
-
- </section>
-
- <section id="ugr.tug.mvs.name_mapping_application">
- <title>Name Mapping in a UIMA Application</title>
-
- <para>Applications which instantiate UIMA components directly using the
- UIMAFramework methods can also create a top level Sofa mapping using the
- <quote>additional parameters</quote> capability.</para>
-
-
- <programlisting>//create a "root" UIMA context for your whole application
-
-UimaContextAdmin rootContext =
- UIMAFramework.newUimaContext(UIMAFramework.getLogger(),
- UIMAFramework.newDefaultResourceManager(),
- UIMAFramework.newConfigurationManager());
-
-XMLInputSource input = new XMLInputSource("test.xml");
-AnalysisEngineDescription desc = UIMAFramework.getXMLParser().parseAnalysisEngineDescription(input);
-
-//setup sofa name mappings using the api
-
-HashMap sofamappings = new HashMap();
-sofamappings.put("localName1", "globalName1");
-sofamappings.put("localName2", "globalName2");
-
-//create a UIMA Context for the new AE we are about to create
-
-//first argument is unique key among all AEs used in the application
-UimaContextAdmin childContext = rootContext.createChild("myAE", sofamappings);
-
-//instantiate AE, passing the UIMA Context through the additional
-//parameters map
-
-Map additionalParams = new HashMap();
-additionalParams.put(Resource.PARAM_UIMA_CONTEXT, childContext);
-
-AnalysisEngine ae =
- UIMAFramework.produceAnalysisEngine(desc,additionalParams);</programlisting>
-
- <para>Sofa mappings are applied from the inside out, i.e., local to global. First, any
- aggregate mappings are applied, then any CPE mappings, and finally, any specified
- using this <quote>additional parameters</quote> capability.</para>
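The inside-out ordering can be sketched as a chain of lookups, one per level. This is a simplified illustration under stated assumptions, not the framework's implementation: `MappingOrderSketch` and `resolve` are hypothetical names, each level is modeled as a plain map, unmapped names pass through unchanged, and the period-suffix rule and component keys are ignored. The names `CpeGerman` and `AppGerman` are invented for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: models each mapping level as a simple name-to-name map.
final class MappingOrderSketch {

  // Applies the mappings from the innermost level to the outermost one,
  // feeding the result of each level into the next.
  static String resolve(String localName, List<Map<String, String>> levelsInnerToOuter) {
    String name = localName;
    for (Map<String, String> level : levelsInnerToOuter) {
      name = level.getOrDefault(name, name); // assumption: no mapping means no change
    }
    return name;
  }

  public static void main(String[] args) {
    Map<String, String> aggregate = new HashMap<>();
    aggregate.put("GermanDocument", "GermanTranslation");

    Map<String, String> cpe = new HashMap<>();
    cpe.put("GermanTranslation", "CpeGerman");

    Map<String, String> additionalParams = new HashMap<>();
    additionalParams.put("CpeGerman", "AppGerman");

    // Order matters: aggregate first, then CPE, then the application-level map.
    List<Map<String, String>> levels = Arrays.asList(aggregate, cpe, additionalParams);
    System.out.println(resolve("GermanDocument", levels)); // AppGerman
  }
}
```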
-
- </section>
-
- <section id="ugr.tug.mvs.name_mapping_remote_services">
- <title>Name Mapping for Remote Services</title>
-
- <para>Currently, no client-side Sofa mapping information is passed from a UIMA client
- to a remote service. This can cause complications for UIMA services in a Multi-View
- application.</para>
-
- <para>Remote services in a Multi-View application will work only if the service is Single-View, or if the
- Sofa names expected by the service exactly match the Sofa names produced by the client.</para>
-
- <para>If your application requires Sofa mappings for a remote Analysis Engine, you
- can wrap your remotely deployed AE in an aggregate (on the remote side), and specify
- the necessary Sofa mappings in the descriptor for that aggregate.</para>
- </section>
- </section>
-
- <section id="ugr.tug.mvs.jcas_extensions_for_multi_views">
- <title>JCas extensions for Multiple Views</title>
-
- <para>The JCas interface to the CAS can be used with any / all views, as well as the base CAS
- sent to Multi-View components. You can always get a JCas object from an existing CAS
- object by using the method getJCas(); this call will create the JCas if it doesn't
- already exist. If it does exist, it just returns the existing JCas that corresponds to
- the CAS.</para>
-
- <para>JCas implements the getView(...) method, enabling switching to other named
- views, just like the corresponding method on the CAS. The JCas version, however,
- returns JCas objects, instead of CAS objects, corresponding to the view.</para>
- </section>
-
- <section id="ugr.tug.mvs.sample_application">
- <title>Sample Multi-View Application</title>
-
- <para>The UIMA SDK contains a simple Sofa example application which demonstrates many
- Sofa specific concepts and methods. The source code for the application driver is in
- <literal>examples/src/org/apache/uima/examples/SofaExampleApplication.java</literal>
- and the Multi-View annotator is given in
- <literal>SofaExampleAnnotator.java</literal> in the same directory.</para>
-
- <para>This sample application demonstrates a language translator annotator which
- expects an input text Sofa with an English document and creates an output text Sofa
- containing a German translation. Some of the key Sofa concepts illustrated here
- include:</para>
-
- <itemizedlist spacing="compact"><listitem><para>Sofa creation.</para>
- </listitem>
-
- <listitem><para>Access of multiple CAS views.</para></listitem>
-
- <listitem><para>Unique feature structure index space for each view.</para>
- </listitem>
-
- <listitem><para>Feature structures containing cross references between
- annotations in different CAS views.</para></listitem>
-
- <listitem><para>The strong affinity of annotations with a specific Sofa. </para>
- </listitem></itemizedlist>
-
- <section id="ugr.tug.mvs.sample_application.descriptor">
- <title>Annotator Descriptor</title>
-
- <para>The annotator descriptor in
- <literal>examples/descriptors/analysis_engine/SofaExampleAnnotator.xml</literal>
- declares an input Sofa named <quote>EnglishDocument</quote> and an output Sofa
- named <quote>GermanDocument</quote>. A custom type
- <quote>CrossAnnotation</quote> is also defined:</para>
-
-
- <programlisting><![CDATA[<typeDescription>
- <name>sofa.test.CrossAnnotation</name>
- <description/>
- <supertypeName>uima.tcas.Annotation</supertypeName>
- <features>
- <featureDescription>
- <name>otherAnnotation</name>
- <description/>
- <rangeTypeName>uima.tcas.Annotation</rangeTypeName>
- </featureDescription>
- </features>
-</typeDescription>]]></programlisting>
-
- <para>The <literal>CrossAnnotation</literal> type is derived from
- <literal>uima.tcas.Annotation </literal>and includes one new feature: a
- reference to another annotation.</para>
-
- </section>
-
- <section id="ugr.tug.mvs.sample_application.setup">
- <title>Application Setup</title>
-
- <para>The application driver instantiates an analysis engine,
- <literal>seAnnotator</literal>, from the annotator descriptor, obtains a new
- base CAS using that engine's CAS definition, and creates the expected input
- Sofa using:</para>
-
-
- <programlisting>CAS cas = seAnnotator.newCAS();
-CAS aView = cas.createView("EnglishDocument");</programlisting>
-
- <para>Since <literal>seAnnotator</literal> is a primitive component, and no Sofa
- mapping has been defined, the SofaID will be <quote>EnglishDocument</quote>.
- Local Sofa data is set using:</para>
-
-
- <programlisting>aView.setDocumentText("this beer is good");</programlisting>
-
- <para>At this point the CAS contains all necessary inputs for the translation
- annotator and its process method is called.</para>
-
- </section>
-
- <section id="ugr.tug.mvs.sample_application.annotator_processing">
- <title>Annotator Processing</title>
-
- <para>Annotator processing consists of parsing the English document into individual
- words, doing word-by-word translation and concatenating the translations into a
- German translation. Analysis metadata on the English Sofa will be an annotation for
- each English word. Analysis metadata on the German Sofa will be a
- <literal>CrossAnnotation</literal> for each German word, where the
- <literal>otherAnnotation</literal> feature will be a reference to the associated
- English annotation.</para>
-
- <para>Code of interest includes two CAS views:</para>
-
-
- <programlisting>// get View of the English text Sofa
-englishView = aCas.getView("EnglishDocument");
-
-// Create the output German text Sofa
-germanView = aCas.createView("GermanDocument");</programlisting>
-
- <para>the indexing of annotations with the appropriate view:</para>
-
-
- <programlisting>englishView.addFsToIndexes(engAnnot);
-. . .
-germanView.addFsToIndexes(germAnnot);</programlisting>
-
- <para>and the combining of metadata belonging to different Sofas in the same feature
- structure:</para>
-
-
- <programlisting>// add link to English text
-germAnnot.setFeatureValue(other, engAnnot);</programlisting>
-
- </section>
-
- <section id="ugr.tug.mvs.sample_application.accessing_results">
- <title>Accessing the results of analysis</title>
-
- <para>The application needs to get the results of analysis, which may be in different
- views. Analysis results for each Sofa are dumped independently by iterating over all
- annotations for each associated CAS view. For the English Sofa:</para>
-
-
- <programlisting>//get annotation iterator for this CAS
-FSIndex anIndex = aView.getAnnotationIndex();
-FSIterator anIter = anIndex.iterator();
-while (anIter.isValid()) {
- AnnotationFS annot = (AnnotationFS) anIter.get();
- System.out.println(" " + annot.getType().getName()
- + ": " + annot.getCoveredText());
- anIter.moveToNext();
-}</programlisting>
-
- <para>Iterating over all German annotations looks the same, except for the
- following:</para>
-
-
- <programlisting>if (annot.getType() == cross) {
- AnnotationFS crossAnnot =
- (AnnotationFS) annot.getFeatureValue(other);
- System.out.println(" other annotation feature: "
- + crossAnnot.getCoveredText());
-}</programlisting>
-
- <para>Of particular interest here is the built-in Annotation type method
- <literal>getCoveredText()</literal>. This method uses the
- <quote>begin</quote> and <quote>end</quote> features of the annotation to create
- a substring from the CAS document. The SofaRef feature of the annotation is used to
- identify the correct Sofa's data from which to create the substring.</para>
-
- <para>The example program output is:</para>
-
-
- <programlisting>---Printing all annotations for English Sofa---
-uima.tcas.DocumentAnnotation: this beer is good
-uima.tcas.Annotation: this
-uima.tcas.Annotation: beer
-uima.tcas.Annotation: is
-uima.tcas.Annotation: good
-
----Printing all annotations for German Sofa---
-uima.tcas.DocumentAnnotation: das bier ist gut
-sofa.test.CrossAnnotation: das
- other annotation feature: this
-sofa.test.CrossAnnotation: bier
- other annotation feature: beer
-sofa.test.CrossAnnotation: ist
- other annotation feature: is
-sofa.test.CrossAnnotation: gut
- other annotation feature: good</programlisting>
-
- </section>
- </section>
-
- <section id="ugr.tug.mvs.views_api_summary">
- <title>Views API Summary</title>
-
- <para>The recommended way to deliver a particular CAS view to a <emphasis role="bold-italic">Single-View</emphasis> component is to use Sofa mapping in
- the CPE and/or aggregate descriptors.</para>
-
- <para>For <emphasis role="bold-italic">Multi-View </emphasis> components or
- applications, the following methods are used to create or get a reference to a CAS view
- for a particular Sofa:</para>
-
- <para>Creating a new View:</para>
-
-
- <programlisting>JCas newView = aJCas.createView(String localNameOfTheViewBeforeMapping);
-CAS newView = aCAS .createView(String localNameOfTheViewBeforeMapping);</programlisting>
-
- <para>Getting a View from a CAS or JCas:</para>
-
-
- <programlisting><?db-font-size 80% ?>JCas myView = aJCas.getView(String localNameOfTheViewBeforeMapping);
-CAS myView = aCAS .getView(String localNameOfTheViewBeforeMapping);
-Iterator allViews = aCasOrJCas.getViewIterator();
-Iterator someViews = aCasOrJCas.getViewIterator(String localViewNamePrefix);</programlisting>
-
- <para>The following methods are useful for all annotators and applications:</para>
-
- <para>Setting Sofa data for a CAS or JCas:</para>
-
-
- <programlisting>aCasOrJCas.setDocumentText(String docText);
-aCasOrJCas.setSofaDataString(String docText, String mimeType);
-aCasOrJCas.setSofaDataArray(FeatureStructure array, String mimeType);
-aCasOrJCas.setSofaDataURI(String uri, String mimeType);</programlisting>
-
- <para>Getting Sofa data for a particular CAS or JCas:</para>
-
-
- <programlisting>String doc = aCasOrJCas.getDocumentText();
-String doc = aCasOrJCas.getSofaDataString();
-FeatureStructure array = aCasOrJCas.getSofaDataArray();
-String uri = aCasOrJCas.getSofaDataURI();
-InputStream is = aCasOrJCas.getSofaDataStream();</programlisting>
-
- </section>
-
- <section id="ugr.tug.mvs.sofa_incompatibilities_v1_v2">
- <title>Sofa Incompatibilities between UIMA version 1 and version 2</title>
- <titleabbrev>Sofa Incompatibilities: V1 and V2</titleabbrev>
-
- <para>A major change in version 2 is related to the support of Single-View components
- and applications. Given an analysis engine, <literal>ae</literal>, the API
-
- <programlisting>CAS cas = ae.newCAS();</programlisting>
- used to return the base CAS. Now it returns a view of the Sofa named
- <quote>_InitialView</quote>. This Sofa will actually only be created if any Sofa data
- is set for this view. The initial view is used for Single-View applications and
- Multi-View annotators with no Sofa mapping.</para>
-
- <para>The process method of Multi-View annotators receives the base CAS; however, the base
- CAS no longer has an index repository to hold <quote>global</quote> data. Global data
- needs to be put in a specific named CAS view of your choice.</para>
-
- <para>Because of these changes, the following scenarios will break with v2.0 clients:
-
- <itemizedlist spacing="compact"><listitem><para>Any version 1.x services (you
- must migrate the services to version 2).</para></listitem>
-
- <listitem><para>Applications or components explicitly referencing
- <quote>_DefaultTextSofaName</quote> in code or descriptors.</para>
- </listitem>
-
- <listitem><para>Multi-View applications using the Base CAS index repository.
- </para></listitem></itemizedlist></para>
- </section>
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
+"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[
+<!ENTITY % uimaents SYSTEM "../entities.ent">
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.tug.mvs">
+ <title>Multiple CAS Views of an Artifact</title>
+ <titleabbrev>Multiple CAS Views</titleabbrev>
+
+ <para>UIMA provides an extension to the basic model of the CAS which supports analysis of
+ multiple views of the same artifact, all contained within the CAS. This chapter describes
+ the concepts, terminology, and the API and XML extensions that enable this.</para>
+
+ <para>Multiple CAS Views can simplify things when different versions of the artifact are
+ needed at different stages of the analysis. They are also key to enabling multimodal
+ analysis where the initial artifact is transformed from one modality to another, or where
+ the artifact itself is multimodal, such as the audio, video and closed-captioned text
+ associated with an MPEG object. Each representation of the artifact can be analyzed
+ independently with the standard UIMA programming model; in addition, multi-view
+ components and applications can be constructed.</para>
+
+ <para>UIMA supports this by augmenting the CAS with additional light-weight CAS objects,
+ one for each view, where these objects share most of the same underlying CAS, except for two
+ things: each view has its own set of indexed Feature Structures, and each view has its own
+ subject of analysis (Sofa) - its own version of the artifact being analyzed. The Feature
+ Structure instances themselves are in the shared part of the CAS; only the entries in the
+ indexes are unique for each CAS view.</para>
+
+ <para>All of these CAS view objects are kept together with the CAS, and passed as a unit
+ between components in a UIMA application. APIs exist which allow components and
+ applications to switch among the various view objects, as needed.</para>
+
+ <para>Feature Structures may be indexed in multiple views, if necessary. New methods on CAS
+ Views facilitate adding or removing Feature Structures to or from their index
+ repositories:</para>
+
+
+ <programlisting>aView.addFsToIndexes(aFeatureStructure)
+aView.removeFsFromIndexes(aFeatureStructure)</programlisting>
+
+ <para>specify the view in which this Feature Structure should be added to or removed from the
+ indexes.</para>
+
+ <section id="ugr.tug.mvs.cas_views_and_sofas">
+ <title>CAS Views and Sofas</title>
+
+ <para>Sofas (see <olink targetdoc="&uima_docs_tutorial_guides;"
+ targetptr="ugr.tug.aas.sofa"/>) and CAS Views are linked. In this implementation,
+ every CAS view has one associated Sofa, and every Sofa has one associated CAS
+ View.</para>
+
+ <section id="ugr.tug.mvs.naming_views_sofas">
+ <title>Naming CAS Views and Sofas</title>
+
+ <para>The developer assigns a name to the View / Sofa, which is a simple string
+ (following the rules for Java identifiers, usually without periods, but see special
+ exception below). These names are declared in the component XML metadata, and are
+ used during assembly and by the runtime to enable switching among multiple Views of
+ the CAS at the same time.</para>
+ <note><para>The name is called the Sofa name, for historical reasons, but it applies
+ equally to the View. In the rest of this chapter, we'll refer to it as the Sofa
+ name.</para></note>
+
+ <para>Some applications contain components that expect a variable number of Sofas as
+ input or output. An example of a component that takes a variable number of input Sofas
+ could be one that takes several translations of a document and merges them, where each
+ translation was in a separate Sofa. </para>
+
+ <para> You can specify a variable number of input or output sofa names, where each name
+ has the same base part, by writing the base part of the name (with no periods), followed
+ by a period character and an asterisk character (.*). These denote sofas that have
+ names matching the base part up to the period; for example, names such as
+ <literal>base_name_part.TTX_3d</literal> would match a specification of
+ <literal>base_name_part.*</literal>.</para>
+
+ </section>
+
+ <section id="ugr.tug.mvs.multi_view_and_single_view">
+ <title>Multi-View, Single-View components & applications</title>
+ <titleabbrev>Multi/Single View parts in Applications</titleabbrev>
+
+ <para>Components and applications can be written to be Multi-View or Single-View.
+ Most components used as primitive building blocks are expected to be Single-View.
+ UIMA provides capabilities to combine these kinds of components with Multi-View
+ components when assembling analysis aggregates or applications.</para>
+
+ <para>Single-View components and applications use only one subject of analysis, and
+ one CAS View. The code and descriptors for these components do not use the facilities
+ described in this chapter.</para>
+
+ <para>Conversely, Multi-View components and applications are aware of the
+ possibility of multiple Views and Sofas, and have code and XML descriptors that
+ create and manipulate them.</para>
+
+ </section>
+ </section>
+
+ <section id="ugr.tug.mvs.multi_view_components">
+ <title>Multi-View Components</title>
+ <section id="ugr.tug.mvs.deciding_multi_view">
+ <title>How UIMA decides if a component is Multi-View</title>
+ <titleabbrev>Deciding: Multi-View</titleabbrev>
+
+ <para>Every UIMA component has an associated XML Component Descriptor. Multi-View
+ components are identified simply as those whose descriptors declare one or more Sofa
+ names in their Capability sections, as inputs or outputs. If a Component Descriptor
+ does not mention any input or output Sofa names, the framework treats that component
+ as a Single-View component.</para>
+
+ <para>A Multi-View component is passed a special kind of a CAS object, called a base CAS,
+ which it must use to switch to the particular view it wishes to process. The base CAS
+ object itself has no Sofa and no ability to use Indexes; only the views have that
+ capability.</para>
+
+ </section>
+
+ <section id="ugr.tug.mvs.additional_capabilities">
+ <title>Multi-View: additional capabilities</title>
+
+ <para>Additional capabilities provided for components and applications aware of the
+ possibilities of multiple Views and Sofas include:</para>
+
+ <itemizedlist spacing="compact"><listitem><para>Creating new Views, and for
+ each, setting up the associated Sofa data</para></listitem>
+
+ <listitem><para>Getting a reference to an existing View and its associated Sofa, by
+ name </para></listitem>
+
+ <listitem><para>Specifying a view in which to index a particular Feature Structure
+ instance </para></listitem></itemizedlist>
+
+ </section>
+
+ <section id="ugr.tug.mvs.component_xml_metadata">
+ <title>Component XML metadata</title>
+
+ <para>Each Multi-View component that creates a Sofa or wants to switch to a specific
+ previously created Sofa must declare the name for the Sofa in the capabilities
+ section. For example, a component expecting as input a web document in html format and
+ creating a plain text document for further processing might declare:</para>
+
+
+ <programlisting><capabilities>
+ <capability>
+ <inputs/>
+ <outputs/>
+ <inputSofas>
+<emphasis role="bold"> <sofaName>rawContent</sofaName></emphasis>
+ </inputSofas>
+ <outputSofas>
+<emphasis role="bold"> <sofaName>detagContent</sofaName></emphasis>
+ </outputSofas>
+ </capability>
+</capabilities></programlisting>
+
+ <para>Details on this specification are found in <olink
+ targetdoc="&uima_docs_ref;"
+ targetptr="ugr.ref.xml.component_descriptor"/>. The Component Descriptor
+ Editor supports Sofa declarations on the <olink targetdoc="&uima_docs_tools;"
+ targetptr="ugr.tools.cde.capabilities"/>.</para>
+
+ </section>
+ </section>
+
+ <section id="ugr.tug.mvs.sofa_capabilities_and_apis_for_apps">
+ <title>Sofa Capabilities and APIs for Applications</title>
+ <titleabbrev>Sofa Capabilities & APIs for Apps</titleabbrev>
+
+ <para>In addition to components, applications can make use of these capabilities. When
+ an application creates a new CAS, it also creates the initial view of that CAS - and this
+ view is the object that is returned from the create call. Additional views beyond this
+ first one can be dynamically created at any time. The application can use the Sofa APIs
+ described in <olink targetdoc="&uima_docs_tutorial_guides;"
+ targetptr="ugr.tug.aas"/> to specify the data to be analyzed.</para>
+
+ <para>If an Application creates a new CAS, the initial CAS that is created will be a view
+ named <quote>_InitialView</quote>. This name can be used in the application and in
+ Sofa Mapping (see the next section) to refer to this otherwise unnamed view.</para>
+
+ </section>
+
+ <section id="ugr.tug.mvs.sofa_name_mapping">
+ <title>Sofa Name Mapping</title>
+
+ <para>Sofa Name mapping is the mechanism which enables UIMA component developers to
+ choose locally meaningful Sofa names in their source code and let aggregate,
+ collection processing engine, and application developers connect output
+ Sofas created in one component to input Sofas required in another.</para>
+
+ <para>At a given aggregation level, the assembler or application developer defines
+ names for all the Sofas, and then specifies how these names map to the contained
+ components, using the Sofa Map.</para>
+
+ <para>Consider annotator code to create a new CAS view:</para>
+
+
+ <programlisting>CAS viewX = cas.createView("X");</programlisting>
+
+ <para>Or code to get an existing CAS view:</para>
+
+ <programlisting>CAS viewX = cas.getView("X");</programlisting>
+
+ <para>Without Sofa name mapping the SofaID for the new Sofa will be <quote>X</quote>.
+ However, if a name mapping for <quote>X</quote> has been specified by the aggregate or
+ CPE calling this annotator, the actual SofaID in the CAS can be different.</para>
+
+ <para>All Sofas in a CAS must have unique names. This is accomplished by mapping all
+ declared Sofas as described in the following sections. An attempt to create a Sofa with a
+ SofaID already in use will throw an exception.</para>
+
+ <para>Sofa name mapping must not use the <quote>.</quote> (period) character. Runtime Sofa
+ mapping maps names up to the <quote>.</quote> and appends the period and the following
+ characters to the mapped name.</para>
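+
+ <para>The period rule above can be sketched in plain Java. This is only an illustration
+ of the described behavior, not the framework's implementation: the class
+ <literal>SofaNameResolver</literal> and its <literal>resolve</literal> method are
+ hypothetical names, and splitting at the first period (rather than the last) is an
+ assumption.</para>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the UIMA API.
public class SofaNameResolver {

    /**
     * Maps a component-local Sofa name to its name in the CAS.
     * Only the part before the first period is looked up in the mapping;
     * the period and everything after it are appended unchanged.
     * Names without a mapping are returned as they are.
     */
    public static String resolve(String localName, Map<String, String> mappings) {
        int dot = localName.indexOf('.');  // assumption: split at the first period
        String base = (dot < 0) ? localName : localName.substring(0, dot);
        String suffix = (dot < 0) ? "" : localName.substring(dot);
        return mappings.getOrDefault(base, base) + suffix;
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new HashMap<>();
        mappings.put("Translation", "GermanTranslation");

        System.out.println(resolve("Translation", mappings));
        System.out.println(resolve("Translation.v2", mappings));
        System.out.println(resolve("Other", mappings));
    }
}
```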
+
+ <para>To get a Java Iterator for all the views in a CAS:</para>
+
+ <programlisting>Iterator allViews = cas.getViewIterator();</programlisting>
+
+ <para>To get a Java Iterator for selected views in a CAS, for example, views whose name
+ is either exactly equal to namePrefix or is of the form namePrefix.suffix, where suffix
+ can be any String:</para>
+
+ <programlisting>Iterator someViews = cas.getViewIterator(String namePrefix);</programlisting>
+
+ <note><para>Sofa name mapping is applied to namePrefix.</para></note>
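+
+ <para>The selection rule for the prefix form can be pictured with the following
+ stand-alone sketch. It models only the name test stated above (exact match, or the prefix
+ followed by a period and any suffix) over a plain list of names; it does not call UIMA and
+ it ignores Sofa name mapping. <literal>ViewNamePrefixFilter</literal> is a hypothetical
+ class introduced for illustration.</para>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the UIMA API.
public class ViewNamePrefixFilter {

    /** True if viewName equals namePrefix or has the form namePrefix.suffix. */
    public static boolean matches(String viewName, String namePrefix) {
        return viewName.equals(namePrefix) || viewName.startsWith(namePrefix + ".");
    }

    /** Returns the names that pass the test, in their original order. */
    public static List<String> select(List<String> viewNames, String namePrefix) {
        List<String> selected = new ArrayList<>();
        for (String name : viewNames) {
            if (matches(name, namePrefix)) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "Translation", "Translation.de", "Translation.fr", "TranslationNotes");
        System.out.println(select(names, "Translation"));
    }
}
```

+ <para>Note that a name such as <literal>TranslationNotes</literal> is not selected by the
+ prefix <literal>Translation</literal> in this sketch, because the prefix must be followed
+ by a period.</para>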
+
+ <para>Sofa name mappings are not currently supported for remote Analysis Engines.
+ See <xref linkend="ugr.tug.mvs.name_mapping_remote_services"/>.</para>
+
+ <section id="ugr.tug.mvs.name_mapping_aggregate">
+ <title>Name Mapping in an Aggregate Descriptor</title>
+
+ <para>For each component of an Aggregate, name mapping specifies the conversion
+ between component Sofa names and names at the aggregate level.</para>
+
+ <para>Here's an example. Consider two Multi-View annotators to be assembled
+ into an aggregate which takes an audio segment consisting of spoken English and
+ produces a German text translation.</para>
+
+ <para>The first annotator takes an audio segment as input Sofa and produces a text
+ transcript as output Sofa. The annotator designer might choose these Sofa names to be
+ <quote>AudioInput</quote> and <quote>TranscribedText</quote>.</para>
+
+ <para>The second annotator is designed to translate text from English to German. This
+ developer might choose the input and output Sofa names to be
+ <quote>EnglishDocument</quote> and <quote>GermanDocument</quote>,
+ respectively.</para>
+
+ <para>In order to hook these two annotators together, the following section would be
+ added to the top level of the aggregate descriptor:</para>
+
+
+ <programlisting><![CDATA[<sofaMappings>
+ <sofaMapping>
+ <componentKey>SpeechToText</componentKey>
+ <componentSofaName>AudioInput</componentSofaName>
+ <aggregateSofaName>SegementedAudio</aggregateSofaName>
+ </sofaMapping>
+ <sofaMapping>
+ <componentKey>SpeechToText</componentKey>
+ <componentSofaName>TranscribedText</componentSofaName>
+ <aggregateSofaName>EnglishTranscript</aggregateSofaName>
+ </sofaMapping>
+ <sofaMapping>
+ <componentKey>EnglishToGermanTranslator</componentKey>
+ <componentSofaName>EnglishDocument</componentSofaName>
+ <aggregateSofaName>EnglishTranscript</aggregateSofaName>
+ </sofaMapping>
+ <sofaMapping>
+ <componentKey>EnglishToGermanTranslator</componentKey>
+ <componentSofaName>GermanDocument</componentSofaName>
+ <aggregateSofaName>GermanTranslation</aggregateSofaName>
+ </sofaMapping>
+</sofaMappings>]]></programlisting>
+
+ <para>The Component Descriptor Editor supports Sofa name mapping in aggregates and
+ simplifies the task. See <olink targetdoc="&uima_docs_tools;"
+ targetptr="ugr.tools.cde.capabilities.sofa_name_mapping"/> for details.</para>
+ </section>
+
+ <section id="ugr.tug.mvs.name_mapping_cpe"><title>Name Mapping in a CPE
+ Descriptor</title>
+
+ <para>The CPE descriptor aggregates together a Collection Reader and CAS Processors
+ (Annotators and CAS Consumers). Sofa mappings can be added to the following elements
+ of CPE descriptors: <literal><collectionIterator></literal>,
+ <literal><casInitializer></literal> and the
+ <literal><casProcessor></literal>. To be consistent with the
+ organization of CPE descriptors, the maps for the CPE descriptor are distributed
+ among the XML markup for each of the parts (collectionIterator, casInitializer,
+ casProcessor). Because of this, the
+ <literal><componentKey></literal> element is not needed. Finally, rather than
+ sub-elements for the parts, the XML markup for these uses attributes. See <olink
+ targetdoc="&uima_docs_ref;"
+ targetptr="ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual.sofa_name_mappings"/>.</para>
+
+ <para>Here's an example. Let's use the aggregate from the previous section
+ in a collection processing engine. Here we will add a Collection Reader that outputs
+ audio segments in an output Sofa named <quote>nextSegment</quote>. Remember to
+ declare an output Sofa nextSegment in the collection reader description.
+ We'll add a CAS Consumer in the next section.</para>
+
+
+ <programlisting><collectionReader>
+ <collectionIterator>
+ <descriptor>
+ . . .
+ </descriptor>
+ <configurationParameterSettings>...</configurationParameterSettings>
+<emphasis role="bold"> <sofaNameMappings>
+ <sofaNameMapping componentSofaName="nextSegment"
+ cpeSofaName="SegementedAudio"/>
+ </sofaNameMappings>
+</emphasis> </collectionIterator>
+ <casInitializer/>
+</collectionReader></programlisting>
+
+ <para>At this point the CAS Processor section for the aggregate does not need any Sofa
+ mapping because the aggregate input Sofa has the same name,
+ <quote>SegementedAudio</quote>, as is being produced by the Collection
+ Reader.</para>
+
+ </section>
+
+ <section id="ugr.tug.mvs.specifying_cas_view_for_single_view">
+ <title>Specifying the CAS View for a Single-View Component</title>
+ <titleabbrev>CAS View for Single-View Parts</titleabbrev>
+
+ <para>Single-View components receive a Sofa named <quote>_InitialView</quote>, or
+ a Sofa that is mapped to this name.</para>
+
+ <para>For example, assume that the CAS Consumer to be used in our CPE is a Single-View
+ component that expects the analysis results associated with the input CAS, and that
+ we want it to use the results from the translated German text Sofa. The following
+ mapping added to the CAS Processor section for the CPE will instruct the CPE to get the
+ CAS view for the German text Sofa and pass it to the CAS Consumer:</para>
+
+
+ <programlisting><casProcessor>
+ . . .
+ <emphasis role="bold"><sofaNameMappings>
+ <sofaNameMapping componentSofaName="_InitialView"
+ cpeSofaName="GermanTranslation"/>
+ </sofaNameMappings>
+</emphasis></casProcessor></programlisting>
+
+ <para id="ugr.tug.mvs.sofa_mapping_leav_out_name">An alternative syntax for
+ this kind of mapping is to simply leave out the component sofa name in this
+ case.</para>
+
+ </section>
+
+ <section id="ugr.tug.mvs.name_mapping_application">
+ <title>Name Mapping in a UIMA Application</title>
+
+ <para>Applications which instantiate UIMA components directly using the
+ UIMAFramework methods can also create a top level Sofa mapping using the
+ <quote>additional parameters</quote> capability.</para>
+
+
+ <programlisting>//create a "root" UIMA context for your whole application
+
+UimaContextAdmin rootContext =
+ UIMAFramework.newUimaContext(UIMAFramework.getLogger(),
+ UIMAFramework.newDefaultResourceManager(),
+ UIMAFramework.newConfigurationManager());
+
+XMLInputSource input = new XMLInputSource("test.xml");
+AnalysisEngineDescription desc =
+ UIMAFramework.getXMLParser().parseAnalysisEngineDescription(input);
+
+//setup sofa name mappings using the api
+
+HashMap sofamappings = new HashMap();
+sofamappings.put("localName1", "globalName1");
+sofamappings.put("localName2", "globalName2");
+
+//create a UIMA Context for the new AE we are about to create
+
+//first argument is unique key among all AEs used in the application
+UimaContextAdmin childContext = rootContext.createChild("myAE", sofamappings);
+
+//instantiate AE, passing the UIMA Context through the additional
+//parameters map
+
+Map additionalParams = new HashMap();
+additionalParams.put(Resource.PARAM_UIMA_CONTEXT, childContext);
+
+AnalysisEngine ae =
+ UIMAFramework.produceAnalysisEngine(desc,additionalParams);</programlisting>
+
+ <para>Sofa mappings are applied from the inside out, i.e., local to global. First, any
+ aggregate mappings are applied, then any CPE mappings, and finally, any specified
+ using this <quote>additional parameters</quote> capability.</para>
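+
+ <para>A rough way to picture this ordering is as a chain of lookups, applied from the
+ innermost layer outward. The sketch below is plain Java and does not use the UIMA API;
+ <literal>LayeredSofaMapping</literal> is a hypothetical class, and the CPE-level and
+ application-level names in it are made up for illustration. Only the aggregate mapping
+ (<quote>GermanDocument</quote> to <quote>GermanTranslation</quote>) comes from the
+ earlier example.</para>

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the UIMA API.
public class LayeredSofaMapping {

    /**
     * Resolves a component-local Sofa name by applying each mapping layer in
     * turn, innermost first. A layer with no entry for the current name
     * leaves it unchanged.
     */
    public static String resolve(String componentName,
                                 List<Map<String, String>> layersInnerToOuter) {
        String name = componentName;
        for (Map<String, String> layer : layersInnerToOuter) {
            name = layer.getOrDefault(name, name);
        }
        return name;
    }

    public static void main(String[] args) {
        Map<String, String> aggregate = new HashMap<>();
        aggregate.put("GermanDocument", "GermanTranslation"); // from the aggregate example

        Map<String, String> cpe = new HashMap<>();
        cpe.put("GermanTranslation", "CpeGerman");            // made-up CPE-level name

        Map<String, String> application = new HashMap<>();
        application.put("CpeGerman", "AppGerman");            // made-up application-level name

        List<Map<String, String>> layers = Arrays.asList(aggregate, cpe, application);
        System.out.println(resolve("GermanDocument", layers));
    }
}
```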
+
+ </section>
+
+ <section id="ugr.tug.mvs.name_mapping_remote_services">
+ <title>Name Mapping for Remote Services</title>
+
+ <para>Currently, no client-side Sofa mapping information is passed from a UIMA client
+ to a remote service. This can cause complications for UIMA services in a Multi-View
+ application.</para>
+
+ <para>Remote services in a Multi-View application will work only if the service is
+ Single-View, or if the Sofa names expected by the service exactly match the Sofa names
+ produced by the client.</para>
+
+ <para>If your application requires Sofa mappings for a remote Analysis Engine, you
+ can wrap your remotely deployed AE in an aggregate (on the remote side), and specify
+ the necessary Sofa mappings in the descriptor for that aggregate.</para>
+ </section>
+ </section>
+
+ <section id="ugr.tug.mvs.jcas_extensions_for_multi_views">
+ <title>JCas extensions for Multiple Views</title>
+
+ <para>The JCas interface to the CAS can be used with any / all views, as well as the base CAS
+ sent to Multi-View components. You can always get a JCas object from an existing CAS
+ object by using the method getJCas(); this call will create the JCas if it doesn't
+ already exist. If it does exist, it just returns the existing JCas that corresponds to
+ the CAS.</para>
+
+ <para>JCas implements the getView(...) method, enabling switching to other named
+ views, just like the corresponding method on the CAS. The JCas version, however,
+ returns JCas objects, instead of CAS objects, corresponding to the view.</para>
+ </section>
+
+ <section id="ugr.tug.mvs.sample_application">
+ <title>Sample Multi-View Application</title>
+
+ <para>The UIMA SDK contains a simple Sofa example application which demonstrates many
+ Sofa specific concepts and methods. The source code for the application driver is in
+ <literal>examples/src/org/apache/uima/examples/SofaExampleApplication.java</literal>
+ and the Multi-View annotator is given in
+ <literal>SofaExampleAnnotator.java</literal> in the same directory.</para>
+
+ <para>This sample application demonstrates a language translator annotator which
+ expects an input text Sofa with an English document and creates an output text Sofa
+ containing a German translation. Some of the key Sofa concepts illustrated here
+ include:</para>
+
+ <itemizedlist spacing="compact"><listitem><para>Sofa creation.</para>
+ </listitem>
+
+ <listitem><para>Access of multiple CAS views.</para></listitem>
+
+ <listitem><para>Unique feature structure index space for each view.</para>
+ </listitem>
+
+ <listitem><para>Feature structures containing cross references between
+ annotations in different CAS views.</para></listitem>
+
+ <listitem><para>The strong affinity of annotations with a specific Sofa. </para>
+ </listitem></itemizedlist>
+
+ <section id="ugr.tug.mvs.sample_application.descriptor">
+ <title>Annotator Descriptor</title>
+
+ <para>The annotator descriptor in
+ <literal>examples/descriptors/analysis_engine/SofaExampleAnnotator.xml</literal>
+ declares an input Sofa named <quote>EnglishDocument</quote> and an output Sofa
+ named <quote>GermanDocument</quote>. A custom type
+ <quote>CrossAnnotation</quote> is also defined:</para>
+
+
+ <programlisting><![CDATA[<typeDescription>
+ <name>sofa.test.CrossAnnotation</name>
+ <description/>
+ <supertypeName>uima.tcas.Annotation</supertypeName>
+ <features>
+ <featureDescription>
+ <name>otherAnnotation</name>
+ <description/>
+ <rangeTypeName>uima.tcas.Annotation</rangeTypeName>
+ </featureDescription>
+ </features>
+</typeDescription>]]></programlisting>
+
+ <para>The <literal>CrossAnnotation</literal> type is derived from
+ <literal>uima.tcas.Annotation</literal> and includes one new feature: a
+ reference to another annotation.</para>
+
+ </section>
+
+ <section id="ugr.tug.mvs.sample_application.setup">
+ <title>Application Setup</title>
+
+ <para>The application driver instantiates an analysis engine,
+ <literal>seAnnotator</literal>, from the annotator descriptor, obtains a new
+ base CAS using that engine's CAS definition, and creates the expected input
+ Sofa using:</para>
+
+
+ <programlisting>CAS cas = seAnnotator.newCAS();
+CAS aView = cas.createView("EnglishDocument");</programlisting>
+
+ <para>Since <literal>seAnnotator</literal> is a primitive component, and no Sofa
+ mapping has been defined, the SofaID will be <quote>EnglishDocument</quote>.
+ Local Sofa data is set using:</para>
+
+
+ <programlisting>aView.setDocumentText("this beer is good");</programlisting>
+
+ <para>At this point the CAS contains all necessary inputs for the translation
+ annotator and its process method is called.</para>
+
+ </section>
+
+ <section id="ugr.tug.mvs.sample_application.annotator_processing">
+ <title>Annotator Processing</title>
+
+ <para>Annotator processing consists of parsing the English document into individual
+ words, doing word-by-word translation and concatenating the translations into a
+ German translation. Analysis metadata on the English Sofa will be an annotation for
+ each English word. Analysis metadata on the German Sofa will be a
+ <literal>CrossAnnotation</literal> for each German word, where the
+ <literal>otherAnnotation</literal> feature will be a reference to the associated
+ English annotation.</para>
+
+ <para>Code of interest includes two CAS views:</para>
+
+
+ <programlisting>// get View of the English text Sofa
+englishView = aCas.getView("EnglishDocument");
+
+// Create the output German text Sofa
+germanView = aCas.createView("GermanDocument");</programlisting>
+
+ <para>the indexing of annotations with the appropriate view:</para>
+
+
+ <programlisting>englishView.addFsToIndexes(engAnnot);
+. . .
+germanView.addFsToIndexes(germAnnot);</programlisting>
+
+ <para>and the combining of metadata belonging to different Sofas in the same feature
+ structure:</para>
+
+
+ <programlisting>// add link to English text
+germAnnot.setFeatureValue(other, engAnnot);</programlisting>
+
+ </section>
+
+ <section id="ugr.tug.mvs.sample_application.accessing_results">
+ <title>Accessing the results of analysis</title>
+
+ <para>The application needs to get the results of analysis, which may be in different
+ views. Analysis results for each Sofa are dumped independently by iterating over all
+ annotations for each associated CAS view. For the English Sofa:</para>
+
+
+ <programlisting>//get annotation iterator for this CAS
+FSIndex anIndex = aView.getAnnotationIndex();
+FSIterator anIter = anIndex.iterator();
+while (anIter.isValid()) {
+ AnnotationFS annot = (AnnotationFS) anIter.get();
+ System.out.println(" " + annot.getType().getName()
+ + ": " + annot.getCoveredText());
+ anIter.moveToNext();
+}</programlisting>
+
+ <para>Iterating over all German annotations looks the same, except for the
+ following:</para>
+
+
+ <programlisting>if (annot.getType() == cross) {
+ AnnotationFS crossAnnot =
+ (AnnotationFS) annot.getFeatureValue(other);
+ System.out.println(" other annotation feature: "
+ + crossAnnot.getCoveredText());
+}</programlisting>
+
+ <para>Of particular interest here is the built-in Annotation type method
+ <literal>getCoveredText()</literal>. This method uses the
+ <quote>begin</quote> and <quote>end</quote> features of the annotation to create
+ a substring from the CAS document. The SofaRef feature of the annotation is used to
+ identify the correct Sofa's data from which to create the substring.</para>
+
+ <para>The example program output is:</para>
+
+
+ <programlisting>---Printing all annotations for English Sofa---
+uima.tcas.DocumentAnnotation: this beer is good
+uima.tcas.Annotation: this
+uima.tcas.Annotation: beer
+uima.tcas.Annotation: is
+uima.tcas.Annotation: good
+
+---Printing all annotations for German Sofa---
+uima.tcas.DocumentAnnotation: das bier ist gut
+sofa.test.CrossAnnotation: das
+ other annotation feature: this
+sofa.test.CrossAnnotation: bier
+ other annotation feature: beer
+sofa.test.CrossAnnotation: ist
+ other annotation feature: is
+sofa.test.CrossAnnotation: gut
+ other annotation feature: good</programlisting>
+
+ </section>
+ </section>
+
+ <section id="ugr.tug.mvs.views_api_summary">
+ <title>Views API Summary</title>
+
+ <para>The recommended way to deliver a particular CAS view to a <emphasis role="bold-italic">Single-View</emphasis> component is to use Sofa mapping in
+ the CPE and/or aggregate descriptors.</para>
+
+ <para>For <emphasis role="bold-italic">Multi-View</emphasis> components or
+ applications, the following methods are used to create or get a reference to a CAS view
+ for a particular Sofa:</para>
+
+ <para>Creating a new View:</para>
+
+
+ <programlisting>JCas newView = aJCas.createView(String localNameOfTheViewBeforeMapping);
+CAS newView = aCAS .createView(String localNameOfTheViewBeforeMapping);</programlisting>
+
+ <para>Getting a View from a CAS or JCas:</para>
+
+
+ <programlisting><?db-font-size 80% ?>JCas myView = aJCas.getView(String localNameOfTheViewBeforeMapping);
+CAS myView = aCAS .getView(String localNameOfTheViewBeforeMapping);
+Iterator allViews = aCasOrJCas.getViewIterator();
+Iterator someViews = aCasOrJCas.getViewIterator(String localViewNamePrefix);</programlisting>
+
+ <para>The following methods are useful for all annotators and applications:</para>
+
+ <para>Setting Sofa data for a CAS or JCas:</para>
+
+
+ <programlisting>aCasOrJCas.setDocumentText(String docText);
+aCasOrJCas.setSofaDataString(String docText, String mimeType);
+aCasOrJCas.setSofaDataArray(FeatureStructure array, String mimeType);
+aCasOrJCas.setSofaDataURI(String uri, String mimeType);</programlisting>
+
+ <para>Getting Sofa data for a particular CAS or JCas:</para>
+
+
+ <programlisting>String doc = aCasOrJCas.getDocumentText();
+String doc = aCasOrJCas.getSofaDataString();
+FeatureStructure array = aCasOrJCas.getSofaDataArray();
+String uri = aCasOrJCas.getSofaDataURI();
+InputStream is = aCasOrJCas.getSofaDataStream();</programlisting>
+
+ </section>
+
+ <section id="ugr.tug.mvs.sofa_incompatibilities_v1_v2">
+ <title>Sofa Incompatibilities between UIMA version 1 and version 2</title>
+ <titleabbrev>Sofa Incompatibilities: V1 and V2</titleabbrev>
+
+ <para>A major change in version 2 is related to the support of Single-View components
+ and applications. Given an analysis engine, <literal>ae</literal>, the API
+
+ <programlisting>CAS cas = ae.newCAS();</programlisting>
+ used to return the base CAS. Now it returns a view of the Sofa named
+ <quote>_InitialView</quote>. This Sofa will actually only be created if any Sofa data
+ is set for this view. The initial view is used for Single-View applications and
+ Multi-View annotators with no Sofa mapping.</para>
+
+ <para>The process method of Multi-View annotators receives the base CAS; however, the base
+ CAS no longer has an index repository to hold <quote>global</quote> data. Global data
+ needs to be put in a specific named CAS view of your choice.</para>
+
+ <para>Because of these changes, the following scenarios will break with v2.0 clients:
+
+ <itemizedlist spacing="compact"><listitem><para>Any version 1.x services (you
+ must migrate the services to version 2).</para></listitem>
+
+ <listitem><para>Applications or components explicitly referencing
+ <quote>_DefaultTextSofaName</quote> in code or descriptors.</para>
+ </listitem>
+
+ <listitem><para>Multi-View applications using the Base CAS index repository.
+ </para></listitem></itemizedlist></para>
+ </section>
</chapter>
\ No newline at end of file
Propchange: incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.multi_views.xml
------------------------------------------------------------------------------
svn:eol-style = native
Modified: incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.xmi_emf.xml
URL: http://svn.apache.org/viewvc/incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.xmi_emf.xml?rev=689997&r1=689996&r2=689997&view=diff
==============================================================================
--- incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.xmi_emf.xml (original)
+++ incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.xmi_emf.xml Thu Aug 28 14:28:14 2008
@@ -1,153 +1,153 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
-"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[
-<!ENTITY % uimaents SYSTEM "../entities.ent">
-%uimaents;
-]>
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-<chapter id="ugr.tug.xmi_emf">
- <title>XMI and EMF Interoperability</title>
- <titleabbrev>XMI & EMF</titleabbrev>
-
- <section id="ugr.tug.xmi_emf.overview">
- <title>Overview</title>
-
- <para>In traditional object-oriented terms, a UIMA Type System is a class model and a UIMA CAS is an object graph.
- There are established standards in this area
- – specifically, <trademark class="registered">UML</trademark> is an <trademark class="trade">
- OMG</trademark> standard for class models and XMI (XML Metadata Interchange) is an OMG standard for the XML
- representation of object graphs.</para>
-
- <para>Furthermore, the Eclipse Modeling Framework (EMF) is an open-source framework for model-based
- application development, and it is based on UML and XMI. In EMF, you define class models using a metamodel called
- Ecore, which is similar to UML. EMF provides tools for converting a UML model to Ecore. EMF can then generate Java
- classes from your model, and supports persistence of those classes in the XMI format.</para>
-
- <para>The UIMA SDK provides tools for interoperability with XMI and EMF. These tools allow conversions of UIMA
- Type Systems to and from Ecore models, as well as conversions of UIMA CASes to and from XMI format. This provides a
- number of advantages, including:</para>
-
- <blockquote>
- <para>You can define a model using a UML Editor, such as Rational Rose or EclipseUML, and then automatically
- convert it to a UIMA Type System.</para>
-
- <para>You can take an existing UIMA application, convert its type system to Ecore, and save the CASes it
- produces to XMI. This data is now in a form where it can easily be ingested by an EMF-based application.</para>
- </blockquote>
-
- <para>More generally, we are adopting the well-documented, open standard XMI as the standard way to represent
- UIMA-compliant analysis results (replacing the UIMA-specific XCAS format). This use of an open standard
- enables other applications to more easily produce or consume these UIMA analysis results.</para>
-
- <para>For more information on XMI, see Grose et al. <emphasis>Mastering XMI. Java Programming with XMI, XML, and
- UML.</emphasis> John Wiley &amp; Sons, Inc. 2002.</para>
-
- <para>For more information on EMF, see Budinsky et al. <emphasis>Eclipse Modeling Framework 2.0.</emphasis>
- Addison-Wesley. 2006.</para>
-
- <para>For details of how the UIMA CAS is represented in XMI format, see <olink targetdoc="&uima_docs_ref;"
- targetptr="ugr.ref.xmi"/>.</para>
-
- </section>
-
- <section id="ugr.tug.xmi_emf.converting_ecore_to_from_uima_type_system">
- <title>Converting an Ecore Model to or from a UIMA Type System</title>
-
- <para>The UIMA SDK provides the following two classes:</para>
-
- <para><emphasis role="bold"><literal>Ecore2UimaTypeSystem:</literal>
- </emphasis> converts from an .ecore model developed using EMF to a UIMA-compliant
- TypeSystem descriptor. This is a Java class that can be run as a standalone program or
- invoked from another Java application. To run as a standalone program,
- execute:</para>
-
- <para><command>java org.apache.uima.ecore.Ecore2UimaTypeSystem &lt;ecore
- file&gt; &lt;output file&gt;</command></para>
-
- <para>The input .ecore file will be converted to a UIMA TypeSystem descriptor and written
- to the specified output file. You can then use the resulting TypeSystem descriptor in
- your UIMA application.</para>
-
- <para><emphasis role="bold"><literal>UimaTypeSystem2Ecore:</literal>
- </emphasis> converts from a UIMA TypeSystem descriptor to an .ecore model. This is a
- Java class that can be run as a standalone program or invoked from another Java
- application. To run as a standalone program, execute:</para>
-
- <para><command>java org.apache.uima.ecore.UimaTypeSystem2Ecore
- &lt;TypeSystem descriptor&gt; &lt;output file&gt;</command></para>
-
- <para>The input UIMA TypeSystem descriptor will be converted to an Ecore model file and
- written to the specified output file. You can then use the resulting Ecore model in EMF
- applications. The converted type system will include any
- <literal>&lt;import...&gt;</literal>ed TypeSystems; the fact that they were
- imported is currently not preserved.</para>
-
- <para>To run either of these converters, your classpath will need to include the UIMA jar
- files as well as the following jar files from the EMF distribution: common.jar,
- ecore.jar, and ecore.xmi.jar.</para>
-
- <para>Also, note that the uima-core.jar file contains the Ecore model file uima.ecore,
- which defines the built-in UIMA types. You may need to use this file from your EMF
- applications.</para>
-
- </section>
-
- <section id="ugr.tug.xmi_emf.using_xmi_cas_serialization">
- <title>Using XMI CAS Serialization</title>
-
- <para>The UIMA SDK provides XMI support through the following two classes:</para>
-
- <para><emphasis role="bold"><literal>XmiCasSerializer:</literal></emphasis>
- can be run from within a UIMA application to write out a CAS to the standard XMI format. The
- XMI that is generated will be compliant with the Ecore model generated by
- <literal>UimaTypeSystem2Ecore</literal>. An EMF application could use this Ecore
- model to ingest and process the XMI produced by the XmiCasSerializer.</para>
-
- <para><emphasis role="bold"><literal>XmiCasDeserializer:</literal></emphasis>
- can be run from within a UIMA application to read in an XMI document and populate a CAS. The
- XMI must conform to the Ecore model generated by
- <literal>UimaTypeSystem2Ecore</literal>.</para>
-
- <para>Also, the uimaj-examples Eclipse project contains some example code that shows
- how to use the serializer and deserializer:
-
- <blockquote>
- <para><literal>org.apache.uima.examples.xmi.XmiWriterCasConsumer:</literal>
- This is a CAS Consumer that writes each CAS to an output file in XMI format. It is analogous
- to the XCasWriter CAS Consumer that has existed in prior UIMA versions, except that it
- uses the XMI serialization format.</para>
-
- <para><literal>org.apache.uima.examples.xmi.XmiCollectionReader:</literal>
- This is a Collection Reader that reads a directory of XMI files and deserializes each of
- them into a CAS. For example, this would allow you to build a Collection Processing
- Engine that reads XMI files, which could contain some previous analysis results, and
- then do further analysis.</para>
- </blockquote></para>
-
- <para>Finally, under the folder <literal>uimaj-examples/ecore_src</literal> is
- the class
- <literal>org.apache.uima.examples.xmi.XmiEcoreCasConsumer</literal>, which
- writes each CAS to XMI format and also saves the Type System as an Ecore file. Since this
- uses the <literal>UimaTypeSystem2Ecore</literal> converter, to compile it you must
- add to your classpath the EMF jars common.jar, ecore.jar, and ecore.xmi.jar –
- see ecore_src/readme.txt for instructions.</para>
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
+"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[
+<!ENTITY % uimaents SYSTEM "../entities.ent">
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.tug.xmi_emf">
+ <title>XMI and EMF Interoperability</title>
+ <titleabbrev>XMI &amp; EMF</titleabbrev>
+
+ <section id="ugr.tug.xmi_emf.overview">
+ <title>Overview</title>
+
+ <para>In traditional object-oriented terms, a UIMA Type System is a class model and a UIMA CAS is an object graph.
+ There are established standards in this area
+ – specifically, <trademark class="registered">UML</trademark> is an <trademark class="trade">
+ OMG</trademark> standard for class models and XMI (XML Metadata Interchange) is an OMG standard for the XML
+ representation of object graphs.</para>
+
+ <para>Furthermore, the Eclipse Modeling Framework (EMF) is an open-source framework for model-based
+ application development, and it is based on UML and XMI. In EMF, you define class models using a metamodel called
+ Ecore, which is similar to UML. EMF provides tools for converting a UML model to Ecore. EMF can then generate Java
+ classes from your model, and supports persistence of those classes in the XMI format.</para>
+
+ <para>The UIMA SDK provides tools for interoperability with XMI and EMF. These tools allow conversions of UIMA
+ Type Systems to and from Ecore models, as well as conversions of UIMA CASes to and from XMI format. This provides a
+ number of advantages, including:</para>
+
+ <blockquote>
+ <para>You can define a model using a UML Editor, such as Rational Rose or EclipseUML, and then automatically
+ convert it to a UIMA Type System.</para>
+
+ <para>You can take an existing UIMA application, convert its type system to Ecore, and save the CASes it
+ produces to XMI. This data is now in a form where it can easily be ingested by an EMF-based application.</para>
+ </blockquote>
+
+ <para>More generally, we are adopting the well-documented, open standard XMI as the standard way to represent
+ UIMA-compliant analysis results (replacing the UIMA-specific XCAS format). This use of an open standard
+ enables other applications to more easily produce or consume these UIMA analysis results.</para>
+
+ <para>For more information on XMI, see Grose et al. <emphasis>Mastering XMI. Java Programming with XMI, XML, and
+ UML.</emphasis> John Wiley &amp; Sons, Inc. 2002.</para>
+
+ <para>For more information on EMF, see Budinsky et al. <emphasis>Eclipse Modeling Framework 2.0.</emphasis>
+ Addison-Wesley. 2006.</para>
+
+ <para>For details of how the UIMA CAS is represented in XMI format, see <olink targetdoc="&uima_docs_ref;"
+ targetptr="ugr.ref.xmi"/>.</para>
+
+ </section>
+
+ <section id="ugr.tug.xmi_emf.converting_ecore_to_from_uima_type_system">
+ <title>Converting an Ecore Model to or from a UIMA Type System</title>
+
+ <para>The UIMA SDK provides the following two classes:</para>
+
+ <para><emphasis role="bold"><literal>Ecore2UimaTypeSystem:</literal>
+ </emphasis> converts from an .ecore model developed using EMF to a UIMA-compliant
+ TypeSystem descriptor. This is a Java class that can be run as a standalone program or
+ invoked from another Java application. To run as a standalone program,
+ execute:</para>
+
+ <para><command>java org.apache.uima.ecore.Ecore2UimaTypeSystem &lt;ecore
+ file&gt; &lt;output file&gt;</command></para>
+
+ <para>The input .ecore file will be converted to a UIMA TypeSystem descriptor and written
+ to the specified output file. You can then use the resulting TypeSystem descriptor in
+ your UIMA application.</para>
+
+ <para><emphasis role="bold"><literal>UimaTypeSystem2Ecore:</literal>
+ </emphasis> converts from a UIMA TypeSystem descriptor to an .ecore model. This is a
+ Java class that can be run as a standalone program or invoked from another Java
+ application. To run as a standalone program, execute:</para>
+
+ <para><command>java org.apache.uima.ecore.UimaTypeSystem2Ecore
+ &lt;TypeSystem descriptor&gt; &lt;output file&gt;</command></para>
+
+ <para>The input UIMA TypeSystem descriptor will be converted to an Ecore model file and
+ written to the specified output file. You can then use the resulting Ecore model in EMF
+ applications. The converted type system will include any
+ <literal>&lt;import...&gt;</literal>ed TypeSystems; the fact that they were
+ imported is currently not preserved.</para>
+
+ <para>To run either of these converters, your classpath will need to include the UIMA jar
+ files as well as the following jar files from the EMF distribution: common.jar,
+ ecore.jar, and ecore.xmi.jar.</para>
+
+ <para>Also, note that the uima-core.jar file contains the Ecore model file uima.ecore,
+ which defines the built-in UIMA types. You may need to use this file from your EMF
+ applications.</para>
+
+ </section>
+
+ <section id="ugr.tug.xmi_emf.using_xmi_cas_serialization">
+ <title>Using XMI CAS Serialization</title>
+
+ <para>The UIMA SDK provides XMI support through the following two classes:</para>
+
+ <para><emphasis role="bold"><literal>XmiCasSerializer:</literal></emphasis>
+ can be run from within a UIMA application to write out a CAS to the standard XMI format. The
+ XMI that is generated will be compliant with the Ecore model generated by
+ <literal>UimaTypeSystem2Ecore</literal>. An EMF application could use this Ecore
+ model to ingest and process the XMI produced by the XmiCasSerializer.</para>
+
+ <para><emphasis role="bold"><literal>XmiCasDeserializer:</literal></emphasis>
+ can be run from within a UIMA application to read in an XMI document and populate a CAS. The
+ XMI must conform to the Ecore model generated by
+ <literal>UimaTypeSystem2Ecore</literal>.</para>
+
+ <para>Also, the uimaj-examples Eclipse project contains some example code that shows
+ how to use the serializer and deserializer:
+
+ <blockquote>
+ <para><literal>org.apache.uima.examples.xmi.XmiWriterCasConsumer:</literal>
+ This is a CAS Consumer that writes each CAS to an output file in XMI format. It is analogous
+ to the XCasWriter CAS Consumer that has existed in prior UIMA versions, except that it
+ uses the XMI serialization format.</para>
+
+ <para><literal>org.apache.uima.examples.xmi.XmiCollectionReader:</literal>
+ This is a Collection Reader that reads a directory of XMI files and deserializes each of
+ them into a CAS. For example, this would allow you to build a Collection Processing
+ Engine that reads XMI files, which could contain some previous analysis results, and
+ then do further analysis.</para>
+ </blockquote></para>
+
+ <para>Finally, under the folder <literal>uimaj-examples/ecore_src</literal> is
+ the class
+ <literal>org.apache.uima.examples.xmi.XmiEcoreCasConsumer</literal>, which
+ writes each CAS to XMI format and also saves the Type System as an Ecore file. Since this
+ uses the <literal>UimaTypeSystem2Ecore</literal> converter, to compile it you must
+ add to your classpath the EMF jars common.jar, ecore.jar, and ecore.xmi.jar –
+ see ecore_src/readme.txt for instructions.</para>
<section id="ugr.tug.xmi_emf.xml_character_issues">
<title>Character Encoding Issues with XML Serialization</title>
@@ -181,6 +181,6 @@
</para>
</section>
- </section>
-
+ </section>
+
</chapter>
\ No newline at end of file
Propchange: incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/tutorials_and_users_guides/tug.xmi_emf.xml
------------------------------------------------------------------------------
svn:eol-style = native
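
Note on the converter section of tug.xmi_emf.xml above: the text says that running either converter needs the UIMA jar files plus common.jar, ecore.jar, and ecore.xmi.jar from the EMF distribution on the classpath. The following is a minimal sketch of how such a command line might be assembled. It only builds the argument list and does not launch anything. The helper class ConverterCommand, the list of UIMA jar paths, and the EMF directory are hypothetical names introduced for illustration; only the converter class names and the three EMF jar names come from the documentation.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds (but does not run) a converter command line. */
final class ConverterCommand {
    // EMF jar names as listed in the documentation text.
    static final String[] EMF_JARS = {"common.jar", "ecore.jar", "ecore.xmi.jar"};

    /**
     * @param mainClass converter class, e.g. org.apache.uima.ecore.Ecore2UimaTypeSystem
     * @param uimaJars  paths to the UIMA jars (assumed; which jars are needed is not stated)
     * @param emfDir    directory holding the EMF jars (assumed location)
     * @param input     input file (.ecore model or TypeSystem descriptor)
     * @param output    output file to be written by the converter
     */
    static List<String> build(String mainClass, List<String> uimaJars, String emfDir,
                              String input, String output) {
        // Classpath = UIMA jars followed by the three EMF jars.
        List<String> classpath = new ArrayList<>(uimaJars);
        for (String jar : EMF_JARS) {
            classpath.add(emfDir + File.separator + jar);
        }
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-cp");
        command.add(String.join(File.pathSeparator, classpath));
        command.add(mainClass);
        command.add(input);
        command.add(output);
        return command;
    }
}
```

The returned list could be handed to java.lang.ProcessBuilder, or joined with spaces and pasted into a shell. Using File.pathSeparator and File.separator keeps the sketch portable between Windows and Unix-like systems.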