Posted to commits@uima.apache.org by sc...@apache.org on 2010/05/06 16:01:57 UTC
svn commit: r941739 [3/5] - in
/uima/uimaj/branches/mavenAlign/uima-docbook-references: ./ src/
src/docbook/ src/docbook/images/ src/docbook/images/references/
src/docbook/images/references/ref.cas/
src/docbook/images/references/ref.javadocs/ src/docbo...
Added: uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.xmi.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.xmi.xml?rev=941739&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.xmi.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.xmi.xml Thu May 6 14:01:56 2010
@@ -0,0 +1,373 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.ref.xmi">
+ <title>XMI CAS Serialization Reference</title>
+
+ <para>This is the specification for the mapping of the UIMA CAS into the XMI (XML Metadata
+ Interchange<footnote><para> For details on XMI see Grose et al. <emphasis>Mastering
+ XMI. Java Programming with XMI, XML, and UML. </emphasis>John Wiley & Sons, Inc.
+ 2002.</para></footnote>) format. XMI is an OMG standard for expressing object graphs in
+ XML. The UIMA SDK provides support for XMI through the classes
+ <literal>org.apache.uima.cas.impl.XmiCasSerializer</literal> and
+ <literal>org.apache.uima.cas.impl.XmiCasDeserializer</literal>.</para>
+
+ <section id="ugr.ref.xmi.xmi_tag">
+ <title>XMI Tag</title>
+
+ <para>The outermost tag is <XMI> and must include a version number and XML
+ namespace attribute:
+
+
+ <programlisting><xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI">
+ <!-- CAS Contents here -->
+</xmi:XMI></programlisting></para>
+
+ <para>XML namespaces<footnote><para>http://www.w3.org/TR/xml-names11/</para>
+ </footnote> are used throughout. The <quote>xmi</quote> namespace prefix is used to
+ identify elements and attributes that are defined by the XMI specification. The XMI
+ document will also define one namespace prefix for each CAS namespace, as described in
+ the next section.</para>
+
+ </section>
+
+ <section id="ugr.ref.xmi.feature_structures">
+ <title>Feature Structures</title>
+
+ <para>UIMA Feature Structures are mapped to XML elements. The name of the element is
+ formed from the CAS type name, making use of XML namespaces as follows.</para>
+
+ <para>The CAS type namespace is converted to an XML namespace URI by the following rule:
+ replace all dots with slashes, prepend http:///, and append .ecore.</para>
+
+ <para>This mapping was chosen because it is the default mapping used by the Eclipse
+ Modeling Framework (EMF)<footnote><para> For details on EMF and Ecore see Budinsky et
+ al. <emphasis>Eclipse Modeling Framework 2.0</emphasis>. Addison-Wesley.
+ 2006.</para></footnote> to create namespace URIs from Java package names. The use of
+ the http scheme is a common convention, and does not imply any HTTP communication. The
+ .ecore suffix reflects the fact that the recommended type system definition for a
+ namespace is an Ecore model; see <olink targetdoc="&uima_docs_tutorial_guides;"
+ targetptr="ugr.tug.xmi_emf"/>.</para>
+
+ <para>Consider the CAS type name <quote>org.myproj.Foo</quote>. The CAS namespace
+ (<quote>org.myproj.</quote>) is converted to the XML namespace URI
+ http:///org/myproj.ecore.</para>
+
+ <para>The XML element name is then formed by concatenating the XML namespace prefix
+ (which is an arbitrary token, but typically we use the last component of the CAS
+ namespace) with the type name (excluding the namespace).</para>
+
+ <para>So the example <quote>org.myproj.Foo</quote> FeatureStructure is written to
+ XMI as:
+
+
+ <programlisting><xmi:XMI
+ xmi:version="2.0"
+ xmlns:xmi="http://www.omg.org/XMI"
+ xmlns:myproj="http:///org/myproj.ecore">
+ ...
+ <myproj:Foo xmi:id="1"/>
+ ...
+</xmi:XMI></programlisting></para>
+
+ <para>The xmi:id attribute is only required if this object will be referred to from
+ elsewhere in the XMI document. If provided, the xmi:id must be unique within the XMI
+ document.</para>
+
+ <para>All namespace prefixes (e.g. <quote>myproj</quote>) in this example must be
+ bound to URIs using the <quote>xmlns...</quote> attribute, as defined by the XML
+ namespaces specification.</para>
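+
+ <para>As a further illustration of the same rule, and assuming the built-in UIMA type
+ <quote>uima.tcas.Annotation</quote>, the CAS namespace <quote>uima.tcas</quote> would map to
+ the namespace URI http:///uima/tcas.ecore, with <quote>tcas</quote> as the conventional
+ prefix. The xmi:id and offset values below are made up for this sketch:
+
+
+ <programlisting><![CDATA[<xmi:XMI
+ xmi:version="2.0"
+ xmlns:xmi="http://www.omg.org/XMI"
+ xmlns:tcas="http:///uima/tcas.ecore">
+ ...
+ <tcas:Annotation xmi:id="5" begin="0" end="3"/>
+ ...
+</xmi:XMI>]]></programlisting></para>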
+ </section>
+
+ <section id="ugr.ref.xmi.primitive_features">
+ <title>Primitive Features</title>
+
+ <para>CAS features of primitive types (String, Boolean, Byte, Short, Integer, Long,
+ Float, or Double) can be mapped either to XML attributes or XML elements. For example, a
+ CAS FeatureStructure of type org.myproj.Foo, with features:
+
+
+ <programlisting>begin = 14
+end = 19
+myFeature = "bar"</programlisting>
+ could be mapped to:
+
+
+ <programlisting><xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"
+ xmlns:myproj="http:///org/myproj.ecore">
+ ...
+ <myproj:Foo xmi:id="1" begin="14" end="19" myFeature="bar"/>
+ ...
+</xmi:XMI></programlisting>
+ or equivalently:
+
+
+ <programlisting><![CDATA[<xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"
+ xmlns:myproj="http:///org/myproj.ecore">
+ ...
+ <myproj:Foo xmi:id="1">
+ <begin>14</begin>
+ <end>19</end>
+ <myFeature>bar</myFeature>
+ </myproj:Foo>
+ ...
+</xmi:XMI>]]></programlisting></para>
+
+ <para>The attribute serialization is preferred for compactness, but either
+ representation is allowable. Mixing the two styles is allowed; some features can be
+ represented as attributes and others as elements.</para>
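+
+ <para>As a sketch of the mixed style, the same org.myproj.Foo FeatureStructure could, for
+ instance, carry its offsets as attributes and its String feature as a child element. Which
+ features go where is an arbitrary choice made for this illustration:
+
+
+ <programlisting><![CDATA[<xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"
+ xmlns:myproj="http:///org/myproj.ecore">
+ ...
+ <myproj:Foo xmi:id="1" begin="14" end="19">
+ <myFeature>bar</myFeature>
+ </myproj:Foo>
+ ...
+</xmi:XMI>]]></programlisting></para>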
+
+ </section>
+
+ <section id="ugr.ref.xmi.reference_features">
+ <title>Reference Features</title>
+
+ <para>CAS features that are references to other feature structures (excluding arrays
+ and lists, which are handled separately) are serialized as ID references.</para>
+
+ <para>If we add to the previous CAS example a feature structure of type org.myproj.Baz,
+ with feature <quote>myFoo</quote> that is a reference to the Foo object, the
+ serialization would be:
+
+
+ <programlisting><![CDATA[<xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"
+ xmlns:myproj="http:///org/myproj.ecore">
+ ...
+ <myproj:Foo xmi:id="1" begin="14" end="19" myFeature="bar"/>
+ <myproj:Baz xmi:id="2" myFoo="1"/>
+ ...
+</xmi:XMI>]]></programlisting></para>
+
+ <para>As with primitive-valued features, it is permitted to use an element rather than an
+ attribute. However, the syntax is slightly different:</para>
+
+
+ <programlisting><myproj:Baz xmi:id="2">
+ <myFoo href="#1"/>
+</myproj:Baz></programlisting>
+
+ <para>Note that in the attribute representation, a reference feature is
+ indistinguishable from an integer-valued feature, so the meaning cannot be
+ determined without prior knowledge of the type system. The element representation is
+ unambiguous.</para>
+
+ </section>
+
+ <section id="ugr.ref.xmi.array_and_list_features">
+ <title>Array and List Features</title>
+
+ <para>For a CAS feature whose range type is one of the CAS array or list types, the XMI serialization depends on the
+ setting of the <quote>multipleReferencesAllowed</quote> attribute for that feature in the UIMA Type System
+ Description (see <olink targetdoc="&uima_docs_ref;"
+ targetptr="ugr.ref.xml.component_descriptor.type_system.features"/>).</para>
+
+ <para>An array or list with multipleReferencesAllowed = false (the default) is serialized as a
+ <quote>multi-valued</quote> property in XMI. An array or list with multipleReferencesAllowed = true is
+ serialized as a first-class object. Details are described below.</para>
+
+ <section id="ugr.ref.xmi.array_and_list_features.as_multi_valued_properties">
+ <title>Arrays and Lists as Multi-Valued Properties</title>
+
+ <para>A multi-valued property is the most natural XMI representation for most cases. Consider the
+ example where the FeatureStructure of type org.myproj.Baz has a feature myIntArray whose value is the
+ integer array {2,4,6}. This can be mapped to:
+
+ <programlisting><myproj:Baz xmi:id="3" myIntArray="2 4 6"/></programlisting> or
+ equivalently:
+
+
+ <programlisting><myproj:Baz xmi:id="3">
+ <myIntArray>2</myIntArray>
+ <myIntArray>4</myIntArray>
+ <myIntArray>6</myIntArray>
+</myproj:Baz></programlisting>
+ </para>
+
+ <para>Note that String arrays whose elements contain embedded spaces MUST use the latter mapping.</para>
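+
+ <para>To see why, consider a hypothetical String array feature <quote>myStringArray</quote>
+ (a name assumed here for illustration) holding the two values <quote>red apple</quote> and
+ <quote>green pear</quote>. Written as a space-separated attribute, it could not be
+ distinguished from a four-element array, so a sketch of the element form would be:
+
+
+ <programlisting><![CDATA[<myproj:Baz xmi:id="3">
+ <myStringArray>red apple</myStringArray>
+ <myStringArray>green pear</myStringArray>
+</myproj:Baz>]]></programlisting></para>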
+
+ <para>FSArray or FSList features are serialized in a similar way. For example, an FSArray feature that contains
+ references to the elements with xmi:id values <quote>13</quote> and <quote>42</quote> could be
+ serialized as:
+
+ <programlisting><myproj:Baz xmi:id="3" myFsArray="13 42"/></programlisting> or:
+
+
+ <programlisting><myproj:Baz xmi:id="3">
+ <myFsArray href="#13"/>
+ <myFsArray href="#42"/>
+</myproj:Baz></programlisting>
+ </para>
+ </section>
+
+ <section id="ugr.ref.xmi.array_and_list_features.as_1st_class_objects">
+ <title>Arrays and Lists as First-Class Objects</title>
+
+ <para>The multi-valued-property representation described in the previous section does not allow multiple
+ references to an array or list object. Therefore, it cannot be used for features that are defined to allow
+ multiple references (i.e. features for which multipleReferencesAllowed = true in the Type System
+ Description).</para>
+
+ <para>When multipleReferencesAllowed is set to true, array and list features are serialized as references,
+ and the array or list objects are serialized as separate objects in the XMI. Consider again the example where
+ the FeatureStructure of type org.myproj.Baz has a feature myIntArray whose value is the integer array
+ {2,4,6}. If myIntArray is defined with multipleReferencesAllowed=true, the serialization will be as
+ follows:
+
+ <programlisting><myproj:Baz xmi:id="3" myIntArray="4"/></programlisting> or:
+
+
+ <programlisting><myproj:Baz xmi:id="3">
+ <myIntArray href="#4"/>
+</myproj:Baz></programlisting>
+ with the array object serialized as
+
+ <programlisting><cas:IntegerArray xmi:id="4" elements="2 4 6"/></programlisting> or:
+
+
+ <programlisting><cas:IntegerArray xmi:id="4">
+ <elements>2</elements>
+ <elements>4</elements>
+ <elements>6</elements>
+</cas:IntegerArray></programlisting></para>
+
+ <para>Note that in this case, the XML element name is formed from the CAS type name (e.g.
+ <quote><literal>uima.cas.IntegerArray</literal></quote>) in the same way as for other
+ FeatureStructures. The elements of the array are serialized either as a space-separated attribute named
+ <quote>elements</quote> or as a series of child elements named <quote>elements</quote>.</para>
+
+ <para>List nodes are just standard FeatureStructures with <quote>head</quote> and <quote>tail</quote>
+ features, and are serialized using the normal FeatureStructure serialization. For example, an
+ IntegerList with the values 2, 4, and 6 would be serialized as the four objects:
+
+
+ <programlisting><cas:NonEmptyIntegerList xmi:id="10" head="2" tail="11"/>
+<cas:NonEmptyIntegerList xmi:id="11" head="4" tail="12"/>
+<cas:NonEmptyIntegerList xmi:id="12" head="6" tail="13"/>
+<cas:EmptyIntegerList xmi:id="13"/></programlisting></para>
+
+ <para>This representation allows multiple references to an array or list. It also allows a feature
+ with range type TOP to refer to an array or list. However, it is an unnatural representation in XMI and does
+ not support interoperability with other XMI-based systems, so we recommend using the
+ multi-valued-property representation described in the previous section whenever possible.</para>
+
+ </section>
+
+ <section id="ugr.ref.xmi.null_array_list_elements">
+ <title>Null Array/List Elements</title>
+
+ <para>In UIMA, an element of an FSArray or FSList may be null. In XMI, multi-valued properties do not permit null
+ values. As a workaround, we use a dummy instance of the special type cas:NULL, which has xmi:id 0.
+ In the following example, the <quote>myFsArray</quote> feature refers to an FSArray whose
+ second element is null:
+
+
+ <programlisting><cas:NULL xmi:id="0"/>
+<myproj:Baz xmi:id="3">
+ <myFsArray href="#13"/>
+ <myFsArray href="#0"/>
+ <myFsArray href="#42"/>
+</myproj:Baz></programlisting></para>
+
+ </section>
+
+ </section>
+
+ <section id="ugr.ref.xmi.sofas_views">
+ <title>Subjects of Analysis (Sofas) and Views</title>
+
+ <para>A UIMA CAS contains one or more subjects of analysis (Sofas). These are serialized no
+ differently from any other feature structure. For example:
+
+
+ <programlisting><?xml version="1.0"?>
+<xmi:XMI xmi:version="2.0" xmlns:xmi="http://www.omg.org/XMI"
+ xmlns:cas="http:///uima/cas.ecore">
+ <cas:Sofa xmi:id="1" sofaNum="1"
+ text="the quick brown fox jumps over the lazy dog."/>
+</xmi:XMI></programlisting></para>
+
+ <para>Each Sofa defines a separate View. Feature Structures in the CAS can be members of
+ one or more views. (A Feature Structure that is a member of a view is indexed in its
+ IndexRepository, but that is an implementation detail.)</para>
+
+ <para>In the XMI serialization, views are represented as first-class objects. Each
+ View has an (optional) <quote>sofa</quote> feature, which references a sofa, and a
+ multi-valued reference to the members of the View. For example:</para>
+
+
+ <programlisting><cas:View sofa="1" members="3 7 21 39 61"/></programlisting>
+
+ <para>Here the integers 3, 7, 21, 39, and 61 refer to the xmi:id fields of the objects that
+ are members of this view.</para>
+ </section>
+
+ <section id="ugr.ref.xmi.linking_to_ecore_type_system">
+ <title>Linking an XMI Document to its Ecore Type System</title>
+ <titleabbrev>Linking XMI docs to Ecore Type System</titleabbrev>
+
+ <para>If the CAS Type System has been saved to an Ecore file (as described in <olink
+ targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.xmi_emf"/>), it is possible to store a
+ link from an XMI document to that Ecore type system. This is done using an xsi:schemaLocation attribute
+ on the root XMI element.</para>
+
+ <para>The xsi:schemaLocation attribute is a space-separated list that represents a
+ mapping from namespace URI (e.g. http:///org/myproj.ecore) to the physical URI of the
+ .ecore file containing the type system for that namespace. For example:
+
+
+ <programlisting>xsi:schemaLocation=
+ "http:///org/myproj.ecore file:/c:/typesystems/myproj.ecore"</programlisting>
+ would indicate that the definition for the org.myproj CAS types is contained in the file
+ <literal>c:/typesystems/myproj.ecore</literal>. You can specify a different
+ mapping for each of your CAS namespaces, using a space-separated list. For details see
+ Budinsky et al. <emphasis>Eclipse Modeling Framework</emphasis>.</para>
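+
+ <para>For context, a root element carrying this attribute might look like the sketch below. The
+ <quote>xsi</quote> prefix must be bound to the standard XML Schema instance namespace. The
+ second namespace (org.otherproj) and its file location are hypothetical, added only to show
+ how several mappings are listed as alternating namespace URI and location pairs:
+
+
+ <programlisting><![CDATA[<xmi:XMI xmi:version="2.0"
+ xmlns:xmi="http://www.omg.org/XMI"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xmlns:myproj="http:///org/myproj.ecore"
+ xmlns:otherproj="http:///org/otherproj.ecore"
+ xsi:schemaLocation=
+ "http:///org/myproj.ecore file:/c:/typesystems/myproj.ecore
+ http:///org/otherproj.ecore file:/c:/typesystems/otherproj.ecore">
+ <!-- CAS Contents here -->
+</xmi:XMI>]]></programlisting></para>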
+ </section>
+
+ <section id="ugr.ref.xmi.delta">
+ <title>Delta CAS XMI Format</title>
+ <titleabbrev>Delta CAS XMI Format</titleabbrev>
+ <para>
+ The Delta CAS XMI serialization format is designed primarily to reduce the serialization overhead when calling annotators
+ configured as services. Only Feature Structures and Views that are new or modified by the service
+ are serialized and returned by the service.
+ </para>
+ <para>
+ The classes <literal>org.apache.uima.cas.impl.XmiCasSerializer</literal> and
+ <literal>org.apache.uima.cas.impl.XmiCasDeserializer</literal> support serialization of only the modifications to the CAS.
+ A caller is expected to set a marker to indicate the point from which changes to the CAS are to be tracked.
+ </para>
+ <para>
+ A Delta CAS XMI document contains only the Feature Structures and Views that have been added or modified.
+ The new and modified Feature Structures are represented in exactly the same format as in a complete CAS serialization.
+ The <literal>cas:View</literal> element has been extended with three additional attributes to represent modifications to
+ View membership. These new attributes are <literal>added_members</literal>, <literal>deleted_members</literal> and
+ <literal>reindexed_members</literal>. For example:
+ </para>
+ <programlisting><cas:View sofa="1" added_members="63 77" deleted_members="7 61" reindexed_members="39" /></programlisting>
+ <para>
+ Here the integers 63 and 77 are the xmi:id values of objects that have been newly added to this View,
+ 7 and 61 are the xmi:id values of objects that have been removed from it, and 39 is the xmi:id of an object to be reindexed in it.
+ </para>
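+ <para>
+ As a worked illustration, assume the View had the membership shown in the earlier Views example
+ (members 3, 7, 21, 39 and 61) before the service was called. After the delta above is merged,
+ 7 and 61 are gone, 63 and 77 are present, and 39 remains a member. A full serialization of the
+ resulting View would then be expected to look roughly as follows; the order of the members shown
+ here is an assumption made for this sketch:
+ </para>
+ <programlisting><![CDATA[<cas:View sofa="1" members="3 21 39 63 77"/>]]></programlisting>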
+ </section>
+</chapter>
\ No newline at end of file