Posted to commits@uima.apache.org by sc...@apache.org on 2010/05/06 16:01:57 UTC
svn commit: r941739 [1/5] - in
/uima/uimaj/branches/mavenAlign/uima-docbook-references: ./ src/
src/docbook/ src/docbook/images/ src/docbook/images/references/
src/docbook/images/references/ref.cas/
src/docbook/images/references/ref.javadocs/ src/docbo...
Author: schor
Date: Thu May 6 14:01:56 2010
New Revision: 941739
URL: http://svn.apache.org/viewvc?rev=941739&view=rev
Log:
[UIMA-1757] split uima-docbooks, rework for docbkx
Added:
uima/uimaj/branches/mavenAlign/uima-docbook-references/pom.xml
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.cas/
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.cas/image001.png (with props)
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.javadocs/
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.javadocs/image002.jpg (with props)
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.pear/
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.pear/image002.jpg (with props)
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.xml.cpe_descriptor/
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.xml.cpe_descriptor/image002.png (with props)
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.cas.xml
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.javadocs.xml
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.jcas.xml
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.pear.xml
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.xmi.xml
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.xml.component_descriptor.xml
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.xml.cpe_descriptor.xml
uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/references.xml
Modified:
uima/uimaj/branches/mavenAlign/uima-docbook-references/ (props changed)
Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-references/
------------------------------------------------------------------------------
--- svn:ignore (added)
+++ svn:ignore Thu May 6 14:01:56 2010
@@ -0,0 +1,2 @@
+target
+.project
Added: uima/uimaj/branches/mavenAlign/uima-docbook-references/pom.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-references/pom.xml?rev=941739&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-references/pom.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-references/pom.xml Thu May 6 14:01:56 2010
@@ -0,0 +1,65 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+ xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
+ <modelVersion>4.0.0</modelVersion>
+
+ <parent>
+ <groupId>org.apache.uima</groupId>
+ <artifactId>parent-pom-docbook</artifactId>
+ <version>1-SNAPSHOT</version>
+ <relativePath/>
+ </parent>
+
+ <artifactId>uima-docbook-references</artifactId>
+ <packaging>pom</packaging>
+ <version>2.3.1-SNAPSHOT</version>
+ <name>Apache UIMA SDK Documentation - references</name>
+ <url>${uimaWebsiteUrl}</url>
+
+ <!-- Special inheritance note
+ even though the <scm> element that follows is exactly the
+ same as those in super poms, it cannot be inherited because
+ there is some special code that computes the connection elements
+ from the chain of parent poms, if this is omitted.
+
+ Keeping this a bit factored allows cutting/pasting the <scm>
+ element, and just changing the following two properties -->
+ <scm>
+ <connection>
+ scm:svn:http://svn.apache.org/repos/asf/uima/${uimaScmRoot}/trunk/${uimaScmProject}
+ </connection>
+ <developerConnection>
+ scm:svn:https://svn.apache.org/repos/asf/uima/${uimaScmRoot}/trunk/${uimaScmProject}
+ </developerConnection>
+ <url>
+ http://svn.apache.org/viewvc/uima/${uimaScmRoot}/trunk/${uimaScmProject}
+ </url>
+ </scm>
+
+ <properties>
+ <uimaScmRoot>uimaj</uimaScmRoot>
+ <uimaScmProject>${project.artifactId}</uimaScmProject>
+ <!-- next property is the name of the top file under src/docbook without trailing .xml -->
+ <bookNameRoot>references</bookNameRoot>
+ </properties>
+
+</project>
\ No newline at end of file
Added: uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.cas/image001.png
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.cas/image001.png?rev=941739&view=auto
==============================================================================
Binary file - no diff available.
Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.cas/image001.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.javadocs/image002.jpg
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.javadocs/image002.jpg?rev=941739&view=auto
==============================================================================
Binary file - no diff available.
Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.javadocs/image002.jpg
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.pear/image002.jpg
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.pear/image002.jpg?rev=941739&view=auto
==============================================================================
Binary file - no diff available.
Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.pear/image002.jpg
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.xml.cpe_descriptor/image002.png
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.xml.cpe_descriptor/image002.png?rev=941739&view=auto
==============================================================================
Binary file - no diff available.
Propchange: uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/images/references/ref.xml.cpe_descriptor/image002.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.cas.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.cas.xml?rev=941739&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.cas.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.cas.xml Thu May 6 14:01:56 2010
@@ -0,0 +1,962 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+<!ENTITY imgroot "images/references/ref.cas/" >
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.ref.cas">
+ <title>CAS Reference</title>
+
+ <para>The CAS (Common Analysis System) is the part of the Unstructured Information
+ Management Architecture (UIMA) that is concerned with creating and handling the data
+ that annotators manipulate.</para>
+
+ <para>Java users typically use the JCas (Java interface to the CAS) when manipulating
+ objects in the CAS. This chapter describes an alternative interface to the CAS which
+ allows discovery and specification of types and features at run time. It is recommended
+ for use when the using code cannot know ahead of time the type system it will be dealing
+ with.</para>
+
+ <para>Use of the CAS as described here is also recommended (or necessary) when components add
+ to the definitions of types of other components. This UIMA feature allows users to add features
+ to a type that was already defined elsewhere. When this feature is used in conjunction with the
+ JCas, it can lead to problems with class loading. This is because different JCas representations
+ of a single type are generated by the different components, and only one of them is loaded
+ (unless you are using Pear descriptors). Note:
+ we do not recommend that you add features to pre-existing types. A type should be defined in one
+ place only, and then there is no problem with using the JCas. However, if you do use this feature,
+ do not use the JCas. Similarly, if you distribute your components for inclusion in somebody else's
+ UIMA application, and you're not sure that they won't add features to your types, do not use the
+ JCas for the same reasons.
+ </para>
+
+ <para>CASes passed to Annotator Components are either a base CAS or a regular CAS. Base CASes
+ are only passed to Multi-View components - they are like regular CASes, but do not have user
+ accessible indexes or Sofas. They are used by the component only for switching to other CAS
+ views, which are regular CASes.</para>
+
+ <section id="ugr.ref.cas.javadocs">
+ <title>Javadocs</title>
+
+ <para>The subdirectory <literal>docs/api</literal> contains the documentation
+ details of all the classes, methods, and constants for the APIs discussed here. Please
+ refer to this for details on the methods, classes and constants, specifically in the
+ packages <literal>org.apache.uima.cas.*</literal>.</para>
+ </section>
+
+ <section id="ugr.ref.cas.overview">
+ <title>CAS Overview</title>
+
+ <para>There are three<footnote><para>A fourth part, the Subject of Analysis,
+ is discussed in <olink targetdoc="&uima_docs_tutorial_guides;"
+ targetptr="ugr.tug.aas"/>.</para></footnote> main parts to the CAS: the type system, data creation and
+ manipulation, and indexing. We will start with a brief
+ description of these components.</para>
+ <section id="ugr.ref.cas.type_system">
+ <title>The Type System</title>
+
+ <para>The type system specifies what kind of data you will be able to manipulate in your
+ annotators. The type system defines two kinds of entities, types and features. Types
+ are arranged in a single inheritance tree and define the kinds of entities (objects)
+ you can manipulate in the CAS. Features optionally specify slots or fields within a
+ type. The correspondence to Java is to equate a CAS Type to a Java Class, and the CAS
+ Features to fields within the type. A critical difference is that CAS types have no
+ methods; they are just data structures with named slots (features). These features can
+ have as values primitive things like integers, floating point numbers, and strings,
+ and they also can hold references to other instances of objects in the CAS. We call
+ instances of the data structures declared by the type system <quote>feature
+ structures</quote> (not to be confused with <quote>features</quote>). Feature
+ structures are similar to the many variants of record structures found in computer
+ science.<footnote><para> The name <quote>feature structure</quote> comes from
+ terminology used in linguistics.</para></footnote></para>
+
+ <para>Each CAS Type defines a supertype; it is a subtype of that supertype. This means
+ that any features that the supertype defines are features of the subtype; in other
+ words, it inherits its supertype's features. Only single inheritance is
+ supported; a type's feature set is the union of all of the features in its
+ supertype hierarchy. There is a built-in type called uima.cas.TOP; this is the top,
+ root node of the inheritance tree. It defines no features.</para>
+
+ <para>The values that can be stored in features are either built-in primitive values or
+ references to other feature structures. The primitive values are
+ <literal>boolean</literal>, <literal>byte</literal>,
+ <literal>short</literal> (16 bit integers), <literal>integer</literal> (32
+ bit), <literal>long</literal> (64 bit), <literal>float</literal> (32 bit),
+ <literal>double</literal> (64 bit floats) and strings; the official names of these
+ are <literal>uima.cas.Boolean</literal>, <literal>uima.cas.Byte</literal>,
+ <literal>uima.cas.Short</literal>, <literal>uima.cas.Integer</literal>,
+ <literal>uima.cas.Long</literal>, <literal>uima.cas.Float</literal>,
+ <literal>uima.cas.Double</literal> and <literal>uima.cas.String</literal>.
+ The strings are Java strings, and characters are Java characters. Technically, this means
+ that characters are UTF-16 code units, which is not quite the same as Unicode characters:
+ a character outside the Basic Multilingual Plane occupies two code units.
+ This distinction should make no difference for almost all applications.
+ The CAS also defines other basic built-in types for arrays of these, plus arrays of
+ references to other objects, called <literal>uima.cas.IntegerArray</literal>,
+ <literal>uima.cas.FloatArray</literal>,
+ <literal>uima.cas.StringArray</literal>,
+ <literal>uima.cas.FSArray</literal>, etc.</para>
+
+ <para>The CAS also defines a built-in type called
+ <literal>uima.tcas.Annotation</literal> which inherits from
+ <literal>uima.cas.AnnotationBase</literal> which in turn inherits from
+ <literal>uima.cas.TOP</literal>. There are two features defined by this type,
+ called <literal>begin</literal> and <literal>end</literal>, both of which are
+ integer valued.</para>
+
+ </section>
+
+ <section id="ugr.ref.cas.creating_accessing_manipulating_data">
+ <title>Creating, accessing and manipulating data</title>
+ <titleabbrev>Creating/Accessing/Changing data</titleabbrev>
+
+ <para>
+ Creating and accessing data in the CAS requires knowledge about the types and features
+ defined in the type system. The idea is similar to other data access APIs, such as the XML
+ DOM or SAX APIs, or database access APIs such as JDBC. Contrary to those APIs, however, the
+ CAS does not use the names of type system entities directly in the APIs. Rather, you use
+ the type system to access type and feature entities by name, then use these entities in the
+ data manipulation APIs. This can be compared to the Java reflection APIs: the type system
+ is comparable to the Java class loader, and the type and feature objects to the
+ <literal>java.lang.Class</literal> and <literal>java.lang.reflect.Field</literal> classes.
+ </para>
+
+ <para>
+ Why does it have to be this complicated? You wouldn't normally use reflection to create a
+ Java object, either. As mentioned earlier, the JCas provides the more straightforward
+ method to manipulate CAS data. The CAS access methods described here need only be used for
+ generic types of applications that need to be able to handle any kind of data (e.g., generic
+ tooling) or when the JCas may not be used for other reasons. The generic kinds of applications
+ are exactly the ones where you would use the reflection API in Java as well.
+ </para>
+
+ </section>
+
+ <section id="ugr.ref.cas.creating_using_indexes">
+ <title>Creating and using indexes</title>
+
+ <para>Each view of a CAS provides a set of indexes for that view. Instances of feature
+ structures can be added to a view's indexes. These indexes provide
+ the only way for other annotators to locate existing data in the CAS. The only way for an
+ annotator to use data that another annotator has created is by using an index (or the
+ method <literal>getAllIndexedFS</literal> of the object <literal>FSIndexRepository</literal>) to
+ retrieve feature structures the first annotator created. If you want the data you
+ create to be visible to other annotators, you must explicitly call methods which
+ add it to the indexes — you must index it.</para>
+
+ <para>Indexes are named and are associated with a CAS Type; they are used to index
+ instances of that CAS type (including instances of that type's subtypes). If
+ you are using multiple views (see <olink
+ targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.mvs"/>),
+ each view contains a separate instantiation of all of the indexes.
+ To access an index, you
+ minimally need to know its name. A CAS view provides an index repository which you can
+ query for indexes for that view. Once you have a handle to an index, you can get
+ information about the feature structures in the index, the size of the index, as well
+ as an iterator over the feature structures.</para>
+
+ <para>Indexes are defined in the XML descriptor metadata for the application. Each CAS
+ View has its own, separate instantiation of indexes based on these definitions,
+ kept in the view's index repository. When you obtain an index, it is always from a
+ particular CAS view. When you index an item, it is always added to all indexes where it
+ belongs, within just one repository. You can specify different repositories
+ (associated with different CAS views) to use; a given Feature Structure instance
+ may be indexed in more
+ than one CAS View.</para>
+
+ <para>Iterators allow you to enumerate the feature structures in an index. FS iterators
+ provide two kinds of APIs: the regular Java iterator API, and a specific FS iterator API
+ where the usual Java iterator APIs (<literal>hasNext()</literal> and <literal>next()</literal>)
+ are replaced by <literal>isValid()</literal>, <literal>moveToNext()</literal> (which does
+ not return an element) and <literal>get()</literal>. Which API style you use is up to you,
+ but we do not recommend mixing the styles as the results are sometimes unexpected. If you
+ just want to iterate over an index from start to finish, either style is equally appropriate.
+ If you also use <literal>moveTo(FeatureStructure fs)</literal> and
+ <literal>moveToPrevious()</literal>, it is better to use the special FS iterator style.
+ </para>
+ <note><para>The reason not to mix these styles is that you might expect
+ next() followed by moveToPrevious() to always work. It does not, because
+ next() returns the "current" element and then advances to the next position, which might be
+ beyond the last element. At that point, the iterator becomes "invalid", and by the iterator
+ contract, moveToNext() and moveToPrevious() are not allowed on "invalid" iterators;
+ their behavior is then undefined. You can, however,
+ call moveToFirst(), moveToLast(), or moveTo(FS) on the iterator to reset it.</para></note>
+
+ <para>Indexes are created by specifying them in the annotator's or
+ aggregate's resource descriptor. An index specification includes its name,
+ the CAS type being indexed, the kind of index it is, and an (optional) ordering
+ relation on the feature structures to be indexed. At startup time, all index
+ specifications are combined; duplicate definitions (having the same name) are
+ allowed only if their definitions are the same. </para>
+
+ <para>Feature structure instances need to be explicitly added to the index repository by a
+ method call. Feature structures that are not indexed will not be visible to other
+ annotators (unless they can be reached through a reference from an indexed feature
+ structure, either directly or through a chain of such references).</para>
+
+ <para>The framework defines an unnamed bag index which indexes all types. The
+ only access provided for this index is the getAllIndexedFS(type) method on the
+ index repository, which returns an iterator over all indexed instances of the
+ specified type (including its subtypes) for that CAS View.
+ </para>
+
+ <para>The framework defines one standard, built-in annotation index, called
+ AnnotationIndex, which indexes the <literal>uima.tcas.Annotation</literal>
+ type: all feature structures of type <literal>uima.tcas.Annotation</literal> or
+ its subtypes are automatically indexed with this built-in index.</para>
+
+ <para>The ordering relation used by this index is to first order by the value of the
+ <quote>begin</quote> feature (in ascending order) and then by the value of the
+ <quote>end</quote> feature (in descending order). This ordering ensures that
+ longer annotations starting at the same spot come before shorter ones. For Subjects
+ of Analysis other than Text, this may not be an appropriate index.</para>
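+ <para>The ordering just described can be sketched in plain Java. This is only an
+ illustration under stated assumptions: the <literal>Span</literal> class and the
+ <literal>AnnotationOrderSketch</literal> name are hypothetical stand-ins, not UIMA classes,
+ and the real index may apply further tie-breaking rules that are not modelled here.</para>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class AnnotationOrderSketch {

    // Hypothetical stand-in for an annotation: only the two offsets matter here.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + begin + "," + end + ")";
        }
    }

    // Order by begin ascending; for equal begin, order by end descending,
    // so that longer spans starting at the same offset come first.
    static final Comparator<Span> ORDER = new Comparator<Span>() {
        @Override
        public int compare(Span a, Span b) {
            int byBegin = Integer.compare(a.begin, b.begin);
            return byBegin != 0 ? byBegin : Integer.compare(b.end, a.end);
        }
    };

    // Returns a new list holding the given spans in index order.
    static List<Span> sorted(Span... spans) {
        List<Span> result = new ArrayList<Span>(Arrays.asList(spans));
        Collections.sort(result, ORDER);
        return result;
    }

    public static void main(String[] args) {
        // prints [[0,13), [0,3), [4,13)]
        System.out.println(sorted(new Span(0, 3), new Span(4, 13), new Span(0, 13)));
    }
}
```

+ <para>In this sketch the span covering offsets 0 to 13 sorts before the span covering 0 to 3,
+ because both start at the same offset and the longer one has the larger end value.</para>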
+
+ </section>
+ </section>
+
+ <section id="ugr.ref.cas.builtin_types">
+ <title>Built-in CAS Types</title>
+
+ <para>The CAS has two kinds of built-in types – primitive and non-primitive. The
+ primitive types are:
+
+ <itemizedlist spacing="compact">
+ <listitem><para>uima.cas.Boolean</para></listitem>
+ <listitem><para>uima.cas.Byte</para></listitem>
+ <listitem><para>uima.cas.Short</para></listitem>
+ <listitem><para>uima.cas.Integer</para></listitem>
+ <listitem><para>uima.cas.Long</para></listitem>
+ <listitem><para>uima.cas.Float</para></listitem>
+ <listitem><para>uima.cas.Double</para></listitem>
+ <listitem><para>uima.cas.String</para></listitem>
+ </itemizedlist></para>
+
+ <para>The <literal>Byte, Short, Integer, </literal>and<literal> Long</literal> are
+ all signed integer types, of length 8, 16, 32, and 64 bits. The
+ <literal>Double</literal> type is 64 bit floating point. The
+ <literal>String</literal> type can be sub-typed to create sets of allowed values; see
+ <olink targetdoc="&uima_docs_ref;"
+ targetptr="ugr.ref.xml.component_descriptor.type_system.string_subtypes"/>.
+ These types can be used to specify the range of a String-valued feature. They act like
+ Strings, but have additional checking to ensure that values set into them
+ conform to one of the allowed values. Note that the other primitive types cannot be used
+ as a supertype for another type definition; only
+ <literal>uima.cas.String</literal> can be sub-typed.</para>
+
+ <para>The non-primitive types exist in a type hierarchy; the top of the hierarchy is the
+ type <literal>uima.cas.TOP</literal>. All other non-primitive types inherit from
+ some supertype.</para>
+
+ <para>There are 9 built-in array types. These arrays have a size specified when they are
+ created; the size is fixed at creation time. They are named:
+
+ <itemizedlist spacing="compact">
+ <listitem><para>uima.cas.BooleanArray</para></listitem>
+ <listitem><para>uima.cas.ByteArray</para></listitem>
+ <listitem><para>uima.cas.ShortArray</para></listitem>
+ <listitem><para>uima.cas.IntegerArray</para></listitem>
+ <listitem><para>uima.cas.LongArray</para></listitem>
+ <listitem><para>uima.cas.FloatArray</para></listitem>
+ <listitem><para>uima.cas.DoubleArray</para></listitem>
+ <listitem><para>uima.cas.StringArray</para></listitem>
+ <listitem><para>uima.cas.FSArray</para></listitem>
+ </itemizedlist></para>
+
+ <para>The <literal>uima.cas.FSArray</literal> type is an array whose elements are
+ arbitrary other feature structures (instances of non-primitive types).</para>
+
+ <para>There are 3 built-in types associated with the artifact being analyzed:
+
+ <itemizedlist spacing="compact">
+ <listitem><para>uima.cas.AnnotationBase</para></listitem>
+ <listitem><para>uima.tcas.Annotation</para></listitem>
+ <listitem><para>uima.tcas.DocumentAnnotation</para></listitem>
+ </itemizedlist></para>
+
+ <para>The <literal>AnnotationBase</literal> type defines one system-used feature
+ which specifies for an annotation the subject of analysis (Sofa) to which it refers. The
+ Annotation type extends from this and defines 2 features, taking
+ <literal>uima.cas.Integer</literal> values, called <literal>begin</literal>
+ and <literal>end</literal>. The <literal>begin</literal> feature typically
+ identifies the start of a span of text the annotation covers; the
+ <literal>end</literal> feature identifies the end. The values refer to character
+ offsets; the starting index is 0. An annotation of the word <quote>CAS</quote> in a text
+ <quote>CAS Reference</quote> would have a start index of 0, and an end index of 3; the
+ difference between end and start is the length of the span the annotation refers
+ to.</para>
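+ <para>The offset arithmetic can be checked with ordinary Java string operations. The sketch
+ below assumes a text Sofa held in a <literal>String</literal>; the helper name
+ <literal>coveredText</literal> and the class name are hypothetical and are used only for
+ illustration.</para>

```java
public class SpanOffsetsSketch {

    // Returns the text covered by a span: begin is the offset of the first
    // covered character, end is the offset just past the last one.
    static String coveredText(String sofaText, int begin, int end) {
        return sofaText.substring(begin, end);
    }

    // The length of the span is the difference between end and begin.
    static int spanLength(int begin, int end) {
        return end - begin;
    }

    public static void main(String[] args) {
        String text = "CAS Reference";
        System.out.println(coveredText(text, 0, 3)); // prints CAS
        System.out.println(spanLength(0, 3));        // prints 3
    }
}
```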
+
+ <para>Annotations are always with respect to some Sofa (Subject of Analysis – see
+ <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="ugr.tug.aas"/>).</para>
+ <note><para>Artifacts which are not text strings may have a different interpretation of
+ the meaning of begin and end, or may define their own kind of annotation, extending from
+ <literal>AnnotationBase</literal>. </para></note>
+
+ <para id="ugr.ref.cas.document_annotation">The <literal>DocumentAnnotation</literal> type has one special instance. It is
+ a subtype of the Annotation type, and the built-in definition defines one feature,
+ <literal>language</literal>, which is a string indicating the language of the
+ document in the CAS. The value of this language feature is used by the system to control
+ flow among annotators when the <quote>CapabilityLanguageFlow</quote> mode is used,
+ allowing the flow to skip over annotators that don't process particular
+ languages. Users may extend this type by adding additional features to it, using the XML
+ Descriptor element for defining a type.</para>
+
+ <note><para>
+ We do <emphasis>not</emphasis> recommend extending the <literal>DocumentAnnotation</literal>
+ type. If you do, you must <emphasis>not</emphasis> use the JCas, for the reasons stated
+ earlier.
+ </para></note>
+
+ <para>Each CAS view has a different associated instance of the
+ <literal>DocumentAnnotation</literal> type. On the CAS, use
+ <literal>getDocumentAnnotation()</literal> to access the
+ <literal>DocumentAnnotation</literal>.</para>
+
+ <para>There are also built-in types supporting linked lists, similar to the ones available in
+ Java and other programming languages. Their use is
+ constrained by the usual properties of linked lists: not very space efficient, no (efficient)
+ random access, but an easy choice if you don't know how long your list will be ahead of time. The
+ implementation is type specific; there are different list building objects for each of
+ the primitive types, plus one for general feature structures. Here are the type names:
+ <itemizedlist spacing="compact">
+ <listitem><para>uima.cas.FloatList</para></listitem>
+ <listitem><para>uima.cas.IntegerList</para></listitem>
+ <listitem><para>uima.cas.StringList</para></listitem>
+ <listitem><para>uima.cas.FSList</para>
+ <para></para></listitem>
+ <listitem><para>uima.cas.EmptyFloatList</para></listitem>
+ <listitem><para>uima.cas.EmptyIntegerList</para></listitem>
+ <listitem><para>uima.cas.EmptyStringList</para></listitem>
+ <listitem><para>uima.cas.EmptyFSList</para>
+ <para></para></listitem>
+ <listitem><para>uima.cas.NonEmptyFloatList</para></listitem>
+ <listitem><para>uima.cas.NonEmptyIntegerList</para></listitem>
+ <listitem><para>uima.cas.NonEmptyStringList</para></listitem>
+ <listitem><para>uima.cas.NonEmptyFSList</para></listitem>
+
+ </itemizedlist></para>
+
+ <para>For each of the element kinds <literal>Float</literal>,
+ <literal>Integer</literal>, <literal>String</literal> and
+ <literal>FeatureStructure</literal>, there is a base type, for instance,
+ <literal>uima.cas.FloatList</literal>. For each of these, there are two subtypes,
+ corresponding to a non-empty element, and a marker that serves to indicate the end of the
+ list, or an empty list. The non-empty types define two features –
+ <literal>head</literal> and <literal>tail</literal>. The head feature holds the
+ particular value for that part of the list. The tail refers to the next list object
+ (either a non-empty one or the empty version to indicate the end of the list).</para>
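+ <para>The head/tail structure can be modelled in a few lines of plain Java. The classes below
+ are hypothetical stand-ins that mirror the shape of the
+ <literal>IntegerList</literal> family (a base type, an empty marker, and a non-empty node);
+ they are not the UIMA types themselves and do not live in a CAS.</para>

```java
public class FsListSketch {

    // Base type, corresponding in shape to uima.cas.IntegerList.
    static abstract class IntList {
    }

    // Marker for the end of a list, or for an empty list.
    static final class EmptyIntList extends IntList {
    }

    // A non-empty node: head holds the value, tail refers to the rest of the list.
    static final class NonEmptyIntList extends IntList {
        final int head;
        final IntList tail;

        NonEmptyIntList(int head, IntList tail) {
            this.head = head;
            this.tail = tail;
        }
    }

    // Builds a list back to front, so the first value ends up at the head.
    static IntList of(int... values) {
        IntList list = new EmptyIntList();
        for (int i = values.length - 1; i >= 0; i--) {
            list = new NonEmptyIntList(values[i], list);
        }
        return list;
    }

    // Walks the tail references until the empty marker is reached.
    static int length(IntList list) {
        int n = 0;
        while (list instanceof NonEmptyIntList) {
            n++;
            list = ((NonEmptyIntList) list).tail;
        }
        return n;
    }

    // Sums the head values; there is no random access, only traversal.
    static int sum(IntList list) {
        int total = 0;
        while (list instanceof NonEmptyIntList) {
            NonEmptyIntList node = (NonEmptyIntList) list;
            total += node.head;
            list = node.tail;
        }
        return total;
    }

    public static void main(String[] args) {
        IntList list = of(1, 2, 3);
        System.out.println(length(list)); // prints 3
        System.out.println(sum(list));    // prints 6
    }
}
```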
+
+ <para>There are no other built-in types. Users are free to define their own type systems,
+ building upon these types.</para>
+
+ </section>
+
+ <section id="ugr.ref.cas.accessing_the_type_system">
+ <title>Accessing the type system</title>
+
+ <para>
+ During annotator processing, or outside an annotator, access the type system by calling
+ <literal>CAS.getTypeSystem()</literal>.
+ </para>
+
+ <para>However, CAS annotators implement an additional method,
+ <literal>typeSystemInit()</literal>, which is called by the UIMA framework before the
+ annotator's process method. This method, implemented by the annotator writer,
+ is passed a reference to the CAS's type system metadata. The method typically uses
+ the type system APIs to obtain type and feature objects corresponding to all the types
+ and features the annotator will be using in its process method. This initialization
+ step should not be done during an annotator's initialize method since the type
+ system can change after the initialize method is called; it should not be done during the
+ process method, since this is presumably work that is identical for each incoming
+ document, and so should be performed only when the type system changes (which will be a
+ rare event). The UIMA framework guarantees it will call the
+ <literal>typeSystemInit()</literal> method of an annotator whenever the type system changes,
+ before calling the annotator's <literal>process()</literal> method.</para>
+
+ <para>The initialization done by <literal>typeSystemInit()</literal> is done by the
+ UIMA framework when you use the JCas APIs; you only need to provide a
+ <literal>typeSystemInit()</literal> method, as described here, when you are not using
+ the JCas approach.</para>
+
+ <section id="ugr.ref.cas.type_system.printer_example">
+ <title>TypeSystemPrinter example</title>
+
+ <para>Here is a code fragment that, given a CAS Type System, will print a list of all
+ types.</para>
+
+
+ <programlisting>// Get all type names from the type system
+// and print them to stdout.
+private void listTypes1(TypeSystem ts) {
+  // Get an iterator over types
+  Iterator typeIterator = ts.getTypeIterator();
+  Type t;
+  System.out.println("Types in the type system:");
+  while (typeIterator.hasNext()) {
+    // Retrieve a type...
+    t = (Type) typeIterator.next();
+    // ...and print its name.
+    System.out.println(t.getName());
+  }
+  System.out.println();
+}</programlisting>
+
+ <para>This method is passed the type system as a parameter. From the type system, we can
+ get an iterator over all known types. If we run this against a CAS created with no
+ additional user-defined types, we should see something like this on the console:</para>
+
+ <programlisting>Types in the type system:
+uima.cas.Boolean
+uima.cas.Byte
+uima.cas.Short
+uima.cas.Integer
+uima.cas.Long
+uima.cas.ArrayBase
+...
+ </programlisting>
+
+ <para>If the type system had user-defined types these would show up too. Note that some
+ of these types are not directly creatable – they are types used by the framework
+ in the type hierarchy (e.g. uima.cas.ArrayBase).</para>
+
+ <para>CAS type names include a name-space prefix. The components of a type name are
+ separated by the dot (.). A type name component must start with a Unicode letter,
+ followed by an arbitrary sequence of letters, digits and the underscore (_). By
+ convention, the last component of a type name starts with an uppercase letter, the
+ rest start with a lowercase letter.</para>
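+
+ <para>The naming rule above can be checked mechanically. The following is a minimal
+ sketch in plain Java and is not part of the UIMA API: the class
+ <literal>TypeNameCheck</literal> and its method are hypothetical names, and
+ <literal>Character.isLetter</literal> is assumed to be an adequate stand-in for
+ <quote>Unicode letter</quote>. It does not enforce the upper/lowercase convention,
+ which is only a convention.</para>
+
+ <programlisting><![CDATA[
```java
class TypeNameCheck {
    /** Returns true if every dot-separated component follows the stated rule. */
    static boolean isValidTypeName(String name) {
        // A limit of -1 keeps empty trailing components, so "a.b." is rejected.
        for (String component : name.split("\\.", -1)) {
            // Each component must start with a letter.
            if (component.isEmpty() || !Character.isLetter(component.charAt(0))) {
                return false;
            }
            // The rest may be letters, digits, or the underscore.
            for (int i = 1; i < component.length(); i++) {
                char c = component.charAt(i);
                if (!Character.isLetterOrDigit(c) && c != '_') {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isValidTypeName("com.xyz.proj.Person"));  // prints true
        System.out.println(isValidTypeName("com. xyz.proj.Person")); // prints false (space)
        System.out.println(isValidTypeName("uima.cas.9Lives"));      // prints false (digit first)
    }
}
```
+]]></programlisting>
+
+ <para>A check of this kind can catch a stray space or a leading digit in a type name
+ constant before the name is ever looked up in a type system.</para>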
+
+ <para>Listing the type names is mildly useful, but it would be even better if we could see
+ the inheritance relation between the types. The following code prints the
+ inheritance tree in indented format.</para>
+
+
+ <programlisting>private static final int INDENT = 2;
+
+private void listTypes2(TypeSystem ts) {
+  // Get the root of the inheritance tree.
+  Type top = ts.getTopType();
+  // Recursively print the tree.
+  printInheritanceTree(ts, top, 0);
+}
+
+private void printInheritanceTree(TypeSystem ts, Type type, int level) {
+  indent(level); // Print indentation.
+  System.out.println(type.getName());
+  // Get a vector of the immediate subtypes.
+  Vector subTypes =
+      ts.getDirectlySubsumedTypes(type);
+  ++level; // Increase the indentation level.
+  for (int i = 0; i &lt; subTypes.size(); i++) {
+    // Print the subtypes.
+    printInheritanceTree(ts, (Type) subTypes.get(i), level);
+  }
+}
+
+// A simple, inefficient indenter
+private void indent(int level) {
+  int spaces = level * INDENT;
+  for (int i = 0; i &lt; spaces; i++) {
+    System.out.print(" ");
+  }
+}</programlisting>
+
+ <para>This example shows that you can traverse the type hierarchy by starting at the top
+ with <literal>TypeSystem.getTopType()</literal> and by retrieving subtypes with
+ <literal>TypeSystem.getDirectlySubsumedTypes()</literal>.</para>
+
+ <para>The type system APIs (see the Javadocs) also let you access the features of a type,
+ together with the type of value each feature is allowed to hold. Here is sample code which prints out all the
+ features of all the types, together with the allowed value types (the feature
+ <quote>range</quote>). Each feature has a <quote>domain</quote>, which is the type
+ where it is defined, as well as a <quote>range</quote>.
+
+
+ <programlisting>private void listFeatures2(TypeSystem ts) {
+ Iterator featureIterator = ts.getFeatures();
+ Feature f;
+ System.out.println("Features in the type system:");
+ while (featureIterator.hasNext()) {
+ f = (Feature) featureIterator.next();
+ System.out.println(
+ f.getShortName() + ": " +
+ f.getDomain() + " -> " + f.getRange());
+ }
+ System.out.println();
+}</programlisting></para>
+
+ <para>We can ask a feature object for its domain (the type it is defined on) and its range
+ (the type of the value of the feature). The terminology derives from the fact that
+ features can be viewed as functions on subspaces of the object space.</para>
+
+ </section>
+
+ <section id="ugr.ref.cas.cas_apis_create_modify_feature_structures">
+ <title>Using the CAS APIs to create and modify feature structures</title>
+ <titleabbrev>Using CAS APIs: Feature Structures</titleabbrev>
+
+ <para>Assume a type system declaration that defines two types: Entity and Person.
+ Entity has no features defined within it but inherits from uima.tcas.Annotation
+ – so it has the begin and end features. Person is, in turn, a subtype of Entity,
+ and adds firstName and lastName features. CAS type systems are declaratively
+ specified using XML; the format of this XML is described in <olink
+ targetdoc="&uima_docs_ref;"
+ targetptr="ugr.ref.xml.component_descriptor.type_system"/>.
+
+
+ <programlisting><![CDATA[<!-- Type System Definition -->
+<typeSystemDescription>
+ <types>
+ <typeDescription>
+ <name>com.xyz.proj.Entity</name>
+ <description />
+ <supertypeName>uima.tcas.Annotation</supertypeName>
+ </typeDescription>
+ <typeDescription>
+ <name>com.xyz.proj.Person</name>
+ <description />
+ <supertypeName>com.xyz.proj.Entity</supertypeName>
+ <features>
+ <featureDescription>
+ <name>firstName</name>
+ <description />
+ <rangeTypeName>uima.cas.String</rangeTypeName>
+ </featureDescription>
+ <featureDescription>
+ <name>lastName</name>
+ <description />
+ <rangeTypeName>uima.cas.String</rangeTypeName>
+ </featureDescription>
+ </features>
+ </typeDescription>
+ </types>
+</typeSystemDescription>]]></programlisting></para>
+
+ <para>
+ To be able to access types and features, we need to know their names. The CAS interface defines
+ constants that hold the names of the built-in types and features, for example
+ <literal>CAS.TYPE_NAME_INTEGER</literal>. It is good programming practice to create such
+ constants for the types and features you define, for your own use as well as for others who will
+ be using your annotators.
+ </para>
+
+
+ <programlisting>/** Entity type name constant. */
+public static final String ENTITY_TYPE_NAME = "com.xyz.proj.Entity";
+
+/** Person type name constant. */
+public static final String PERSON_TYPE_NAME = "com.xyz.proj.Person";
+
+/** First name feature name constant. */
+public static final String FIRST_NAME_FEAT_NAME = "firstName";
+
+/** Last name feature name constant. */
+public static final String LAST_NAME_FEAT_NAME = "lastName";</programlisting>
+
+ <para>Next we define type and feature member variables; these will hold the values of the
+ type and feature objects needed by the CAS APIs, to be assigned during
+ <literal>typeSystemInit()</literal>.</para>
+
+
+ <programlisting>// Type system object variables
+private Type entityType;
+private Type personType;
+private Feature firstNameFeature;
+private Feature lastNameFeature;
+private Type stringType;</programlisting>
+
+ <para>The type system does not throw an exception if we ask for something that is
+ not known; it simply returns null. The code therefore checks for this and throws a proper
+ exception. We require all these types and features to be defined for the annotator to
+ work. One might imagine situations where certain computations are predicated on some type
+ or feature being defined in the type system, but that is not the case here.</para>
+
+
+ <programlisting>// Get a type object corresponding to a name.
+// If it doesn't exist, throw an exception.
+private Type initType(String typeName)
+    throws AnnotatorInitializationException {
+  // this.typeSystem is assigned in typeSystemInit().
+  Type type = this.typeSystem.getType(typeName);
+  if (type == null) {
+    throw new AnnotatorInitializationException(
+        AnnotatorInitializationException.TYPE_NOT_FOUND,
+        new Object[] { this.getClass().getName(), typeName });
+  }
+  return type;
+}
+
+// We add similar code for retrieving feature objects.
+// Get a feature object from a name and a type object.
+// If it doesn't exist, throw an exception.
+private Feature initFeature(String featName, Type type)
+    throws AnnotatorInitializationException {
+  Feature feat = type.getFeatureByBaseName(featName);
+  if (feat == null) {
+    throw new AnnotatorInitializationException(
+        AnnotatorInitializationException.FEATURE_NOT_FOUND,
+        new Object[] { this.getClass().getName(), featName });
+  }
+  return feat;
+}</programlisting>
+
+ <para>Using these two functions, code for initializing the type system described
+ above would be:
+
+
+ <programlisting>public void typeSystemInit(TypeSystem aTypeSystem)
+ throws AnalysisEngineProcessException {
+ this.typeSystem = aTypeSystem;
+ // Set type system member variables.
+ this.entityType = initType(ENTITY_TYPE_NAME);
+ this.personType = initType(PERSON_TYPE_NAME);
+ this.firstNameFeature =
+ initFeature(FIRST_NAME_FEAT_NAME, personType);
+ this.lastNameFeature =
+ initFeature(LAST_NAME_FEAT_NAME, personType);
+ this.stringType = initType(CAS.TYPE_NAME_STRING);
+}</programlisting></para>
+
+ <para>Note that we initialize the string type by using a type name constant from the
+ CAS.</para>
+
+ </section>
+ </section>
+
+ <section id="ugr.ref.cas.creating_feature_structures">
+ <title>Creating feature structures</title>
+
+ <para>To create feature structures in JCas, we use the Java <quote>new</quote>
+ operator. In the CAS, we use one of several different API methods on the CAS object,
+ depending on which of the 10 basic kinds of feature structures we are creating (a plain
+ feature structure, or an instance of the built-in primitive type arrays or FSArray).
+ There is also a method to create an instance of a
+ <literal>uima.tcas.Annotation</literal>, setting the begin and end
+ values.</para>
+
+ <para>Once a feature structure is created, it needs to be added to the CAS indexes (unless
+ it will be accessed via some reference from another accessible feature structure). The
+ CAS provides this API: Assuming aCAS holds a reference to a CAS, and token holds a
+ reference to a newly created feature structure, here's the code to add that
+ feature structure to all the relevant CAS indexes:</para>
+
+
+ <programlisting> // Add the token to the index repository.
+ aCAS.addFsToIndexes(token);</programlisting>
+
+ <para>There is also a corresponding <literal>removeFsFromIndexes(token)</literal>
+ method on CAS objects.</para>
+
+ <para>Some of the indexes (the Sorted and Set types) use comparators defined
+ on the values of particular features of the indexed type. To change the value of a
+ feature that is used as an index key, the correct procedure is to
+ <orderedlist spacing="compact">
+ <listitem><para>remove the item from all indexes where it is indexed, in all views
+ where it is indexed,</para>
+ </listitem>
+ <listitem><para>update the value of the features being used as keys,</para></listitem>
+ <listitem><para>add the item back to the indexes, in all views.</para></listitem>
+ </orderedlist></para>
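+
+ <para>The reason for the three steps above is not specific to UIMA: any sorted
+ collection can lose track of an element whose sort key is changed in place. The
+ following sketch illustrates the sequence with a <literal>java.util.TreeSet</literal>
+ standing in for a sorted index. It does not use the UIMA API; the classes
+ <literal>ReindexSketch</literal> and <literal>Item</literal> are hypothetical, and a
+ single collection stands in for <quote>all indexes, in all views</quote>.</para>
+
+ <programlisting><![CDATA[
```java
import java.util.Comparator;
import java.util.TreeSet;

class ReindexSketch {
    /** Hypothetical stand-in for a feature structure with one key feature. */
    static final class Item {
        final String label;
        int key;
        Item(String label, int key) { this.label = label; this.key = key; }
    }

    /** Stand-in for a sorted index keyed on Item.key (the label breaks ties). */
    static TreeSet<Item> newIndex() {
        return new TreeSet<>(Comparator.comparingInt((Item i) -> i.key)
                                       .thenComparing(i -> i.label));
    }

    /** The three steps: remove, update the key, add back. */
    static void updateKey(TreeSet<Item> index, Item item, int newKey) {
        index.remove(item); // 1. remove while the old key still locates the item
        item.key = newKey;  // 2. update the feature used as the key
        index.add(item);    // 3. re-insert at the position the new key dictates
    }

    public static void main(String[] args) {
        TreeSet<Item> index = newIndex();
        Item a = new Item("a", 10);
        Item b = new Item("b", 20);
        index.add(a);
        index.add(b);
        updateKey(index, a, 30);
        System.out.println(index.first().label); // prints b
    }
}
```
+]]></programlisting>
+
+ <para>Changing the key without the surrounding remove and add would leave the item
+ stored at the position of its old key, so later lookups and removals could miss it.</para>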
+ </section>
+
+ <section id="ugr.ref.cas.accessing_modifying_features_of_feature_structures">
+ <title>Accessing or modifying features of feature structures</title>
+ <titleabbrev>Accessing or modifying Features</titleabbrev>
+
+ <para>Values of individual features for a feature structure can be set or referenced,
+ using a set of methods that depend on the type of value that feature is declared to have.
+ There are methods on FeatureStructure for this: getBooleanValue, getByteValue,
+ getShortValue, getIntValue, getLongValue, getFloatValue, getDoubleValue,
+ getStringValue, and getFeatureValue (which means to get a value which in turn is a
+ reference to a feature structure). There are corresponding <quote>setter</quote>
+ methods, as well. These methods on the feature structure object take as arguments the
+ feature object retrieved earlier in the typeSystemInit method.</para>
+
+ <para>Using the previous example, with the type system initialized with type personType
+ and feature lastNameFeature, here's a sample code fragment that gets and sets
+ that feature:</para>
+
+
+ <programlisting>// Assume aPerson is a variable holding an object of type Person
+// get the lastNameFeature value from the feature structure
+String lastName = aPerson.getStringValue(lastNameFeature);
+// set the lastNameFeature value
+aPerson.setStringValue(lastNameFeature, newStringValueForLastName);</programlisting>
+
+ <para>The getters and setters for each of the primitive types are defined in the Javadocs
+ as methods of the FeatureStructure interface.</para>
+
+ </section>
+
+ <section id="ugr.ref.cas.indexes_and_iterators">
+ <title>Indexes and Iterators</title>
+
+ <para>Each CAS can have many indexes associated with it; each CAS View contains
+ a complete set of instantiations of the indexes. Each index is represented by an
+ instance of the type <literal>org.apache.uima.cas.FSIndex</literal>. You use the
+ <literal>org.apache.uima.cas.FSIndexRepository</literal> object, accessible via a method on a CAS object, to
+ retrieve instances of indexes. There are methods that let you select the index
+ by name, by type, or by both name and type. Since each index is already associated with a type,
+ passing both a name and a type is valid only if the type passed in is the same
+ type or a subtype of the one declared in the index specification for the named index. If you
+ pass in a subtype, the returned FSIndex object refers to an index that will return only
+ items belonging to that subtype (or subtypes of that subtype).</para>
+
+ <para>The returned FSIndex objects are used, in turn, to create iterators.
+ There is also a method on the Index Repository, <literal>getAllIndexedFS</literal>,
+ which will return an iterator over all indexed Feature Structures (for that CAS View),
+ in no particular order. The iterators
+ created can be used like common Java iterators, to sequentially retrieve items
+ indexed. If the index represents a sorted index, the items are returned in a sorted
+ order, where the sort order is specified in the XML index definition. This XML is part of
+ the Component Descriptor, see <olink targetdoc="&uima_docs_ref;"
+ targetptr="ugr.ref.xml.component_descriptor.aes.index"/>.</para>
+
+ <para>Feature structures should not be added to or removed from indexes while iterating
+ over them; a ConcurrentModificationException is thrown when this is detected.
+ Certain operations are allowed with the iterators after modification, which can
+ <quote>reset</quote> this condition, such as moving to beginning, end, or moving to a
+ particular feature structure. So if you have to modify the index, you can afterwards move the iterator back to
+ the last FS you had retrieved from it, and then continue, if that makes sense in
+ your application.</para>
+
+ <section id="ugr.ref.cas.index.built_in_indexes">
+ <title>Built-in Indexes</title>
+
+ <para>An unnamed built-in bag index exists which holds all feature structures which are indexed.
+ The only access to this index is the method getAllIndexedFS(Type) which returns an iterator
+ over all indexed Feature Structures.</para>
+
+ <para>The CAS also contains a built-in index for the type <literal>uima.tcas.Annotation</literal>, which sorts
+ annotations in the order in which they appear in the document. Annotations are sorted first by increasing
+ <literal>begin</literal> position. Ties are then broken by <emphasis>decreasing</emphasis>
+ <literal>end</literal> position (so that longer annotations come first). Annotations that match in both
+ their <literal>begin</literal> and <literal>end</literal> features are sorted using the Type Priority
+ (see <olink targetdoc="&uima_docs_ref;"
+ targetptr="ugr.ref.xml.component_descriptor.aes.type_priority"/>).</para>
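+
+ <para>The first two sort criteria of the built-in annotation index can be written as an
+ ordinary Java comparator. This is only an illustrative sketch, not the
+ framework&apos;s implementation: <literal>AnnotationOrderSketch</literal> and
+ <literal>Span</literal> are hypothetical names, and the final Type Priority
+ tie-breaker is deliberately not modeled.</para>
+
+ <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class AnnotationOrderSketch {
    /** Hypothetical stand-in for an annotation: a begin and an end offset. */
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
        @Override public String toString() { return "[" + begin + "," + end + ")"; }
    }

    /** Increasing begin, then decreasing end; type priority is not modeled. */
    static final Comparator<Span> DOCUMENT_ORDER =
        Comparator.comparingInt((Span s) -> s.begin)
                  .thenComparing(Comparator.comparingInt((Span s) -> s.end).reversed());

    /** Returns the given spans as a new list sorted in DOCUMENT_ORDER. */
    static List<Span> sorted(Span... spans) {
        List<Span> result = new ArrayList<>(Arrays.asList(spans));
        result.sort(DOCUMENT_ORDER);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sorted(new Span(5, 9), new Span(0, 4), new Span(0, 12)));
        // prints [[0,12), [0,4), [5,9)]
    }
}
```
+]]></programlisting>
+
+ <para>The two spans that begin at offset 0 tie on <literal>begin</literal>, so the
+ longer one is placed first.</para>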
+ </section>
+
+
+ <section id="ugr.ref.cas.index.adding_to_indexes">
+ <title>Adding Feature Structures to the Indexes</title>
+
+ <para>Feature Structures are added to the indexes by calling the
+ <literal>FSIndexRepository.addFS(FeatureStructure)</literal> method or the equivalent convenience
+ method <literal>CAS.addFsToIndexes(FeatureStructure)</literal>. This adds the Feature Structure to
+ <emphasis>all</emphasis> indexes that are defined for the type of that FeatureStructure (or any of its
+ supertypes). Note that you should not add a Feature Structure to the indexes until you have set values for all
+ of the features that may be used as sort keys in an index.</para>
+ </section>
+
+ <section id="ugr.ref.cas.index.iterators">
+ <title>Iterators</title>
+
+ <para>Iterators are objects that implement the interface <literal>org.apache.uima.cas.FSIterator</literal>. This interface
+ extends <literal>java.util.Iterator</literal>, so it provides the normal Java iterator methods, plus
+ additional ones that allow moving both forwards and backwards.</para>
+ </section>
+
+ <section id="ugr.ref.cas.index.annotation_index">
+ <title>Special iterators for Annotation types</title>
+
+ <para>The built-in index over the <literal>uima.tcas.Annotation</literal> type
+ named <quote><literal>AnnotationIndex</literal></quote> has additional
+ capabilities. To use them, you first get a reference to this built-in index using
+ either the <literal>getAnnotationIndex</literal> method on a CAS View object, or
+ by asking the <literal>FSIndexRepository</literal> object for an index having the
+ particular name <quote>AnnotationIndex</quote>, for example:
+
+ <programlisting>AnnotationIndex idx = aCAS.getAnnotationIndex();
+// or you can iterate over a specific subtype of Annotation:
+AnnotationIndex idx = aCAS.getAnnotationIndex(aType); </programlisting></para>
+
+ <para>This object can be used to produce several additional kinds of iterators. It can
+ produce unambiguous iterators; these skip over annotations until they find one whose
+ start position is equal to or greater than the end position of
+ the previously returned annotation.</para>
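+
+ <para>The skipping rule of an unambiguous iterator can be sketched over a plain list of
+ spans that is already in index order. This illustrates the rule as described above and
+ is not the framework&apos;s code; <literal>UnambiguousSketch</literal> and
+ <literal>Span</literal> are hypothetical names.</para>
+
+ <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UnambiguousSketch {
    /** Hypothetical stand-in for an annotation: a begin and an end offset. */
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) { this.begin = begin; this.end = end; }
        @Override public String toString() { return "[" + begin + "," + end + ")"; }
    }

    /** Keeps a span only if it begins at or after the end of the last span kept. */
    static List<Span> unambiguous(List<Span> inIndexOrder) {
        List<Span> kept = new ArrayList<>();
        int lastEnd = Integer.MIN_VALUE; // nothing returned yet
        for (Span s : inIndexOrder) {
            if (s.begin >= lastEnd) {
                kept.add(s);
                lastEnd = s.end;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Span> index = Arrays.asList(
            new Span(0, 12), new Span(0, 4), new Span(5, 9),
            new Span(12, 15), new Span(14, 20));
        System.out.println(unambiguous(index)); // prints [[0,12), [12,15)]
    }
}
```
+]]></programlisting>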
+
+ <para>It can also produce several kinds of subiterators; these are iterators whose
+ annotations fall within the span of another annotation. This kind of iterator can
+ also have the unambiguous property, if desired. It also can be
+ <quote>strict</quote> or not; strict means that the returned annotation lies
+ completely within the span of the controlling annotation. Non-strict only implies
+ that the beginning of the returned annotation falls within the span of the
+ controlling annotation.</para>
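+
+ <para>The difference between strict and non-strict can be stated as two predicates on
+ offsets. The sketch below is an illustration only: the class and method names are
+ hypothetical, and treating the controlling span as half-open is an assumption of this
+ sketch, so consult the Javadocs for the exact boundary behavior.</para>
+
+ <programlisting><![CDATA[
```java
class SubiteratorSketch {
    /** Strict: the candidate lies completely inside the controlling span. */
    static boolean strictlyWithin(int begin, int end, int ctrlBegin, int ctrlEnd) {
        return begin >= ctrlBegin && end <= ctrlEnd;
    }

    /** Non-strict: only the candidate's begin has to fall inside the controlling span. */
    static boolean beginsWithin(int begin, int ctrlBegin, int ctrlEnd) {
        // Assumption of this sketch: the controlling span is half-open, [ctrlBegin, ctrlEnd).
        return begin >= ctrlBegin && begin < ctrlEnd;
    }

    public static void main(String[] args) {
        // Controlling span [10, 20); the candidate [15, 25) starts inside but runs past the end.
        System.out.println(strictlyWithin(15, 25, 10, 20)); // prints false
        System.out.println(beginsWithin(15, 10, 20));       // prints true
    }
}
```
+]]></programlisting>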
+
+ <para>There is also a method which produces an <literal>AnnotationTree</literal>
+ object, which contains nodes representing the results of doing a strict,
+ unambiguous subiterator over the span of some controlling annotation. For more
+ details, please refer to the Javadocs for the
+ <literal>org.apache.uima.cas.text</literal> package.</para>
+
+ </section>
+
+ <section id="ugr.ref.cas.index.constraints_and_filtered_iterators">
+ <title>Constraints and Filtered iterators</title>
+
+ <para>There is a set of API calls that build constraint objects. These objects can be
+ used directly to test if a particular feature structure matches (satisfies) the
+ constraint, or they can be passed to the createFilteredIterator method to create an
+ iterator that skips over instances which fail to satisfy the constraint.</para>
+
+ <para>It is possible to specify a feature value located by following a chain of
+ references starting from the feature structure being tested. Here's a
+ scenario to explore this concept. Let's suppose you have the following type
+ system (namespaces are omitted for clarity):
+
+ <blockquote>
+ <para><emphasis role="bold">Token</emphasis>, having a feature PartOfSpeech
+ which holds a reference to another type (POS)</para>
+
+ <para><emphasis role="bold">POS</emphasis> (a type with many subtypes, each
+ representing a different part of speech)</para>
+
+ <para><emphasis role="bold">Noun</emphasis> (a subtype of POS)</para>
+
+ <para><emphasis role="bold">ProperName</emphasis> (a subtype of Noun),
+ having a feature Class which holds an integer value encoding some information
+ about the proper noun.</para></blockquote></para>
+
+ <para>If you want to filter Token instances, such that only those tokens get through
+ which are proper names of class 3 (for example), you would need a test that started with
+ a Token instance, followed its PartOfSpeech reference to another instance (the
+ ProperName instance) and then tested the Class feature of that instance for a value
+ equal to 3.</para>
+
+ <para>To support this, the filtering approach has components that specify tests, and
+ components that specify <quote>paths</quote>. The tests that can be done include
+ testing references to type instances to see if they are instances of some type or its
+ subtypes; this is done with a FSTypeConstraint constraint. Other tests check for
+ equality or, for numeric values, ranges.</para>
+
+ <para>Each test may be combined with a path &ndash; to get to the value to test. Tests that
+ start from a feature structure instance can be combined with <quote>and</quote> and <quote>or</quote> connectors.
+ The Javadocs for these are in the package org.apache.uima.cas in the classes that end
+ in Constraint, plus the classes ConstraintFactory, FeaturePath and CAS.
+ Here's an example; assume the variable cas holds a reference to a CAS instance.
+
+
+ <programlisting>// Start by getting the constraint factory from the CAS.
+ConstraintFactory cf = cas.getConstraintFactory();
+
+// To specify a path to an item to test, you start by
+// creating an empty path.
+FeaturePath path = cas.createFeaturePath();
+
+// Add POS feature to path, creating one-element path.
+path.addFeature(posFeat);
+
+// You can extend the chain arbitrarily by adding additional
+// features.
+
+// Create a new type constraint.
+
+// Type constraints will check that structures
+// they match against have a type at least as specific
+// as the type specified in the constraint.
+FSTypeConstraint nounConstraint = cf.createTypeConstraint();
+
+// Set the type (by default it is TOP).
+// This succeeds if the type being tested by this constraint
+// is nounType or a subtype of nounType.
+nounConstraint.add(nounType);
+
+// Embed the noun constraint under the pos path.
+// This means, associate the test with the path, so it tests the
+// proper value.
+
+// The result is a test which will
+// match a feature structure that has a posFeat defined
+// which has a value which is an instance of a nounType or
+// one of its subtypes.
+FSMatchConstraint embeddedNoun = cf.embedConstraint(path, nounConstraint);
+
+// Create a type constraint for token (or a subtype of it)
+FSTypeConstraint tokenConstraint = cf.createTypeConstraint();
+
+// Set the type.
+tokenConstraint.add(tokenType);
+
+// Create the final constraint by conjoining the two constraints.
+FSMatchConstraint nounTokenCons = cf.and(embeddedNoun, tokenConstraint);
+
+// Create a filtered iterator from some annotation iterator.
+FSIterator it = cas.createFilteredIterator(annotIt, nounTokenCons);</programlisting>
+ </para></section></section>
+
+ <section id="ugr.ref.cas.guide_to_javadocs">
+ <title>The CAS APIs &ndash; a guide to the Javadocs</title>
+ <titleabbrev>CAS APIs Javadocs</titleabbrev>
+
+ <para>The CAS APIs are organized into 3 Java packages: cas, cas.impl, and cas.text. Most
+ of the APIs described here are in the cas package. The cas.impl package contains classes
+ used in serializing and deserializing (reading and writing to external strings) the
+ XCAS form of the CAS (XCAS is an XML serialization of the CAS). The XCAS form is used for
+ transporting the CAS among local and remote annotators, or for storing the CAS in
+ permanent storage. The cas.text package contains the APIs that extend the CAS to support
+ artifact (including <quote>text</quote>) analysis.</para>
+
+ <section id="ugr.ref.cas.javadocs.cas_package">
+ <title>APIs in the CAS package</title>
+
+ <para>The main objects implementing the APIs discussed here are shown in the diagram
+ below. The hierarchy represents that there is a way to get from an upper object to an
+ instance of the lower object, usually by using a method on the upper object; this is not
+ an inheritance hierarchy.
+ <figure id="ugr.ref.cas.fig.api_hierarchy">
+ <title>CAS Object hierarchy</title>
+ <mediaobject>
+ <imageobject>
+ <imagedata width="5.8in" format="PNG"
+ fileref="&imgroot;image001.png"/>
+ </imageobject>
+ <textobject><phrase>CAS object hierarchy</phrase></textobject>
+ </mediaobject>
+ </figure> </para>
+
+ <para>The main interface is the CAS interface. This has most of the functionality of the
+ CAS, except for the type system metadata access, and the indexing access. JCas and CAS
+ are alternative representations and API approaches to the CAS; each has a method to
+ get the other. You can mix JCas and CAS APIs in your application as needed. To use the
+ JCas APIs, you have to create the Java classes that correspond to the CAS types, and
+ include them in the Java class path of the application. If you have a CAS object, you can
+ get a JCas object by using the getJCas() method call on the CAS object; likewise, you
+ can get the CAS object from a JCas by using the getCAS() method call on the JCas object.
+ There is also a low level CAS interface that is not part of the official API, and is
+ intended for internal use only – it is not documented here.</para>
+
+ <para>The type system metadata APIs are found in the TypeSystem interface. The objects
+ defining each type and feature are defined by the interfaces Type and Feature. The
+ Type interface has methods to see what types subsume other types, to iterate over the
+ types available, and to extract information about the types, including what
+ features it has. The Feature interface has methods that get what type it belongs to,
+ its name, and its range (the kind of values it can hold).</para>
+
+ <para>The FSIndexRepository gives you access to methods to get instances of indexes, and
+ also provides access to the iterator over all indexed feature structures:
+ <literal>getAllIndexedFS(aType)</literal>.
+ The FSIndex and AnnotationIndex objects give you methods to create instances of
+ iterators.</para>
+
+ <para>Iterators and the CAS methods that create new feature structures return
+ FeatureStructure objects. These objects can be used to set and get the values of
+ defined features within them.</para>
+ </section>
+ </section>
+</chapter>
\ No newline at end of file
Added: uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.javadocs.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.javadocs.xml?rev=941739&view=auto
==============================================================================
--- uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.javadocs.xml (added)
+++ uima/uimaj/branches/mavenAlign/uima-docbook-references/src/docbook/ref.javadocs.xml Thu May 6 14:01:56 2010
@@ -0,0 +1,87 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.4//EN"
+"http://www.oasis-open.org/docbook/xml/4.4/docbookx.dtd"[
+<!ENTITY imgroot "images/references/ref.javadocs/">
+<!ENTITY tp "ugr.ref.javadocs.">
+<!ENTITY % uimaents SYSTEM "../../target/docbook-shared/entities.ent" >
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.ref.javadocs">
+ <title>Javadocs</title>
+
+ <para>The details of all the public APIs for UIMA are contained in the API Javadocs. These are located in the docs/api
+ directory; the top level to open in your browser is called <ulink url="api/index.html"/>.</para>
+
+ <para>Eclipse supports the ability to attach the Javadocs to your project. The Javadoc should already be attached
+ to the <literal>uimaj-examples</literal> project, if you followed the setup instructions in <olink
+ targetdoc="&uima_docs_overview;" targetptr="ugr.ovv.eclipse_setup.example_code"/>. To attach
+ Javadocs to your own Eclipse project, use the following instructions.</para>
+
+ <note><para>As an alternative, you can add the UIMA source to the UIMA binary distribution. If you
+ do this, the Javadocs become available automatically (you can skip the following
+ setup), and you can also step through the UIMA framework code while debugging.
+ To add the source, follow the instructions in the setup chapter:
+ <olink targetdoc="&uima_docs_overview;" targetptr="ugr.ovv.eclipse_setup.adding_source"/>.</para></note>
+
+ <para>To add the Javadocs, open a project which is referring to the UIMA APIs in its class path, and open the project properties. Then pick
+ Java Build Path. Pick the "Libraries" tab and select one of the UIMA library entries (if you don't have, for
+ instance, uima-core.jar in this list, it's unlikely your code will compile). Each library entry has a small "+"
+ sign on its left; click it to expand the entry and show the Javadoc location. Highlight that location and press Edit to
+ add a reference to the Javadocs in the following dialog:
+
+
+ <screenshot>
+ <mediaobject>
+ <imageobject>
+ <imagedata width="5.8in" format="JPG" fileref="&imgroot;image002.jpg"/>
+ </imageobject>
+ <textobject><phrase>Screenshot of attaching Javadoc to source in Eclipse</phrase></textobject>
+ </mediaobject>
+ </screenshot></para>
+
+ <para>Once you do this, Eclipse can show you Javadocs for UIMA APIs as you work. To see the Javadoc for a UIMA API, you
+ can hover over the API class or method, or select it and press shift-F2, or use the menu Navigate →
+ Open External Javadoc, or open the Javadoc view (Window → Show View → Other
+ → Java → Javadoc).</para>
+
+ <para>In a similar manner, you can attach the source for the UIMA framework, if you download the source
+ distribution. The source corresponding to particular
+ releases is available from the Apache UIMA web site (<ulink url="http://incubator.apache.org/uima"/>) on the
+ downloads page.</para>
+
+ <section id="ugr.ref.javadocs.libraries">
+ <title>Using named Eclipse User Libraries</title>
+ <para>You can also create a named "user library" in Eclipse containing the UIMA Jars, and attach the Javadocs (or
+ optionally, the sources); this named library is saved in the Eclipse workspace. Once created, it can be
+ added to the classpath of newly created Eclipse projects.</para>
+
+ <para>Use the menu option Project → Properties
+ → Java Build Path, and then pick the Libraries tab, and click the Add Library button. Then select
+ User Libraries, click "Next", and pick the library you created for the UIMA Jars.</para>
+
+ <para>To create this library in the workspace,
+ use the same menu picks as above, but after you select the User Libraries and click "Next", you can click the "New Library..."
+ button to define your new library. You use the "Add Jars" button and multi-select all the Jars in the lib directory
+ of the UIMA binary distribution. Then you add the Javadoc attachment for each Jar. The path to use is
+ file:/ -- insert the path to your install of UIMA -- /docs/api. After you do this for the first Jar, you can
+ copy this string to the clipboard and paste it into the rest of the Jars.</para>
+ </section>
+</chapter>
\ No newline at end of file