Posted to commits@uima.apache.org by sc...@apache.org on 2008/08/28 23:28:16 UTC
svn commit: r689997 [9/32] - in /incubator/uima/uimaj/trunk/uima-docbooks:
./ src/ src/docbook/overview_and_setup/ src/docbook/references/
src/docbook/tools/ src/docbook/tutorials_and_users_guides/
src/docbook/uima/organization/ src/olink/references/
Modified: incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/references/ref.jcas.xml
URL: http://svn.apache.org/viewvc/incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/references/ref.jcas.xml?rev=689997&r1=689996&r2=689997&view=diff
==============================================================================
--- incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/references/ref.jcas.xml (original)
+++ incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/references/ref.jcas.xml Thu Aug 28 14:28:14 2008
@@ -1,660 +1,660 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
-"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[
-<!ENTITY % uimaents SYSTEM "../entities.ent" >
-%uimaents;
-]>
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements. See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership. The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License. You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied. See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-<chapter id="ugr.ref.jcas">
- <title>JCas Reference</title>
-
- <para>The CAS is a system for sharing data among annotators, consisting of data structures
- (definable at run time), sets of indexes over these data, metadata describing these, subjects of
- analysis, and a high-performance serialization/deserialization mechanism. JCas provides a Java approach to
- accessing CAS data, and is based on using generated, specific Java classes for each CAS
- type.</para>
-
- <para>Annotators process one CAS per call to their process method. During processing,
- annotators can retrieve feature structures from the passed in CAS, add new ones, modify
- existing ones, and use and update CAS indexes. Of course, an annotator can also use plain
- Java Objects in addition; but the data in the CAS is what is shared among annotators within
- an application.</para>
-
- <para>All the facilities present in the APIs for the CAS are available when using the JCas
- APIs; indeed, you can use the getCas() method to get the corresponding CAS object from a
- JCas (and vice-versa). The JCas APIs often have helper methods that make using this
- interface more convenient for Java developers.</para>
-
- <para>The data in the CAS are typed objects having fields. JCas uses a set of generated Java
- classes (each corresponding to a particular CAS type) with <quote>getter</quote> and
- <quote>setter</quote> methods for the features, plus a constructor so new instances can
- be made. The Java classes don't actually store the data in the class instance;
- instead, the getters and setters forward to the underlying CAS data representation.
- Because of this, applications which use the JCas interface can share data with annotators
- using plain CAS (i.e., not using the JCas approach). </para>
-
- <para>Users can modify the JCas generated
- Java classes by adding fields to them; this allows arbitrary non-CAS data to be
- represented within the JCas objects as well. However, the non-CAS data stored in the JCas
- object instances cannot be shared with annotators using the plain CAS.</para>
-
- <para>Data in the CAS initially has no corresponding JCas type instances; these are created
- as needed at the first reference. This means that if your annotator is passed a large CAS having
- millions of CAS feature structures, but you only reference a few of them, and no
- Java JCas object instances were previously created by upstream annotators, the only Java
- objects that will be created will be those that correspond to the CAS feature structures
- that you reference.</para>
-
- <para>The JCas class Java source files are generated from XML type system descriptions. The
- JCasGen utility does the work of generating the corresponding Java Class Model for the CAS
- types. There are a variety of ways JCasGen can be run; these are described later. You
- include the generated classes with your UIMA component, and you can publish these classes
- for others who might want to use your type system.</para>
-
- <para>The specification of the type system in XML can be written using a conventional text
- editor, an XML editor, or using the Eclipse plug-in that supports editing UIMA
- descriptors.</para>
-
- <para>Changes to the type system are done by changing the XML and regenerating the
- corresponding Java Class Models. Of course, once you've published your type system
- for others to use, you should be careful that any changes you make don't adversely
- impact the users. Additional features can be added to existing types without breaking
- other code.</para>
-
- <para>A separate Java class is generated for each type; this class implements the CAS
- FeatureStructure interface, as well as having the special getters and setters for the
- included features. In the current implementation, an additional helper class per type is
- also generated. The generated Java classes have methods (getters and setters) for the
- fields as defined in the XML type specification. Descriptor comments are reflected in the
- generated Java code as Java-doc style comments.</para>
-
-
- <section id="ugr.ref.jcas.name_spaces">
- <title>Name Spaces</title>
-
- <para>Full Type names consist of a <quote>namespace</quote> prefix dotted with a simple
- name. Namespaces are used like packages to avoid collisions between types that are
- defined by different people at different times. The namespace is used as the Java
- package name for generated Java files.</para>
-
- <para>Type names used in the CAS correspond to the generated Java classes directly. If the
- CAS name is com.myCompany.myProject.ExampleClass, the generated Java class is in the
- package com.myCompany.myProject, and the class is ExampleClass.</para>
-
- <para>
- An exception to this rule is the built-in types
- starting with <literal>uima.cas</literal> and <literal>uima.tcas</literal>;
- these names are mapped to Java packages named
- <literal>org.apache.uima.jcas.cas</literal> and
- <literal>org.apache.uima.jcas.tcas</literal>.</para>
-
- </section>
-
- <section id="ugr.ref.jcas.use_of_description">
- <title>XML description element</title>
- <titleabbrev>Use of XML Description</titleabbrev>
-
- <para>Each XML type specification can have &lt;description ...
- &gt; tags. The description for a type will be copied into the generated Java code, as a
- Javadoc style comment for the class. When writing these descriptions in the XML type
- specification file, you might want to use html tags, as allowed in Javadocs.</para>
-
- <para>If you use the Component Descriptor Editor, you can write the html tags normally,
- for instance, <quote>&lt;h1&gt;My Title&lt;/h1&gt;</quote>. The Component
- Descriptor Editor will take care of converting the actual descriptor source so that it
- has the leading <quote>&lt;</quote> character written as <quote>&amp;lt;</quote>,
- to avoid confusing the XML type specification. For example, &lt;p&gt; would be written
- in the source of the descriptor as &amp;lt;p&gt;. Any characters used in the Javadoc
- comment must of course be from the character set allowed by the XML type specification.
- These specifications often start with the line &lt;?xml version=<quote>1.0</quote>
- encoding=<quote>UTF-8</quote> ?&gt;, which means you can use any of the UTF-8
- characters.</para>
-
- </section>
-
- <section id="ugr.ref.jcas.mapping_built_ins">
- <title>Mapping built-in CAS types to Java types</title>
-
- <para>The built-in primitive CAS types map to Java types as follows:</para>
-
-
- <programlisting>uima.cas.Boolean → boolean
-uima.cas.Byte → byte
-uima.cas.Short → short
-uima.cas.Integer → int
-uima.cas.Long → long
-uima.cas.Float → float
-uima.cas.Double → double
-uima.cas.String → String</programlisting>
-
- </section>
-
- <section id="ugr.ref.jcas.augmenting_generated_code">
- <title>Augmenting the generated Java Code</title>
-
- <para>The Java Class Models generated for each type can be augmented by the user. Typical
- augmentations include adding additional (non-CAS) fields and methods, and import
- statements that might be needed to support these. Commonly added methods include
- additional constructors (having different parameter signatures), and
- implementations of toString().</para>
-
- <para>To augment the code, just edit the generated Java source code for the class named the
- same as the CAS type. Here's an example of an additional method you might add; the
- various getter methods are retrieving values from the instance:</para>
-
-
- <programlisting>public String toString() { // for debugging
- return "XsgParse "
- + getslotName() + ": "
- + getheadWord().getCoveredText()
- + " seqNo: " + getseqNo()
- + ", cAddr: " + id
- + ", size left mods: " + getlMods().size()
- + ", size right mods: " + getrMods().size();
-}</programlisting>
-
- <section id="ugr.ref.jcas.data_persistence">
- <title>Persistence of additional data</title>
- <para>If you add custom instance fields to JCas cover classes, these exist in the JCas cover object instance,
- but not in the CAS itself. Each time a CAS object is referenced (by an iterator, or by following a Feature
- Structure reference), a new JCas cover object instance may be created. If you need these values, you can (a)
- make them CAS values if possible, or (b) hold a reference to the particular JCas cover object instance in
- your Java code. For some simple cases, setting the performance tuning option JCAS_CACHE_ENABLE (see
- <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="tug.application.pto"/>)
- to true
- will cause the same JCas cover object that was previously used for a particular CAS Feature Structure to be
- reused. However, this capability won't work when other factors interfere with the ability to reuse the same
- object. PEAR isolation is an example of this.</para>
- <para>Because of this, and because the JCas Cache holds on to the JCas cover objects beyond their useful life and
- prevents them from being garbage collected, it is normally recommended to run with
- JCAS_CACHE_ENABLE set to "false".</para>
- </section>
- <section id="ugr.ref.jcas.keeping_augmentations_when_regenerating">
- <title>Keeping hand-coded augmentations when regenerating</title>
-
- <para>If the type system specification changes, you have to re-run the JCasGen
- generator. This will produce updated Java for the Class Models that capture the
- changed specification. If you have previously augmented the source for these Java
- Class Models, your changes must be merged with the newly (re)generated Java source
- code for the Class Models. This can be done by hand, or you can run the version of JCasGen
- that is integrated with Eclipse, and use automatic merging that is done using Eclipse's EMF
- plug-in. You can obtain Eclipse and the needed EMF plug-in from <ulink
- url="http://www.eclipse.org/"/>.</para>
-
- <para>If you run the generator version that works without using Eclipse, it will not
- merge Java source changes you may have previously made; if you want them retained,
- you'll have to do the merging by hand.</para>
-
- <para>The Java source merging will keep additional constructors, additional fields,
- and any changes you may have made to the readObject method (see below). Merging will
- <emphasis>not</emphasis> delete classes in the target corresponding to deleted CAS types, which no longer
- are in the source – you should delete these by hand.</para>
-
- <warning><para>The merging supports Java 1.4 syntactic constructs only.
- JCasGen generates Java 1.4 code, so as long as any code you change here also sticks to
- only Java 1.4 constructs, the merge will work. If you use Java 5 or later specific syntax or constructs, the merge
- operation will likely fail to merge properly.</para></warning>
- </section>
-
- <section id="ugr.ref.jcas.additional_constructors">
- <title>Additional Constructors</title>
-
- <para>Any additional constructors that you add must include the JCas argument. The
- first line of your constructor is required to be</para>
-
-
- <programlisting>this(jcas); // run the standard constructor</programlisting>
-
- <para>where jcas is the passed in JCas reference. If the type you're defining
- extends <literal>uima.tcas.Annotation</literal>, JCasGen will automatically
- add a constructor which takes 2 additional parameters – the begin and end Java
- int values – and sets the <literal>uima.tcas.Annotation</literal>
- <literal>begin</literal> and <literal>end</literal> fields.</para>
-
- <para>Here's an example: If you're defining a type MyType which has a
- feature parent, you might make an additional constructor which has an additional
- argument of parent:</para>
-
-
- <programlisting>MyType(JCas jcas, MyType parent) {
- this(jcas); // run the standard constructor
- setParent(parent); // set the parent field from the parameter
-}</programlisting>
-
- <section id="ugr.ref.jcas.using_readobject">
- <title>Using readObject</title>
-
- <para>Fields defined by augmenting the Java Class Model to include additional
- fields represent data that exist for this class in Java, in a local JVM (Java Virtual
- Machine), but do not exist in the CAS when it is passed to other environments (for
- example, passing to a remote annotator).</para>
-
- <para>A problem can arise when new instances are created, perhaps by the underlying
- system when it iterates over an index, which is: how to ensure that any additional
- non-CAS fields are properly initialized. To allow for arbitrary initialization
- at instance creation time, an initialization method in the Java Class Model,
- called readObject is used. The generated default for this method is to do nothing,
- but it is one of the methods that you can modify – to do whatever
- initialization might be needed. It is called with 0 parameters, during the
- constructor for the object, after the basic object fields have been set up. It can
- refer to fields in the CAS using the getters and setters, and other fields in the Java
- object instance being initialized.</para>
-
- <para>A pre-existing CAS feature structure could exist if a CAS was being passed to
- this annotator; in this case the JCas system calls the readObject method when
- creating the corresponding Java instance for the first time for the CAS feature
- structure. This can happen at two points: when a new object is being returned from an
- iterator over a CAS index, or a getter method is getting a field for the first time
- whose value is a feature structure.</para>
-
- </section>
- </section>
-
- <section id="ugr.ref.jcas.modifying_generated_items">
- <title>Modifying generated items</title>
-
- <para>The following modifications, if made in generated items, will be preserved when
- regenerating.</para>
-
- <para>The public/private etc. flags associated with methods (getters and setters).
- You can change the default (<quote>public</quote>) if needed.</para>
-
- <para><quote>final</quote> or <quote>abstract</quote> can be added to the type
- itself, with the usual semantics.</para>
-
- </section>
- </section>
-
- <section id="ugr.ref.jcas.merging_types_from_other_specs">
- <title>Merging types</title>
- <titleabbrev>Merging Types</titleabbrev>
- <para>Type definitions are merged by the framework from all the components being run together.</para>
-
- <section id="ugr.ref.jcas.merging_types.aggregates_and_cpes">
- <title>Aggregate AEs and CPEs as sources of types</title>
-
- <para>When running aggregate AEs (Analysis Engines), or a set of AEs in a collection processing engine, the
- UIMA framework will build a merged type system (Note: this <quote>merge</quote> is merging types, not to be
- confused with merging Java source code, discussed above). This merged type system has all the types of every
- component used in the application. In addition, application code can use UIMA Framework APIs to read and merge
- type descriptions, manually.</para>
-
- <para>In most cases, each type system can have its own Java Class Models generated individually, perhaps at an
- earlier time, and the resulting class files (or .jar files containing these class files) can be put in the
- class path to enable JCas.</para>
-
- <para>However, it is possible that there may be multiple definitions of the same CAS type, each of which might
- have different features defined. In this case, the UIMA framework will create a merged type by accumulating
- all the defined features for a particular type into that type's type definition. However, the JCas
- classes for these types are not automatically merged, which can create some issues for JCas users, as
- discussed in the next section.</para>
-
- </section>
-
- <section id="ugr.ref.jcas.merging_types.jcasgen_support">
- <title>JCasGen support for type merging</title>
-
- <para>When there are multiple definitions of the same CAS type with different features defined, then JCasGen
- can be re-run on the merged type system, to create one set of JCas Class definitions for the merged types,
- which can then be shared by all the components.
- Directions for running JCasGen can be found in <olink
- targetdoc="&uima_docs_tools;" targetptr="ugr.tools.jcasgen"/>. This is typically done by the person who
- is assembling the Aggregate Analysis Engine or Collection Processing Engine. The resulting merged Java
- Class Model will then contain get and set methods for the complete set of features. These Java classes must
- then be made available in the class path, <emphasis>replacing</emphasis> the pre-merge versions of the
- classes.</para>
-
- <para>If hand-modifications were done to the pre-merge versions of the classes, these must be applied to the
- merged versions, as described in section <xref
- linkend="ugr.ref.jcas.keeping_augmentations_when_regenerating"/>, above. If just one of the
- pre-merge versions had hand-modifications, the source for this hand-modified version can be put into the
- file system where the generated output will go, and the -merge option for JCasGen will automatically
- merge the hand-modifications with the generated code. If
- <emphasis>both</emphasis> pre-merged versions had hand-modifications, then these modifications must
- be manually merged.</para>
-
- <para>An alternative to this is packaging the components as individual PEAR files, each with their own
- version of the JCas generated Classes. The Framework (as of release 2.2) can run PEAR files using the
- pear file descriptor, and supply each component with its particular version of the JCas generated class.</para>
-
- </section>
-
- <section id="ugr.ref.jcas.impact_of_type_merging_on_composability">
- <title>Impact of Type Merging on Composability of Annotators</title>
- <titleabbrev>Type Merging impacts on Composability</titleabbrev>
-
- <para>The recommended approach in UIMA is to build and maintain type systems as separate components, which are
- imported by Annotators. Using this approach, Type Merging does not occur because the Type System and its JCas
- classes are centrally managed and shared by the annotators.</para>
-
- <para>If you do choose to create a JCas Annotator that relies on Type Merging (meaning that your annotator
- redefines a Type that is already in use elsewhere, and adds its own features), this can negatively impact the
- reusability of your annotator, unless your component is used as a PEAR file.</para>
-
- <para>If you are not using the PEAR file packaging isolation capability, whenever
- anyone wants to combine your annotator with another annotator that uses a different version of
- the same Type, they will need to be aware of all of the issues described in the previous section. They will need
- to have the know-how to re-run JCasGen and appropriately set up their classpath to include the merged Java
- classes and to not include the pre-merge classes. (To enable this, you should package these classes
- separately from other .jar files for your annotator, so that they can be more easily excluded.) And, if you
- have done hand-modifications to your JCas classes, the person assembling your annotator will need to
- properly merge those changes. These issues significantly complicate the task of combining annotators, and
- will cause your annotator not to be as easily reusable as other UIMA annotators. </para>
-
- </section>
-
- <section id="ugr.ref.jcas.documentannotation_issues">
- <title>Adding Features to DocumentAnnotation</title>
-
- <para>There is one built-in type, <literal>uima.tcas.DocumentAnnotation</literal>,
- to which applications can add additional features. (All other built-in types
- are "feature-final" and you cannot add additional features to them.) Frequently,
- additional features are added to <literal>uima.tcas.DocumentAnnotation</literal>
- to provide a place to store document-level metadata.</para>
-
- <para>For the same reasons mentioned in the previous section, adding features to
- DocumentAnnotation is not recommended if you are using JCas. Instead, it is recommended
- that you define your own type for storing your document-level metadata. You can create
- an instance of this type and add it to the indexes in the usual way. You can then
- retrieve this instance using the iterator returned from the method <literal>getAllIndexedFS(type)</literal>
- on an instance of a JFSIndexRepository object.
- (As of UIMA v2.1, you do not have to declare a custom index in your descriptor to
- get this to work.)</para>
-
- <para>If you do choose to add features to DocumentAnnotation, there are additional issues to
- be aware of. The UIMA SDK provides the JCas cover class for the built-in definition of
- DocumentAnnotation, in the separate jar file <literal>uima-document-annotation.jar</literal>.
- If you add additional features to DocumentAnnotation, you must remove this jar file
- from your classpath, because you will not want to use the default JCas cover class.
- You will need to re-run JCasGen as described in <xref
- linkend="ugr.ref.jcas.merging_types.jcasgen_support"/>. JCasGen will generate a new cover
- class for DocumentAnnotation, which you must place in your classpath in lieu of the version
- in <literal>uima-document-annotation.jar</literal>.</para>
-
- <para>Also, this is the reason why the method <literal>JCas.getDocumentAnnotationFs()</literal> returns
- type <literal>TOP</literal>, rather than type <literal>DocumentAnnotation</literal>. Because the
- <literal>DocumentAnnotation</literal> class can be replaced by users, it is not part of
- <literal>uima-core.jar</literal> and so the core UIMA framework cannot have any references
- to it. In your code, you may <quote>cast</quote> the result of <literal>JCas.getDocumentAnnotationFs()</literal>
- to type <literal>DocumentAnnotation</literal>, which must be available on the classpath either via
- <literal>uima-document-annotation.jar</literal> or by including a custom version that you have generated using JCasGen.</para>
- </section>
-
- </section>
-
- <section id="ugr.ref.jcas.using_within_an_annotator">
- <title>Using JCas within an Annotator</title>
-
- <para>To use JCas within an annotator, you must include the generated Java classes output
- from JCasGen in the class path.</para>
-
- <para>An annotator written using JCas is built by defining a class for the annotator that
- extends JCasAnnotator_ImplBase. The process method for this annotator is
- written as follows:</para>
-
- <programlisting>public void process(JCas jcas)
- throws AnalysisEngineProcessException {
- ... // body of annotator goes here
-}</programlisting>
-
- <para>The process method is passed the JCas instance to use as a parameter.</para>
-
- <para>The JCas reference is used throughout the annotator to refer to the particular JCas
- instance being worked on. In pooled or multi-threaded implementations, each thread
- that is running (simultaneously) works on its own separate JCas.</para>
-
- <para>You can do several kinds of operations using the JCas APIs: create new feature
- structures (instances of CAS types) (using the new operator), access existing feature
- structures passed to your annotator in the JCas (for example, by using the next method of
- an iterator over the feature structures), get and set the fields of a particular
- instance of a feature structure, and add and remove feature structure instances from
- the CAS indexes. To support iteration, there are also functions to get and use indexes
- and iterators over the instances in a JCas.</para>
-
- <section id="ugr.ref.jcas.new_instances">
- <title>Creating new instances using the Java <quote>new</quote> operator</title>
- <titleabbrev>Creating new instances</titleabbrev>
-
- <para>The new operator creates new instances of JCas types. It takes at least one
- parameter, the JCas instance in which the type is to be created. For example, if there
- was a type Meeting defined, you can create a new instance of it using:
-
- <programlisting>Meeting m = new Meeting(jcas);</programlisting></para>
-
- <para>Other variations of constructors can be added in custom code; the single
- parameter version is the one automatically generated by JCasGen. For types that are
- subtypes of Annotation, JCasGen also generates an additional constructor with
- additional <quote>begin</quote> and <quote>end</quote> arguments.</para>
-
- </section>
- <section id="ugr.ref.jcas.getters_and_setters">
- <title>Getters and Setters</title>
-
- <para>If the CAS type Meeting had fields location and time, you could get or set these by
- using getter or setter methods. These methods have names formed by splicing together
- the word <quote>get</quote> or <quote>set</quote> followed by the field name, with
- the first letter of the field name capitalized. For instance:
-
- <programlisting>getLocation()</programlisting></para>
-
- <para>The getter forms take no parameters and return the value of the field; the setter
- forms take one parameter, the value to set into the field, and return void.</para>
-
- <para>There are built-in CAS types for arrays of integers, strings, floats, and
- feature structures. For fields whose values are these types of arrays, there is an
- alternate form of the getters and setters that takes an additional parameter, written as
- the first parameter, which is the index in the array of the item to get or set.</para>
-
- </section>
-
- <section id="ugr.ref.jcas.obtaining_refs_to_indexes">
- <title>Obtaining references to Indexes</title>
-
- <para>The only way to access instances (not otherwise referenced from other
- instances) passed in to your annotator in its JCas is to use an iterator over some
- index. Indexes in the CAS are specified in the annotator descriptor. Indexes have a
- name; text annotators have a built-in, standard index over all annotations.</para>
-
- <para>To get an index, first get the JFSIndexRepository from the JCas using the method
- jcas.getJFSIndexRepository(). Here are the calls to get indexes:</para>
-
-
- <programlisting>JFSIndexRepository ir = jcas.getJFSIndexRepository();
-
-ir.getIndex(name-of-index) // get the index by its name, a string
-ir.getIndex(name-of-index, Foo.type) // filtered by specific type
-
-ir.getAnnotationIndex() // get AnnotationIndex
-ir.getAnnotationIndex(Foo.type) // filtered by specific type</programlisting>
-
- <para>For convenience, the getAnnotationIndex method is available directly on the JCas object
- instance; the implementation merely forwards to the associated index repository.</para>
-
- <para>Filtering types have to be a subtype of the type specified for this index in its
- index specification. They can be written either as Foo.type or, if you have an instance
- of Foo, as:</para>
-
- <programlisting>fooInstance.jcasType.casType</programlisting>
-
- <para>Foo is (of course) an example of the name of the type.</para>
-
- </section>
- <section id="ugr.ref.jcas.adding_removing_instances_to_indexes">
- <title>Adding (and removing) instances to (from) indexes</title>
- <titleabbrev>Updating Indexes</titleabbrev>
-
- <para>CAS indexes are maintained automatically by the CAS. However, you must explicitly add
- to the indexes any feature structure instances that you want the indexes to find, by using the
- call:</para>
-
- <programlisting>myInstance.addToIndexes();</programlisting>
-
- <para>Do this after setting all features in the instance <emphasis role="bold-italic">which could be used in indexing</emphasis>, for example, in
- determining the sorting order. After indexing, do not change the values of these
- particular features because the indexes will not be updated. If you need to change the
- values, you must first remove the instance from the CAS indexes, change the values,
- and then add the instance back. To remove an instance from the indexes, use the method:
-
- <programlisting>myInstance.removeFromIndexes();</programlisting></para>
- <note><para>It's OK to change feature values which are not used in determining
- sort ordering (or set membership), without removing and re-adding back to the index.
- </para></note>
-
- <para>When writing a Multi-View component, you may need to index instances in multiple
- CAS views. The methods above use the indexes associated with the current JCas object.
- There is a variation of the <literal>addToIndexes / removeFromIndexes</literal> methods which
- takes one argument: a reference to a JCas object holding the view in which you want to
- index this instance.
- <programlisting>myInstance.addToIndexes(anotherJCas);
-myInstance.removeFromIndexes(anotherJCas);</programlisting>
- </para>
-
- <para>
- You can also explicitly add instances to other views using the addFsToIndexes method on
- other JCas (or CAS) objects. For instance, if you had 2 other CAS views (myView1 and
- myView2), in which you wanted to index myInstance, you could write:</para>
-
- <programlisting>myInstance.addToIndexes(); //addToIndexes used with the new operator
-myView1.addFsToIndexes(myInstance); // index myInstance in myView1
-myView2.addFsToIndexes(myInstance); // index myInstance in myView2</programlisting>
-
- <para>
- The rules for determining which index to use with a particular JCas object are designed to
- behave the way most would think they should; if you need specific behavior, you can always
- explicitly designate which view the index adding and removing operations should work on.
- </para>
-
- <para>
- The rules are:
- <orderedlist>
- <listitem><para>If the instance is a subtype of AnnotationBase, then the view is the view associated with the
- annotation as specified in the feature holding the view reference in AnnotationBase.</para></listitem>
- <listitem><para>Otherwise, if the instance was created using the "new" operator, then the view is the view passed to the
- instance's constructor.</para></listitem>
- <listitem><para>Otherwise, if the instance was created by getting a feature value from some other instance, whose range
- type is a feature structure, then the view is the same as the referring instance.</para></listitem>
- <listitem><para>Otherwise, if the instance was created by any of the Feature Structure Iterator operations over some index,
- then it is the view associated with the index.</para></listitem>
- </orderedlist>
- </para>
- </section>
-
- <section id="ugr.ref.jcas.using_iterators">
- <title>Using Iterators</title>
-
- <para>Once you have an index obtained from the JCas, you can get an iterator from the
- index; here is an example:</para>
-
-
- <programlisting>FSIndexRepository ir = jcas.getFSIndexRepository();
-FSIndex myIndex = ir.getIndex("myIndexName");
-FSIterator myIterator = myIndex.iterator();
-
-JFSIndexRepository jir = jcas.getJFSIndexRepository();
-FSIndex myFilteredIndex = jir.getIndex("myIndexName", Foo.type); // filtered
-FSIterator myFilteredIterator = myFilteredIndex.iterator();</programlisting>
-
- <para>Iterators work like normal Java iterators, but are augmented to support
- additional capabilities. Iterators are described in the CAS Reference, <olink
- targetdoc="&uima_docs_ref;"
- targetptr="ugr.ref.cas.indexes_and_iterators"/>.</para>
-
- </section>
-
- <section id="ugr.ref.jcas.class_loaders">
- <title>Class Loaders in UIMA</title>
-
- <para>The basic concept of a UIMA application includes assembling engines into a flow.
- The application made up of these engines is run within the UIMA Framework, either by
- the Collection Processing Manager, or by using more basic UIMA Framework
- APIs.</para>
-
- <para>The UIMA Framework exists within a JVM (Java Virtual Machine). A JVM has the
- capability to load multiple applications, in a way where each one is isolated from the
- others, by using a separate class loader for each application. For instance, one set
- of UIMA Framework Classes could be shared by multiple sets of application-specific
- classes, even if these application-specific classes had the same names but were
- different versions.</para>
-
- <section id="ugr.ref.jcas.class_loaders.optional">
- <title>Use of Class Loaders is optional</title>
-
- <para>The UIMA framework will use a specific ClassLoader, based on how
- ResourceManager instances are used. Specific ClassLoaders are only created if
- you specify an ExtensionClassPath as part of the ResourceManager. If you do not
- need to support multiple applications within one UIMA framework within a JVM,
- don't specify an ExtensionClassPath; in this case, the classloader used
- will be the one used to load the UIMA framework - usually the overall application
- class loader.</para>
-
- <para>Of course, you should not run multiple UIMA applications together, in this
- way, if they have different class definitions for the same class name. This
- includes the JCas <quote>cover</quote> classes. This case might arise, for
- instance, if both applications extended
- <literal>uima.tcas.DocumentAnnotation</literal> in differing,
- incompatible ways. Each application would need its own definition of this class,
- but only one could be loaded (unless you specify ExtensionClassPath in the
- ResourceManager which will cause the UIMA application to load its private
- versions of its classes, from its classpath).</para>
- </section>
- </section>
-
- <section id="ugr.ref.jcas.accessing_jcas_objects_outside_uima_components">
- <title>Issues accessing JCas objects outside of UIMA Engine Components</title>
-
- <para>If you are using the ExtensionClassPaths, the JCas cover classes are loaded
- under a class loader created by the ResourceManager part of the UIMA Framework.
- If you reference the same JCas
- classes outside of any UIMA component, for instance, in top level application code,
- the JCas classes used by that top level application code also must be in the class path
- for the application code.</para>
-
- <para>Alternatively, you could do all the JCas processing inside a UIMA component (and do no
- processing using JCas outside of the UIMA pipeline).</para>
-
- </section>
- </section>
-
- <section id="ugr.ref.jcas.setting_up_classpath">
- <title>Setting up Classpath for JCas</title>
-
- <para>The JCas Java classes generated by JCasGen are typically compiled and put into a JAR
- file, which, in turn, is put into the application's class path.</para>
-
- <para>This JAR file must be generated from the application's merged type system.
- This is most conveniently done by opening the top level descriptor used by the
- application in the Component Descriptor Editor tool, and pressing the Run-JCasGen
- button on the Type System Definition page.</para>
-
- </section>
-
- <section id="ugr.ref.jcas.pear_support">
- <title>PEAR isolation</title>
- <para>
- As of version 2.2, the framework supports component descriptors which are PEAR descriptors.
- These descriptors define components plus include information on the class path needed to
- run them. The framework uses the class path information to set up a localized class path, just
- for code running within the PEAR context. This allows PEAR files requiring different
- versions of common code to work well together, even if classes in the different versions
- have the same names.
- </para>
-
- </section>
-
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
+"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[
+<!ENTITY % uimaents SYSTEM "../entities.ent" >
+%uimaents;
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.ref.jcas">
+ <title>JCas Reference</title>
+
+ <para>The CAS is a system for sharing data among annotators, consisting of data structures
+ (definable at run time), sets of indexes over these data, metadata describing these, subjects of
+ analysis, and a
+ high-performance serialization/deserialization mechanism. JCas provides a Java approach to
+ accessing CAS data, and is based on using generated, specific Java classes for each CAS
+ type.</para>
+
+ <para>Annotators process one CAS per call to their process method. During processing,
+ annotators can retrieve feature structures from the passed in CAS, add new ones, modify
+ existing ones, and use and update CAS indexes. Of course, an annotator can also use plain
+ Java Objects in addition; but the data in the CAS is what is shared among annotators within
+ an application.</para>
+
+ <para>All the facilities present in the APIs for the CAS are available when using the JCas
+ APIs; indeed, you can use the getCas() method to get the corresponding CAS object from a
+ JCas (and vice-versa). The JCas APIs often have helper methods that make using this
+ interface more convenient for Java developers.</para>
+
+ <para>The data in the CAS are typed objects having fields. JCas uses a set of generated Java
+ classes (each corresponding to a particular CAS type) with <quote>getter</quote> and
+ <quote>setter</quote> methods for the features, plus a constructor so new instances can
+ be made. The Java classes don't actually store the data in the class instance;
+ instead, the getters and setters forward to the underlying CAS data representation.
+ Because of this, applications which use the JCas interface can share data with annotators
+ using plain CAS (i.e., not using the JCas approach). </para>
+
+ <para>Users can modify the JCas generated
+ Java classes by adding fields to them; this allows arbitrary non-CAS data to be
+ represented within the JCas objects as well. However, the non-CAS data stored in the JCas
+ object instances cannot be shared with annotators using the plain CAS.</para>
+
+ <para>Data in the CAS initially has no corresponding JCas type instances; these are created
+ as needed at the first reference. This means that if your annotator is passed a large CAS having
+ millions of CAS feature structures, but you only reference a few of them, and no
+ Java JCas object instances were previously created by upstream annotators, the only Java
+ objects that will be created will be those that correspond to the CAS feature structures
+ that you reference.</para>
+
+ <para>The JCas class Java source files are generated from XML type system descriptions. The
+ JCasGen utility does the work of generating the corresponding Java Class Model for the CAS
+ types. There are a variety of ways JCasGen can be run; these are described later. You
+ include the generated classes with your UIMA component, and you can publish these classes
+ for others who might want to use your type system.</para>
+
+ <para>The specification of the type system in XML can be written using a conventional text
+ editor, an XML editor, or using the Eclipse plug-in that supports editing UIMA
+ descriptors.</para>
+
+ <para>Changes to the type system are done by changing the XML and regenerating the
+ corresponding Java Class Models. Of course, once you've published your type system
+ for others to use, you should be careful that any changes you make don't adversely
+ impact the users. Additional features can be added to existing types without breaking
+ other code.</para>
+
+ <para>A separate Java class is generated for each type; this class implements the CAS
+ FeatureStructure interface, as well as having the special getters and setters for the
+ included features. In the current implementation, an additional helper class per type is
+ also generated. The generated Java classes have methods (getters and setters) for the
+ fields as defined in the XML type specification. Descriptor comments are reflected in the
+ generated Java code as Java-doc style comments.</para>
+
+
+ <section id="ugr.ref.jcas.name_spaces">
+ <title>Name Spaces</title>
+
+ <para>Full Type names consist of a <quote>namespace</quote> prefix dotted with a simple
+ name. Namespaces are used like packages to avoid collisions between types that are
+ defined by different people at different times. The namespace is used as the Java
+ package name for generated Java files.</para>
+
+ <para>Type names used in the CAS correspond to the generated Java classes directly. If the
+ CAS name is com.myCompany.myProject.ExampleClass, the generated Java class is in the
+ package com.myCompany.myProject, and the class is ExampleClass.</para>
+
+ <para>
+ An exception to this rule is the built-in types
+ starting with <literal>uima.cas</literal> and <literal>uima.tcas</literal>;
+ these names are mapped to Java packages named
+ <literal>org.apache.uima.jcas.cas</literal> and
+ <literal>org.apache.uima.jcas.tcas</literal>.</para>
+
+ </section>
+
+ <section id="ugr.ref.jcas.use_of_description">
+ <title>XML description element</title>
+ <titleabbrev>Use of XML Description</titleabbrev>
+
+ <para>Each XML type specification can have &lt;description ...
+ &gt; tags. The description for a type will be copied into the generated Java code, as a
+ Javadoc style comment for the class. When writing these descriptions in the XML type
+ specification file, you might want to use html tags, as allowed in Javadocs.</para>
+
+ <para>If you use the Component Descriptor Editor, you can write the html tags normally,
+ for instance, <quote>&lt;h1&gt;My Title&lt;/h1&gt;</quote>. The Component
+ Descriptor Editor will take care of converting the actual descriptor source so that it
+ has the leading <quote>&lt;</quote> character written as <quote>&amp;lt;</quote>,
+ to avoid confusing the XML type specification. For example, &lt;p&gt; would be written
+ in the source of the descriptor as &amp;lt;p&gt;. Any characters used in the Javadoc
+ comment must of course be from the character set allowed by the XML type specification.
+ These specifications often start with the line &lt;?xml version=<quote>1.0</quote>
+ encoding=<quote>UTF-8</quote> ?&gt;, which means you can use any of the UTF-8
+ characters.</para>
+
+ </section>
+
+ <section id="ugr.ref.jcas.mapping_built_ins">
+ <title>Mapping built-in CAS types to Java types</title>
+
+ <para>The built-in primitive CAS types map to Java types as follows:</para>
+
+
+ <programlisting>uima.cas.Boolean → boolean
+uima.cas.Byte → byte
+uima.cas.Short → short
+uima.cas.Integer → int
+uima.cas.Long → long
+uima.cas.Float → float
+uima.cas.Double → double
+uima.cas.String → String</programlisting>
+
+ </section>
+
+ <section id="ugr.ref.jcas.augmenting_generated_code">
+ <title>Augmenting the generated Java Code</title>
+
+ <para>The Java Class Models generated for each type can be augmented by the user. Typical
+ augmentations include adding additional (non-CAS) fields and methods, and import
+ statements that might be needed to support these. Commonly added methods include
+ additional constructors (having different parameter signatures), and
+ implementations of toString().</para>
+
+ <para>To augment the code, just edit the generated Java source code for the class named the
+ same as the CAS type. Here's an example of an additional method you might add; the
+ various getter methods are retrieving values from the instance:</para>
+
+
+ <programlisting>public String toString() { // for debugging
+ return "XsgParse "
+ + getslotName() + ": "
+ + getheadWord().getCoveredText()
+ + " seqNo: " + getseqNo()
+ + ", cAddr: " + id
+ + ", size left mods: " + getlMods().size()
+ + ", size right mods: " + getrMods().size();
+}</programlisting>
+
+ <section id="ugr.ref.jcas.data_persistence">
+ <title>Persistence of additional data</title>
+ <para>If you add custom instance fields to JCas cover classes, these exist in the JCas cover object instance,
+ but not in the CAS itself. Each time a CAS object is referenced (by an iterator, or by following a Feature
+ Structure reference), a new JCas cover object instance may be created. If you need these values, you can (a)
+ make them CAS values if possible, or (b) hold a reference to the particular JCas cover object instance in
+ your Java code. For some simple cases, setting the performance tuning option JCAS_CACHE_ENABLE (see
+ <olink targetdoc="&uima_docs_tutorial_guides;" targetptr="tug.application.pto"/>)
+ to true
+ will cause the same JCas cover object that was previously used for a particular CAS Feature Structure to be
+ reused. However, this capability won't work when other factors interfere with the ability to reuse the same
+ object. PEAR isolation is an example of this.</para>
+ <para>Because of this, and because the JCas Cache holds on to the JCas cover objects beyond their useful life and
+ prevents them from being garbage collected, it is normally recommended to run with
+ JCAS_CACHE_ENABLE set to "false".</para>
+ </section>
+ <section id="ugr.ref.jcas.keeping_augmentations_when_regenerating">
+ <title>Keeping hand-coded augmentations when regenerating</title>
+
+ <para>If the type system specification changes, you have to re-run the JCasGen
+ generator. This will produce updated Java for the Class Models that capture the
+ changed specification. If you have previously augmented the source for these Java
+ Class Models, your changes must be merged with the newly (re)generated Java source
+ code for the Class Models. This can be done by hand, or you can run the version of JCasGen
+ that is integrated with Eclipse, and use automatic merging that is done using Eclipse's EMF
+ plug-in. You can obtain Eclipse and the needed EMF plug-in from <ulink
+ url="http://www.eclipse.org/"/>.</para>
+
+ <para>If you run the generator version that works without using Eclipse, it will not
+ merge Java source changes you may have previously made; if you want them retained,
+ you'll have to do the merging by hand.</para>
+
+ <para>The Java source merging will keep additional constructors, additional fields,
+ and any changes you may have made to the readObject method (see below). Merging will
+ <emphasis>not</emphasis> delete classes in the target corresponding to deleted CAS types, which no longer
+ are in the source – you should delete these by hand.</para>
+
+ <warning><para>The merging supports Java 1.4 syntactic constructs only.
+ JCasGen generates Java 1.4 code, so as long as any code you change here also sticks to
+ only Java 1.4 constructs, the merge will work. If you use syntax or constructs specific to Java 5 or later,
+ the merge operation will likely fail.</para></warning>
+ </section>
+
+ <section id="ugr.ref.jcas.additional_constructors">
+ <title>Additional Constructors</title>
+
+ <para>Any additional constructors that you add must include the JCas argument. The
+ first line of your constructor is required to be</para>
+
+
+ <programlisting>this(jcas); // run the standard constructor</programlisting>
+
+ <para>where jcas is the passed in JCas reference. If the type you're defining
+ extends <literal>uima.tcas.Annotation</literal>, JCasGen will automatically
+ add a constructor which takes 2 additional parameters – the begin and end Java
+ int values, and sets the <literal>uima.tcas.Annotation</literal>
+ <literal>begin</literal> and <literal>end</literal> fields.</para>
+
+ <para>Here's an example: If you're defining a type MyType which has a
+ feature parent, you might make an additional constructor which has an additional
+ argument of parent:</para>
+
+
+ <programlisting>MyType(JCas jcas, MyType parent) {
+ this(jcas); // run the standard constructor
+ setParent(parent); // set the parent field from the parameter
+}</programlisting>
+
+ <section id="ugr.ref.jcas.using_readobject">
+ <title>Using readObject</title>
+
+ <para>Fields defined by augmenting the Java Class Model to include additional
+ fields represent data that exist for this class in Java, in a local JVM (Java Virtual
+ Machine), but do not exist in the CAS when it is passed to other environments (for
+ example, passing to a remote annotator).</para>
+
+ <para>A problem can arise when new instances are created, perhaps by the underlying
+ system when it iterates over an index: how to ensure that any additional
+ non-CAS fields are properly initialized. To allow for arbitrary initialization
+ at instance creation time, an initialization method in the Java Class Model,
+ called readObject is used. The generated default for this method is to do nothing,
+ but it is one of the methods that you can modify – to do whatever
+ initialization might be needed. It is called with 0 parameters, during the
+ constructor for the object, after the basic object fields have been set up. It can
+ refer to fields in the CAS using the getters and setters, and other fields in the Java
+ object instance being initialized.</para>
+
+ <para>A pre-existing CAS feature structure could exist if a CAS was being passed to
+ this annotator; in this case the JCas system calls the readObject method when
+ creating the corresponding Java instance for the first time for the CAS feature
+ structure. This can happen at two points: when a new object is being returned from an
+ iterator over a CAS index, or a getter method is getting a field for the first time
+ whose value is a feature structure.</para>
+
+ </section>
+ </section>
+
+ <section id="ugr.ref.jcas.modifying_generated_items">
+ <title>Modifying generated items</title>
+
+ <para>The following modifications, if made in generated items, will be preserved when
+ regenerating.</para>
+
+ <para>The public/private etc. flags associated with methods (getters and setters).
+ You can change the default (<quote>public</quote>) if needed.</para>
+
+ <para><quote>final</quote> or <quote>abstract</quote> can be added to the type
+ itself, with the usual semantics.</para>
+
+ </section>
+ </section>
+
+ <section id="ugr.ref.jcas.merging_types_from_other_specs">
+ <title>Merging types</title>
+ <titleabbrev>Merging Types</titleabbrev>
+ <para>Type definitions are merged by the framework from all the components being run together.</para>
+
+ <section id="ugr.ref.jcas.merging_types.aggregates_and_cpes">
+ <title>Aggregate AEs and CPEs as sources of types</title>
+
+ <para>When running aggregate AEs (Analysis Engines), or a set of AEs in a collection processing engine, the
+ UIMA framework will build a merged type system (Note: this <quote>merge</quote> is merging types, not to be
+ confused with merging Java source code, discussed above). This merged type system has all the types of every
+ component used in the application. In addition, application code can use UIMA Framework APIs to read and merge
+ type descriptions, manually.</para>
+
+ <para>In most cases, each type system can have its own Java Class Models generated individually, perhaps at an
+ earlier time, and the resulting class files (or .jar files containing these class files) can be put in the
+ class path to enable JCas.</para>
+
+ <para>However, it is possible that there may be multiple definitions of the same CAS type, each of which might
+ have different features defined. In this case, the UIMA framework will create a merged type by accumulating
+ all the defined features for a particular type into that type's type definition. However, the JCas
+ classes for these types are not automatically merged, which can create some issues for JCas users, as
+ discussed in the next section.</para>
+
+ </section>
+
+ <section id="ugr.ref.jcas.merging_types.jcasgen_support">
+ <title>JCasGen support for type merging</title>
+
+ <para>When there are multiple definitions of the same CAS type with different features defined, then JCasGen
+ can be re-run on the merged type system, to create one set of JCas Class definitions for the merged types,
+ which can then be shared by all the components.
+ Directions for running JCasGen can be found in <olink
+ targetdoc="&uima_docs_tools;" targetptr="ugr.tools.jcasgen"/>. This is typically done by the person who
+ is assembling the Aggregate Analysis Engine or Collection Processing Engine. The resulting merged Java
+ Class Model will then contain get and set methods for the complete set of features. These Java classes must
+ then be made available in the class path, <emphasis>replacing</emphasis> the pre-merge versions of the
+ classes.</para>
+
+ <para>If hand-modifications were done to the pre-merge versions of the classes, these must be applied to the
+ merged versions, as described in section <xref
+ linkend="ugr.ref.jcas.keeping_augmentations_when_regenerating"/>, above. If just one of the
+ pre-merge versions had hand-modifications, the source for this hand-modified version can be put into the
+ file system where the generated output will go, and the -merge option for JCasGen will automatically
+ merge the hand-modifications with the generated code. If
+ <emphasis>both</emphasis> pre-merged versions had hand-modifications, then these modifications must
+ be manually merged.</para>
+
+ <para>An alternative to this is packaging the components as individual PEAR files, each with their own
+ version of the JCas generated Classes. The Framework (as of release 2.2) can run PEAR files using the
+ pear file descriptor, and supply each component with its particular version of the JCas generated class.</para>
+
+ </section>
+
+ <section id="ugr.ref.jcas.impact_of_type_merging_on_composability">
+ <title>Impact of Type Merging on Composability of Annotators</title>
+ <titleabbrev>Type Merging impacts on Composability</titleabbrev>
+
+ <para>The recommended approach in UIMA is to build and maintain type systems as separate components, which are
+ imported by Annotators. Using this approach, Type Merging does not occur because the Type System and its JCas
+ classes are centrally managed and shared by the annotators.</para>
+
+ <para>If you do choose to create a JCas Annotator that relies on Type Merging (meaning that your annotator
+ redefines a Type that is already in use elsewhere, and adds its own features), this can negatively impact the
+ reusability of your annotator, unless your component is used as a PEAR file.</para>
+
+ <para>If you are not using the PEAR packaging isolation capability, whenever
+ anyone wants to combine your annotator with another annotator that uses a different version of
+ the same Type, they will need to be aware of all of the issues described in the previous section. They will need
+ to have the know-how to re-run JCasGen and appropriately set up their classpath to include the merged Java
+ classes and to not include the pre-merge classes. (To enable this, you should package these classes
+ separately from other .jar files for your annotator, so that they can be more easily excluded.) And, if you
+ have done hand-modifications to your JCas classes, the person assembling your annotator will need to
+ properly merge those changes. These issues significantly complicate the task of combining annotators, and
+ will cause your annotator not to be as easily reusable as other UIMA annotators. </para>
+
+ </section>
+
+ <section id="ugr.ref.jcas.documentannotation_issues">
+ <title>Adding Features to DocumentAnnotation</title>
+
+ <para>There is one built-in type, <literal>uima.tcas.DocumentAnnotation</literal>,
+ to which applications can add additional features. (All other built-in types
+ are "feature-final" and you cannot add additional features to them.) Frequently,
+ additional features are added to <literal>uima.tcas.DocumentAnnotation</literal>
+ to provide a place to store document-level metadata.</para>
+
+ <para>For the same reasons mentioned in the previous section, adding features to
+ DocumentAnnotation is not recommended if you are using JCas. Instead, it is recommended
+ that you define your own type for storing your document-level metadata. You can create
+ an instance of this type and add it to the indexes in the usual way. You can then
+ retrieve this instance using the iterator returned from the method <literal>getAllIndexedFS(type)</literal>
+ on an instance of a JFSIndexRepository object.
+ (As of UIMA v2.1, you do not have to declare a custom index in your descriptor to
+ get this to work).</para>
+
+ <para>If you do choose to add features to DocumentAnnotation, there are additional issues to
+ be aware of. The UIMA SDK provides the JCas cover class for the built-in definition of
+ DocumentAnnotation, in the separate jar file <literal>uima-document-annotation.jar</literal>.
+ If you add additional features to DocumentAnnotation, you must remove this jar file
+ from your classpath, because you will not want to use the default JCas cover class.
+ You will need to re-run JCasGen as described in <xref
+ linkend="ugr.ref.jcas.merging_types.jcasgen_support"/>. JCasGen will generate a new cover
+ class for DocumentAnnotation, which you must place in your classpath in lieu of the version
+ in <literal>uima-document-annotation.jar</literal>.</para>
+
+ <para>Also, this is the reason why the method <literal>JCas.getDocumentAnnotationFs()</literal> returns
+ type <literal>TOP</literal>, rather than type <literal>DocumentAnnotation</literal>. Because the
+ <literal>DocumentAnnotation</literal> class can be replaced by users, it is not part of
+ <literal>uima-core.jar</literal> and so the core UIMA framework cannot have any references
+ to it. In your code, you may <quote>cast</quote> the result of <literal>JCas.getDocumentAnnotationFs()</literal>
+ to type <literal>DocumentAnnotation</literal>, which must be available on the classpath either via
+ <literal>uima-document-annotation.jar</literal> or by including a custom version that you have generated using JCasGen.</para>
+ </section>
+
+ </section>
+
+ <section id="ugr.ref.jcas.using_within_an_annotator">
+ <title>Using JCas within an Annotator</title>
+
+ <para>To use JCas within an annotator, you must include the generated Java classes output
+ from JCasGen in the class path.</para>
+
+ <para>An annotator written using JCas is built by defining a class for the annotator that
+ extends JCasAnnotator_ImplBase. The process method for this annotator is
+ written as follows:</para>
+
+ <programlisting>public void process(JCas jcas)
+ throws AnalysisEngineProcessException {
+ ... // body of annotator goes here
+}</programlisting>
+
+ <para>The process method is passed the JCas instance to use as a parameter.</para>
+
+ <para>The JCas reference is used throughout the annotator to refer to the particular JCas
+ instance being worked on. In pooled or multi-threaded implementations, there will be a
+ separate JCas for each thread that is running simultaneously.</para>
+
+ <para>You can do several kinds of operations using the JCas APIs: create new feature
+ structures (instances of CAS types) (using the new operator), access existing feature
+ structures passed to your annotator in the JCas (for example, by using the next method of
+ an iterator over the feature structures), get and set the fields of a particular
+ instance of a feature structure, and add and remove feature structure instances from
+ the CAS indexes. To support iteration, there are also functions to get and use indexes
+ and iterators over the instances in a JCas.</para>
+
+ <section id="ugr.ref.jcas.new_instances">
+ <title>Creating new instances using the Java <quote>new</quote> operator</title>
+ <titleabbrev>Creating new instances</titleabbrev>
+
+ <para>The new operator creates new instances of JCas types. It takes at least one
+ parameter, the JCas instance in which the type is to be created. For example, if there
+ was a type Meeting defined, you can create a new instance of it using:
+
+ <programlisting>Meeting m = new Meeting(jcas);</programlisting></para>
+
+ <para>Other variations of constructors can be added in custom code; the single
+ parameter version is the one automatically generated by JCasGen. For types that are
+ subtypes of Annotation, JCasGen also generates an additional constructor with
+ additional <quote>begin</quote> and <quote>end</quote> arguments.</para>
+
+ </section>
+ <section id="ugr.ref.jcas.getters_and_setters">
+ <title>Getters and Setters</title>
+
+ <para>If the CAS type Meeting had fields location and time, you could get or set these by
+ using getter or setter methods. These methods have names formed by splicing together
+ the word <quote>get</quote> or <quote>set</quote> followed by the field name, with
+ the first letter of the field name capitalized. For instance
+
+ <programlisting>getLocation()</programlisting></para>
+
+ <para>The getter forms take no parameters and return the value of the field; the setter
+ forms take one parameter, the value to set into the field, and return void.</para>
+
+ <para>There are built-in CAS types for arrays of integers, strings, floats, and
+ feature structures. For fields whose values are these types of arrays, there is an
+ alternate form of getters and setters that take an additional parameter, written as
+ the first parameter, which is the index in the array of an item to get or set.</para>
+
+ </section>
+
+ <section id="ugr.ref.jcas.obtaining_refs_to_indexes">
+ <title>Obtaining references to Indexes</title>
+
+ <para>The only way to access instances (not otherwise referenced from other
+ instances) passed in to your annotator in its JCas is to use an iterator over some
+ index. Indexes in the CAS are specified in the annotator descriptor. Indexes have a
+ name; text annotators have a built-in, standard index over all annotations.</para>
+
+ <para>To get an index, first get the JFSIndexRepository from the JCas using the method
+ jcas.getJFSIndexRepository(). Here are the calls to get indexes:</para>
+
+
+ <programlisting>JFSIndexRepository ir = jcas.getJFSIndexRepository();
+
+ir.getIndex(name-of-index) // get the index by its name, a string
+ir.getIndex(name-of-index, Foo.type) // filtered by specific type
+
+ir.getAnnotationIndex() // get AnnotationIndex
+ir.getAnnotationIndex(Foo.type) // filtered by specific type</programlisting>
+
+ <para>For convenience, the getAnnotationIndex method is available directly on the JCas object
+ instance; the implementation merely forwards to the associated index repository.</para>
+
+ <para>A filtering type must be a subtype of the type specified for this index in its
+ index specification. It can be written either as Foo.type or, if you have an instance
+ of Foo, as</para>
+
+ <programlisting>fooInstance.jcasType.casType</programlisting>
+
+ <para>Foo is (of course) an example of the name of the type.</para>
+
+ </section>
+ <section id="ugr.ref.jcas.adding_removing_instances_to_indexes">
+ <title>Adding (and removing) instances to (from) indexes</title>
+ <titleabbrev>Updating Indexes</titleabbrev>
+
+ <para>CAS indexes are maintained automatically by the CAS, but you must explicitly add
+ each feature structure instance that you want an index to find, by using the
+ call:</para>
+
+ <programlisting>myInstance.addToIndexes();</programlisting>
+
+ <para>Do this after setting all features in the instance <emphasis role="bold-italic">which could be used in indexing</emphasis>, for example, in
+ determining the sorting order. After indexing, do not change the values of these
+ particular features because the indexes will not be updated. If you need to change the
+ values, you must first remove the instance from the CAS indexes, change the values,
+ and then add the instance back. To remove an instance from the indexes, use the method:
+
+ <programlisting>myInstance.removeFromIndexes();</programlisting></para>
+ <note><para>It's OK to change feature values which are not used in determining
+ sort ordering (or set membership), without removing the instance from the index and re-adding it.
+ </para></note>
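+
+ <para>The reason for the remove, change, add-back sequence can be illustrated outside of
+ UIMA. The sketch below is an analogy only: it uses <literal>java.util.TreeSet</literal> as a
+ stand-in for a sorted index, and the names <literal>ReindexSketch</literal>,
+ <literal>Item</literal>, <literal>begin</literal>, and <literal>updateKey</literal> are
+ hypothetical. Like a CAS index, the set does not notice when a sort key changes in place,
+ so the element is taken out before its key is modified.</para>
+
+ <programlisting><![CDATA[
```java
import java.util.Comparator;
import java.util.TreeSet;

public class ReindexSketch {

    // Hypothetical stand-in for a feature structure with one feature used as a sort key.
    static class Item {
        int begin;
        Item(int begin) { this.begin = begin; }
    }

    // Stand-in for a sorted index keyed on "begin" (assumption: not a UIMA index).
    static TreeSet<Item> newIndex() {
        return new TreeSet<>(Comparator.comparingInt((Item it) -> it.begin));
    }

    // Safe update: remove the item, change the key, then add the item back.
    static void updateKey(TreeSet<Item> index, Item item, int newBegin) {
        index.remove(item);     // analogous to myInstance.removeFromIndexes()
        item.begin = newBegin;  // change the value used for sorting
        index.add(item);        // analogous to myInstance.addToIndexes()
    }

    public static void main(String[] args) {
        TreeSet<Item> index = newIndex();
        Item a = new Item(10);
        Item b = new Item(20);
        index.add(a);
        index.add(b);

        updateKey(index, a, 30);

        System.out.println(index.first() == b);  // true: b now sorts before a
        System.out.println(index.contains(a));   // true: a is findable under its new key
    }
}
```
+]]></programlisting>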
+
+ <para>When writing a Multi-View component, you may need to index instances in multiple
+ CAS views. The methods above use the indexes associated with the current JCas object.
+ There is a variation of the <literal>addToIndexes / removeFromIndexes</literal> methods which
+ takes one argument: a reference to a JCas object holding the view in which you want to
+ index this instance.
+ <programlisting>myInstance.addToIndexes(anotherJCas)
+myInstance.removeFromIndexes(anotherJCas)</programlisting>
+ </para>
+
+ <para>
+ You can also explicitly add instances to other views using the addFsToIndexes method on
+ other JCas (or CAS) objects. For instance, if you had 2 other CAS views (myView1 and
+ myView2), in which you wanted to index myInstance, you could write:</para>
+
+ <programlisting>myInstance.addToIndexes(); // index myInstance in the view used with the new operator
+myView1.addFsToIndexes(myInstance); // index myInstance in myView1
+myView2.addFsToIndexes(myInstance); // index myInstance in myView2</programlisting>
+
+ <para>
+ The rules for determining which index to use with a particular JCas object are designed to
+ behave the way most would think they should; if you need specific behavior, you can always
+ explicitly designate which view the index adding and removing operations should work on.
+ </para>
+
+ <para>
+ The rules are:
+ <orderedlist>
+ <listitem><para>If the instance is a subtype of AnnotationBase, then the view is the view associated with the
+ annotation as specified in the feature holding the view reference in AnnotationBase.</para></listitem>
+ <listitem><para>Otherwise, if the instance was created using the "new" operator, then the view is the view passed to the
+ instance's constructor.</para></listitem>
+ <listitem><para>Otherwise, if the instance was created by getting a feature value from some other instance, whose range
+ type is a feature structure, then the view is the same as the referring instance.</para></listitem>
+ <listitem><para>Otherwise, if the instance was created by any of the Feature Structure Iterator operations over some index,
+ then it is the view associated with the index.</para></listitem>
+ </orderedlist>
+ </para>
+ </section>
+
+ <section id="ugr.ref.jcas.using_iterators">
+ <title>Using Iterators</title>
+
+ <para>Once you have an index obtained from the JCas, you can get an iterator from the
+ index; here is an example:</para>
+
+
+ <programlisting>// Unfiltered index, obtained through the FSIndexRepository
+FSIndexRepository ir = jcas.getFSIndexRepository();
+FSIndex myIndex = ir.getIndex("myIndexName");
+FSIterator myIterator = myIndex.iterator();
+
+// Index filtered by type, obtained through the JFSIndexRepository
+JFSIndexRepository jir = jcas.getJFSIndexRepository();
+FSIndex myFilteredIndex = jir.getIndex("myIndexName", Foo.type);
+FSIterator myFilteredIterator = myFilteredIndex.iterator();</programlisting>
+
+ <para>Iterators work like normal Java iterators, but are augmented to support
+ additional capabilities. Iterators are described in the CAS Reference, <olink
+ targetdoc="&uima_docs_ref;"
+ targetptr="ugr.ref.cas.indexes_and_iterators"/>.</para>
+
+ </section>
+
+ <section id="ugr.ref.jcas.class_loaders">
+ <title>Class Loaders in UIMA</title>
+
+ <para>The basic concept of a UIMA application includes assembling engines into a flow.
+ The application made up of these engines is run within the UIMA Framework, either by
+ the Collection Processing Manager, or by using more basic UIMA Framework
+ APIs.</para>
+
+ <para>The UIMA Framework exists within a JVM (Java Virtual Machine). A JVM has the
+ capability to load multiple applications, in a way where each one is isolated from the
+ others, by using a separate class loader for each application. For instance, one set
+ of UIMA Framework classes could be shared by multiple sets of application-specific
+ classes, even if these application-specific classes had the same names but were
+ different versions.</para>
+
+ <section id="ugr.ref.jcas.class_loaders.optional">
+ <title>Use of Class Loaders is optional</title>
+
+ <para>The UIMA framework will use a specific ClassLoader, based on how
+ ResourceManager instances are used. Specific ClassLoaders are only created if
+ you specify an ExtensionClassPath as part of the ResourceManager. If you do not
+ need to support multiple applications within one UIMA framework within a JVM,
+ don't specify an ExtensionClassPath; in this case, the classloader used
+ will be the one used to load the UIMA framework - usually the overall application
+ class loader.</para>
+
+ <para>Of course, you should not run multiple UIMA applications together, in this
+ way, if they have different class definitions for the same class name. This
+ includes the JCas <quote>cover</quote> classes. This case might arise, for
+ instance, if both applications extended
+ <literal>uima.tcas.DocumentAnnotation</literal> in differing,
+ incompatible ways. Each application would need its own definition of this class,
+ but only one could be loaded (unless you specify ExtensionClassPath in the
+ ResourceManager which will cause the UIMA application to load its private
+ versions of its classes, from its classpath).</para>
+ </section>
+ </section>
+
+ <section id="ugr.ref.jcas.accessing_jcas_objects_outside_uima_components">
+ <title>Issues accessing JCas objects outside of UIMA Engine Components</title>
+
+ <para>If you are using an ExtensionClassPath, the JCas cover classes are loaded
+ under a class loader created by the ResourceManager part of the UIMA Framework.
+ If you reference the same JCas
+ classes outside of any UIMA component, for instance, in top level application code,
+ the JCas classes used by that top level application code also must be in the class path
+ for the application code.</para>
+
+ <para>Alternatively, you could do all the JCas processing inside a UIMA component (and do no
+ processing using JCas outside of the UIMA pipeline).</para>
+
+ </section>
+ </section>
+
+ <section id="ugr.ref.jcas.setting_up_classpath">
+ <title>Setting up Classpath for JCas</title>
+
+ <para>The JCas Java classes generated by JCasGen are typically compiled and put into a JAR
+ file, which, in turn, is put into the application's class path.</para>
+
+ <para>This JAR file must be generated from the application's merged type system.
+ This is most conveniently done by opening the top level descriptor used by the
+ application in the Component Descriptor Editor tool, and pressing the Run-JCasGen
+ button on the Type System Definition page.</para>
+
+ </section>
+
+ <section id="ugr.ref.jcas.pear_support">
+ <title>PEAR isolation</title>
+ <para>
+ As of version 2.2, the framework supports component descriptors which are PEAR descriptors.
+ These descriptors define components and include information on the class path needed to
+ run them. The framework uses the class path information to set up a localized class path, just
+ for code running within the PEAR context. This allows PEAR files requiring different
+ versions of common code to work well together, even if classes in the different versions
+ have the same names.
+ </para>
+
+ </section>
+
</chapter>
\ No newline at end of file
Propchange: incubator/uima/uimaj/trunk/uima-docbooks/src/docbook/references/ref.jcas.xml
------------------------------------------------------------------------------
svn:eol-style = native