Posted to commits@uima.apache.org by sc...@apache.org on 2017/07/18 15:47:01 UTC

svn commit: r1802314 - in /uima/uv3/uimaj-v3/trunk/uima-docbook-references/src/docbook: ref.cas.xml ref.xml.component_descriptor.xml

Author: schor
Date: Tue Jul 18 15:47:01 2017
New Revision: 1802314

URL: http://svn.apache.org/viewvc?rev=1802314&view=rev
Log:
[UIMA-5495] doc UIMA Set index use of implicit type key, update general docs around indexes and iterators

Modified:
    uima/uv3/uimaj-v3/trunk/uima-docbook-references/src/docbook/ref.cas.xml
    uima/uv3/uimaj-v3/trunk/uima-docbook-references/src/docbook/ref.xml.component_descriptor.xml

Modified: uima/uv3/uimaj-v3/trunk/uima-docbook-references/src/docbook/ref.cas.xml
URL: http://svn.apache.org/viewvc/uima/uv3/uimaj-v3/trunk/uima-docbook-references/src/docbook/ref.cas.xml?rev=1802314&r1=1802313&r2=1802314&view=diff
==============================================================================
--- uima/uv3/uimaj-v3/trunk/uima-docbook-references/src/docbook/ref.cas.xml (original)
+++ uima/uv3/uimaj-v3/trunk/uima-docbook-references/src/docbook/ref.cas.xml Tue Jul 18 15:47:01 2017
@@ -149,14 +149,14 @@ under the License.
     <section id="ugr.ref.cas.creating_using_indexes">
       <title>Creating and using indexes</title>
       
-      <para>Each view of a CAS provides a set of indexes for that view. Instances of feature
-        structures can be added to a view&apos;s indexes. These indexes provide
-        the only way for other annotators to locate existing data in the CAS. The only way for an
-        annotator to use data that another annotator has created is by using an index (or the
+      <para>Each view of a CAS provides a set of indexes for that view. Instances of Types (that is, Feature
+        Structures) can be added to a view&apos;s indexes. These indexes provide
+        a way for annotators to locate existing data in the CAS, using a specific index (or the
         method <literal>getAllIndexedFS</literal> of the object <literal>FSIndexRepository</literal>) to
-        retrieve feature structures the first annotator created. If you want the data you
-        create to be visible to other annotators, you must explicitly call methods which
-        add it to the indexes &mdash; you must index it.</para>
+        retrieve the Feature Structures that were previously created.
+        Newly created Feature Structures are not automatically added to the indexes; you choose which
+        Feature Structures to add and use one of several APIs to add them. 
+        </para>
       
       <para>Indexes are named and are associated with a CAS Type; they are used to index
         instances of that CAS type (including instances of that type&apos;s subtypes). If
@@ -169,6 +169,34 @@ under the License.
         query for indexes for that view. Once you have a handle to an index, you can get
         information about the feature structures in the index, the size of the index, as well
         as an iterator over the feature structures.</para>
+        
+      <para>There are three kinds of indexes:
+        <itemizedlist spacing="compact">
+          <listitem>
+            <para>bag - no ordering</para>
+          </listitem>
+          <listitem>
+            <para>set - uses a user-specified set of keys to define equality; holds one instance of the set of equal items.</para>
+          </listitem>
+          <listitem>
+            <para>sorted - uses a user-specified set of keys to define ordering.</para>
+          </listitem>          
+        </itemizedlist>
+      </para>
+      
+      <para>For set indexes, the comparator keys are augmented with an implicit additional key: the type of the
+        feature structure.  This means that a set index over Annotation (which has the subtype Token), with the "begin" value as its only key,
+        will behave as follows:
+        
+        <itemizedlist>
+          <listitem><para>If you make two Tokens (or two Annotations), both having a begin value of 17, and add both of them to the indexes,
+            only one of them will be in the index.</para>
+          </listitem>
+          <listitem><para>If you make 1 Token and 1 Annotation, both having a begin value of 17, and add both of them to the indexes,
+            both of them will be in the index (because the types are different).
+          </para></listitem>
+        </itemizedlist> 
+      </para>
       
       <para>Indexes are defined in the XML descriptor metadata for the application. Each CAS
         View has its own, separate instantiation of indexes based on these definitions, 
@@ -178,29 +206,34 @@ under the License.
         belongs, within just the view's repository. You can specify different repositories
         (associated with different CAS views) to use; a given Feature Structure instance 
         may be indexed in more than one CAS View (unless it is a subtype of AnnotationBase).</para>
-      
-      <para>Iterators allow you to enumerate the feature structures in an index.  FS iterators
-        provide two kinds of APIs: the regular Java iterator API, and a specific FS iterator API
+
+      <para>Indexes implement the Iterable interface, so you may use the Java enhanced for loop to iterate over them.</para>
+            
+      <para>You can also get iterators from indexes; 
+        iterators allow you to enumerate the feature structures in an index.  There are two kinds of iterators supported:
+        the regular Java iterator API, and a specific FS iterator API
         where the usual Java iterator APIs (<literal>hasNext()</literal> and <literal>next()</literal>)
-        are replaced by <literal>isValid()</literal>, <literal>moveToNext()</literal> (which does
+        are augmented by <literal>isValid()</literal>, <literal>moveToNext() / moveToPrevious()</literal> (which do
         not return an element) and <literal>get()</literal>.  Which API style you use is up to you,
         but we do not recommend mixing the styles as the results are sometimes unexpected.  If you
         just want to iterate over an index from start to finish, either style is equally appropriate.
         If you also use <literal>moveTo(FeatureStructure fs)</literal> and 
         <literal>moveToPrevious()</literal>, it is better to use the special FS iterator style.
       </para>
+      
       <note><para>The reason to not mix these styles is that you might be thinking that
         next() followed by moveToPrevious() would always work.  This is not true, because
         next() returns the "current" element, and advances to the next position, which might be
-        beyond the last element.  At that point, the interator becomes "invalid", and by the iterator
-        contracts, moveToNext and moveToPrevious are not allowed on "invalid" iterators; 
-        when an iterator is not valid, all bets are off.  But you can
+        beyond the last element.  At that point, the iterator becomes "invalid", and 
+        moveToNext and moveToPrevious no longer move the iterator.  But you can
         call these methods on the iterator &mdash; moveToFirst(), moveToLast(), or moveTo(FS) &mdash; to reset it.</para></note>
       
       <para>Indexes are created by specifying them in the annotator&apos;s or
         aggregate&apos;s resource descriptor. An index specification includes its name,
-        the CAS type being indexed, the kind of index it is, and an (optional) ordering
-        relation on the feature structures to be indexed. At startup time, all index
+        the CAS type being indexed, the kind (bag, set or sorted) of index it is, and an (optional) set of keys.
+        The keys are used for set and sorted indexes, and specify what values are used for 
+        ordering, or (for sets) what values are used to determine set equality. 
+        When a CAS pipeline is created, all index
         specifications are combined; duplicate definitions (having the same name) are
         allowed only if their definitions are the same. </para>
       
@@ -223,7 +256,7 @@ under the License.
       <para>The ordering relation used by this index is to first order by the value of the
         <quote>begin</quote> features (in ascending order) and then by the value of the
         <quote>end</quote> feature (in descending order), and then, finally, by the 
-        Type Priority. This ordering insures that
+        Type Priority (if any type priorities are specified). This ordering ensures that
         longer annotations starting at the same spot come before shorter ones. For Subjects
         of Analysis other than Text, this may not be an appropriate index.</para>
       
@@ -841,7 +874,7 @@ aPerson.setStringValue(lastNameFeature,
     <title>Indexes and Iterators</title>
     
     <para>Each CAS can have many indexes associated with it; each CAS View contains 
-      a complete set of instantions of the indexes.   Each index is represented by an
+      a complete set of instantiations of the indexes.   Each index is represented by an
       instance of the type org.apache.uima.cas.FSIndex. You use the object
       org.apache.uima.cas.FSIndexRepository, accessible via a method on a CAS object, to
       retrieve instances of indexes. There are methods that let you select the index
@@ -864,18 +897,18 @@ aPerson.setStringValue(lastNameFeature,
     <para>In UIMA V3, Feature structures may be added to or removed from indexes while iterating
       over them.  If this happens, any iterators already created will continue to operate over the
       before-modification version of the index, unless or until the iterator is re-synchronized with the current
-      value of the index via one of 3 iterator API calls: moveToFirst, moveToLast, or moveTo(FeatureStructure).
+      value of the index via one of the following three iterator API calls:
+      moveToFirst, moveToLast, or moveTo(FeatureStructure).
       ConcurrentModificationException is no longer thrown in UIMA v3.
     </para>
     
     <para>Feature structures being iterated over may have features which are used as the "keys" of an index, updated.
-    If this is done, UIMA, to prevent index corruption, will protect the indexes by automatically removing the 
+    If this is done, UIMA will protect the indexes (to prevent index corruption) by automatically removing the 
     Feature Structure from the indexes, 
-    updating the field, and adding the FS back to the index.  This recovery operation, because it updates the index, 
-    no longer makes the iterator throw a ConcurrentModificationException if the iterator is incremented or decremented;
+    updating the field, and adding the FS back to the index (possibly in a new position).  
+    This automatic remove / add-back operation no longer makes the iterator throw a ConcurrentModificationException
+    (as it did in UIMA Version 2) if the iterator is incremented or decremented;
     existing iterators will continue to operate as if no index modification occurred.
-    The automatic removing and add-back of Feature Structures that occurs when features used
-      in index definitions are updated occurs transparently.
     </para>   
       
     <!-- <para>As of version 2.7.0, a new method on FSIndex, <code>withSnapshotIterators(),</code> 
@@ -897,7 +930,8 @@ aPerson.setStringValue(lastNameFeature,
         annotations in the order in which they appear in the document. Annotations are sorted first by increasing
         <literal>begin</literal> position. Ties are then broken by <emphasis>decreasing</emphasis>
         <literal>end</literal> position (so that longer annotations come first). Annotations that match in both
-        their <literal>begin</literal> and <literal>end</literal> features are sorted using the Type Priority
+        their <literal>begin</literal> and <literal>end</literal> features are sorted using the Type Priority,
+        if any type priorities are defined
         (see <olink targetdoc="&uima_docs_ref;"
           targetptr="ugr.ref.xml.component_descriptor.aes.type_priority"/> )</para>
     </section>
@@ -905,26 +939,51 @@ aPerson.setStringValue(lastNameFeature,
     
     <section id="ugr.ref.cas.index.adding_to_indexes">
       <title>Adding Feature Structures to the Indexes</title>
-      
-      <para>Feature Structures are added to the indexes by calling the
-        <literal>FSIndexRepository.addFS(FeatureStructure)</literal> method or the equivalent convenience
-        method <literal>CAS.addFsToIndexes(FeatureStructure)</literal>. This adds the Feature Structure to
+
+      <para>Feature Structures are added to the indexes by various APIs. These add the Feature Structure to
         <emphasis>all</emphasis> indexes that are defined for the type of that FeatureStructure (or any of its
-        supertypes). Note that you should not add a Feature Structure to the indexes until you have set values for all
+        supertypes), in a particular view. 
+        Note that you should not add a Feature Structure to the indexes until you have set values for all
         of the features that may be used as sort keys in an index.</para>
+      
+      <para>There are multiple APIs for adding FSs to the index.
+        <itemizedlist>
+          <listitem><para>(preferred) myFeatureStructure.addToIndexes(). This adds the feature structure instance to the
+          view in which it was originally created.</para>
+          </listitem>
+          <listitem><para>(preferred) myFeatureStructure.addToIndexes(JCas or CAS). This adds the feature structure instance to the
+            view represented by the argument.</para>
+          </listitem>
+          <listitem><para>(older form) casView.addFsToIndexes(myFeatureStructure) or jcasView.addFsToIndexes(myFeatureStructure). 
+            This adds the feature structure instance to the
+            view represented by the cas (or jcas).</para>
+          </listitem>
+          <listitem><para>(older form) fsIndexRepositoryView.addFS(myFeatureStructure). 
+            This adds the feature structure instance to the
+            view represented by the fsIndexRepository instance.</para>
+          </listitem>
+        </itemizedlist>
+      </para>
     </section>
         
     <section id="ugr.ref.cas.index.iterators">
-      <title>Iterators</title>
+      <title>Iterators over UIMA Indexes</title>
+
       
       <para>Iterators are objects of class <literal>org.apache.uima.cas.FSIterator.</literal> This class
         extends <literal>java.util.Iterator</literal> and implements the normal Java iterator methods, plus
-        additional ones that allow moving both forwards and backwards.</para>  
+        additional ones that allow moving both forwards and backwards.</para>
+        
+      <para>UIMA indexes implement <literal>Iterable</literal>, so you can use an index directly in a Java enhanced for loop.</para>
+        
     </section>
     
     <section id="ugr.ref.cas.index.annotation_index">
       <title>Special iterators for Annotation types</title>
       
+      <para>Note: we recommend using the UIMA V3 select framework, instead of the following.
+        It implements all of the following capabilities, and more, in a uniform manner.</para>
+      
       <para>The built-in index over the <literal>uima.tcas.Annotation</literal> type
         named <quote><literal>AnnotationIndex</literal></quote> has additional
         capabilities. To use them, you first get a reference to this built-in index using
@@ -960,6 +1019,9 @@ AnnotationIndex idx = aCAS.getAnnotation
     <section id="ugr.ref.cas.index.constraints_and_filtered_iterators">
       <title>Constraints and Filtered iterators</title>
       
+      <para>Note: for new code, consider using the select framework plus Streams, instead of
+        the following.</para>
+        
       <para>There is a set of API calls that build constraint objects. These objects can be
         used directly to test if a particular feature structure matches (satisfies) the
         constraint, or they can be passed to the createFilteredIterator method to create an

Modified: uima/uv3/uimaj-v3/trunk/uima-docbook-references/src/docbook/ref.xml.component_descriptor.xml
URL: http://svn.apache.org/viewvc/uima/uv3/uimaj-v3/trunk/uima-docbook-references/src/docbook/ref.xml.component_descriptor.xml?rev=1802314&r1=1802313&r2=1802314&view=diff
==============================================================================
--- uima/uv3/uimaj-v3/trunk/uima-docbook-references/src/docbook/ref.xml.component_descriptor.xml (original)
+++ uima/uv3/uimaj-v3/trunk/uima-docbook-references/src/docbook/ref.xml.component_descriptor.xml Tue Jul 18 15:47:01 2017
@@ -691,7 +691,8 @@ uima.tcas.Annotation.</programlisting>
               Information in the CAS is always accessed through an index. There is a built-in default annotation
               index declared which can be used to access instances of type
               <literal>uima.tcas.Annotation</literal> (or its subtypes), sorted based on their
-              <literal>begin</literal> and <literal>end</literal> features. For all other types, there is a
+              <literal>begin</literal> and <literal>end</literal> features, and the type priority ordering (if specified). 
+              For all other types, there is a
               default, unsorted (bag) index. If there is a need for a specialized index it must be declared in this
               element of the descriptor. See <olink targetdoc="&uima_docs_ref;"
                 targetptr="ugr.ref.cas.indexes_and_iterators"/> for details on FS indexes.</para>
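
Illustration (not part of the commit): the implicit type key for set indexes, documented in the ref.cas.xml change above, can be modeled in plain Java. This is only a sketch and does not use the UIMA API. The class SetIndexModel, the nested class Fs, and the method indexedCount are hypothetical names invented for this example. It assumes a set index behaves like a sorted set whose comparator compares the declared key ("begin") first and the type name second.

```java
import java.util.Comparator;
import java.util.TreeSet;

public class SetIndexModel {

    /** Hypothetical stand-in for a feature structure: a type name plus a "begin" value. */
    public static final class Fs {
        final String type;
        final int begin;

        public Fs(String type, int begin) {
            this.type = type;
            this.begin = begin;
        }
    }

    /** Models a set index keyed on "begin", with the type acting as an implicit extra key. */
    static TreeSet<Fs> newSetIndex() {
        return new TreeSet<>(
            Comparator.comparingInt((Fs fs) -> fs.begin)   // the user-specified key
                      .thenComparing(fs -> fs.type));      // the implicit type key
    }

    /** Adds the items to a fresh model index and returns how many the index retains. */
    public static int indexedCount(Fs... items) {
        TreeSet<Fs> index = newSetIndex();
        for (Fs fs : items) {
            index.add(fs); // an item equal (by the comparator) to an existing one is not added
        }
        return index.size();
    }

    public static void main(String[] args) {
        // Two Tokens with begin = 17: equal under both keys, so only one is retained.
        System.out.println(indexedCount(new Fs("Token", 17), new Fs("Token", 17)));      // prints 1
        // One Token and one Annotation with begin = 17: the types differ, so both are retained.
        System.out.println(indexedCount(new Fs("Token", 17), new Fs("Annotation", 17))); // prints 2
    }
}
```

The two calls in main correspond to the two cases listed in the new documentation text. A real UIMA set index is defined in the component descriptor rather than built this way; the model only shows why adding the type to the comparison changes which items count as equal.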