Posted to commits@uima.apache.org by sc...@apache.org on 2014/12/12 06:48:40 UTC

svn commit: r1644834 - in /uima/uimaj/trunk/uima-docbook-references/src/docbook: ref.cas.xml ref.config.xml ref.jcas.xml ref.xml.component_descriptor.xml

Author: schor
Date: Fri Dec 12 05:48:40 2014
New Revision: 1644834

URL: http://svn.apache.org/r1644834
Log:
[UIMA-4146][UIMA-4135] support snapshot iterators, support for modifying indexed FSs

Modified:
    uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.cas.xml
    uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.config.xml
    uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.jcas.xml
    uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.xml.component_descriptor.xml

Modified: uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.cas.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.cas.xml?rev=1644834&r1=1644833&r2=1644834&view=diff
==============================================================================
--- uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.cas.xml (original)
+++ uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.cas.xml Fri Dec 12 05:48:40 2014
@@ -178,11 +178,11 @@ under the License.
       <para>Indexes are defined in the XML descriptor metadata for the application. Each CAS
         View has its own, separate instantiation of indexes based on these definitions, 
         kept in the view's index repository. When you obtain an index, it is always from a
-        particular CAS view. When you index an item, it is always added to all indexes where it
-        belongs, within just one repository. You can specify different repositories
+        particular CAS view's index repository. 
+        When you index an item, it is always added to all indexes where it
+        belongs, within just the view's repository. You can specify different repositories
         (associated with different CAS views) to use; a given Feature Structure instance 
-        may be indexed in more
-        than one CAS View.</para>
+        may be indexed in more than one CAS View (unless it is a subtype of AnnotationBase).</para>
       
       <para>Iterators allow you to enumerate the feature structures in an index.  FS iterators
         provide two kinds of APIs: the regular Java iterator API, and a specific FS iterator API
@@ -215,7 +215,7 @@ under the License.
         another feature structure, which is indexed, or through a chain of these).</para>
       
       <para>The framework defines an unnamed bag index which indexes all types.  The
-      only access provided for this index is the getAllIndexedFS(type) method on the
+        only access provided for this index is the getAllIndexedFS(type) method on the
         index repository, which returns an iterator over all indexed instances of the
         specified type (including its subtypes) for that CAS View.
       </para>
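The per-view index repositories and the unnamed bag index described in the two hunks above can be modeled in a few lines of plain Java. This is a hypothetical sketch, not UIMA code: `ToyRepository`, `ToyCas`, `addFS`, `allIndexed`, and the view names are made up, and Java's own class hierarchy (`Integer` is a subtype of `Number`) stands in for the CAS type system.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-view repository holding an unnamed "bag" of everything indexed.
final class ToyRepository {
  private final List<Object> bag = new ArrayList<>();

  void addFS(Object fs) {
    bag.add(fs);
  }

  // Analogue of getAllIndexedFS(type): every indexed instance of the type,
  // including instances of its subtypes (Class.isInstance covers both).
  <T> List<T> allIndexed(Class<T> type) {
    List<T> out = new ArrayList<>();
    for (Object fs : bag) {
      if (type.isInstance(fs)) {
        out.add(type.cast(fs));
      }
    }
    return out;
  }
}

// Hypothetical CAS with one separate repository per view name.
final class ToyCas {
  private final Map<String, ToyRepository> views = new HashMap<>();

  ToyRepository view(String name) {
    return views.computeIfAbsent(name, k -> new ToyRepository());
  }
}
```

Adding the same `Integer` instance to two views mirrors a Feature Structure indexed in more than one view; per the first hunk, subtypes of AnnotationBase are the exception to that.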
@@ -227,7 +227,8 @@ under the License.
       
       <para>The ordering relation used by this index is to first order by the value of the
         <quote>begin</quote> features (in ascending order) and then by the value of the
-        <quote>end</quote> feature (in descending order). This ordering insures that
+        <quote>end</quote> feature (in descending order), and finally by the 
+        type priority. This ordering ensures that
         longer annotations starting at the same spot come before shorter ones. For Subjects
         of Analysis other than Text, this may not be an appropriate index.</para>
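The three-step ordering in this hunk can be expressed as a chained `java.util.Comparator`. This is a minimal sketch, not the framework's implementation: `Span` and `AnnotationOrder` are hypothetical, and a map from type name to an integer rank stands in for the type priorities declared in a descriptor.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an annotation: a type name plus begin/end offsets.
final class Span {
  final String type;
  final int begin;
  final int end;

  Span(String type, int begin, int end) {
    this.type = type;
    this.begin = begin;
    this.end = end;
  }

  @Override
  public String toString() {
    return type + "[" + begin + "," + end + ")";
  }
}

final class AnnotationOrder {
  // begin ascending, then end descending (longer spans first), then by a
  // type-priority rank where a lower number sorts earlier. Types missing
  // from the map sort last (an assumption made for this sketch).
  static Comparator<Span> comparator(Map<String, Integer> typePriority) {
    Comparator<Span> byBegin = Comparator.comparingInt(s -> s.begin);
    Comparator<Span> byEndDescending = Comparator.<Span>comparingInt(s -> s.end).reversed();
    Comparator<Span> byPriority =
        Comparator.comparingInt(s -> typePriority.getOrDefault(s.type, Integer.MAX_VALUE));
    return byBegin.thenComparing(byEndDescending).thenComparing(byPriority);
  }

  static List<Span> sorted(List<Span> spans, Map<String, Integer> typePriority) {
    List<Span> copy = new ArrayList<>(spans);
    copy.sort(comparator(typePriority));
    return copy;
  }
}
```

For example, with `Sentence` ranked before `Token`, the spans Token[0,5), Token[6,9), Sentence[0,9) and Sentence[0,5) sort as Sentence[0,9), Sentence[0,5), Token[0,5), Token[6,9): the longer span at offset 0 comes first, and the type rank only breaks the tie between the two [0,5) spans.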
       
@@ -671,13 +672,39 @@ private Feature initFeature(String featN
     on particular values of the features of an indexed type, if you change the values of
     those features being used in the index key, the correct way to do this is to
     <orderedlist spacing="compact">
-      <listitem><para>remove the item from all indexes where it is indexed, in all views
+      <listitem><para>completely remove the item from all indexes where it is indexed, in all views
       where it is indexed,</para>       
       </listitem>
       <listitem><para>update the value of the features being used as keys,</para></listitem>
       <listitem><para>add the item back to the indexes, in all views.</para></listitem> 
     </orderedlist></para>
     
+    <para>Completely removing an item from the indices may require removing it multiple times, if it was 
+    added multiple times; as of version 2.7.0, this can only happen when the JVM global property 
+    <code>uima.allow_duplicate_add_to_indices</code> is defined.</para>
+    
+    <para>Because this is complex, and can be optimized by the framework, the CAS (as of version 2.7.0) 
+    provides a <code>protectIndices</code> method that does it for you.  It can be used with try / finally
+    (or try-with-resources), or with a Runnable:
+            <programlisting>// an approach using try / finally
+AutoCloseable ac = my_cas.protectIndices();
+try {
+   ...  arbitrary user code which updates features which may be "keys" in one or more indices
+} finally {
+  ac.close();
+}
+
+// with Java 7 or later, this can be written using try-with-resources:
+
+try (AutoCloseable ac = my_cas.protectIndices()) {
+   ...  arbitrary user code which updates features which may be "keys" in one or more indices
+}
+
+// an approach using a Runnable, written in Java 8 lambda syntax
+my_cas.protectIndices(() -> {
+  ... arbitrary user code updating "key" features, but no checked exceptions are permitted
+  });</programlisting></para>
+    
     <para>As of version 2.4.1, there are two methods you can use on an index repository 
     to efficiently bulk-remove all
     instances of particular types of feature structures from a particular view.  One of these, 
@@ -742,12 +769,20 @@ aPerson.setStringValue(lastNameFeature,
         targetptr="ugr.ref.xml.component_descriptor.aes.index"/>.</para>
        
     <para>Feature structures should not be added to or removed from indexes while iterating
-      over them; a ConcurrentModificationException is thrown when this is detected.
+      over them; a ConcurrentModificationException is thrown when this is detected (but see the following paragraph).
       Certain operations are allowed with the iterators after modification, which can
       <quote>reset</quote> this condition, such as moving to beginning, end, or moving to a
       particular feature structure. So - if you have to modify the index, you can move it back to
       the last FS you had retrieved from the iterator, and then continue, if that makes sense in
       your application.</para>   
+      
+    <para>As of version 2.7.0, a new method on FSIndex, <code>withSnapshotIterators()</code>, 
+    creates a lightweight FSIndex, based on the original FSIndex, 
+    whose iterators allow arbitrary index operations while iterating and never throw 
+    <code>ConcurrentModificationException</code>.  These iterators use a 
+    <emphasis>snapshot</emphasis> technique: each one takes a snapshot of the original index when it 
+    is created and iterates over that snapshot, so the iteration is unaffected by any
+    modifications to the actual index.</para>
 
     <section id="ugr.ref.cas.index.built_in_indexes">
       <title>Built-in Indexes</title>
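The contrast between the default fail-fast iterators and the snapshot iterators introduced in this hunk can be illustrated with standard collections. This is an analogy, not UIMA code: a `java.util.TreeSet` stands in for a sorted index, and `SnapshotDemo` with its two methods is hypothetical. In UIMA itself, the snapshot behavior comes from calling `withSnapshotIterators()` on an FSIndex, as the new paragraph states.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

final class SnapshotDemo {
  // Iterates the live set and adds a new element for each one visited.
  // TreeSet iterators are fail-fast: after the set is modified, the next
  // call to next() throws ConcurrentModificationException. Returns true
  // if that exception was thrown.
  static boolean modifyWhileIteratingLive(TreeSet<Integer> index) {
    try {
      Iterator<Integer> it = index.iterator();
      while (it.hasNext()) {
        int x = it.next();
        index.add(x + 100);
      }
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }

  // Same updates, but iterates over a copy taken when iteration starts,
  // so later changes to the live set do not affect the loop.
  // Returns the number of elements visited.
  static int modifyWhileIteratingSnapshot(TreeSet<Integer> index) {
    List<Integer> snapshot = new ArrayList<>(index);
    int visited = 0;
    for (int x : snapshot) {
      index.add(x + 100);
      visited++;
    }
    return visited;
  }
}
```

Starting from {1, 2, 3}, the live version throws on its second `next()` call, while the snapshot version visits exactly the three original elements and leaves the set as {1, 2, 3, 101, 102, 103}. In this sketch the copy costs time and memory proportional to the index size each time iteration starts; that is the price of being able to modify the index freely.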

Modified: uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.config.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.config.xml?rev=1644834&r1=1644833&r2=1644834&view=diff
==============================================================================
--- uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.config.xml (original)
+++ uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.config.xml Fri Dec 12 05:48:40 2014
@@ -49,9 +49,31 @@ under the License.
   checking for this, added in version 2.7.0 (the previously existing partial checking is
   still there, though).  
     </para>
+  </section>   
+ 
+  <section id="ugr.ref.config.protect-index">
+    <title>Configuring index protection</title>
     
-    
-    
+    <para>A new feature in version 2.7.0 checks for feature updates that could corrupt indices. 
+    Because this checking can slightly slow down performance, it is controlled by 
+    global JVM properties.  The suggested way to use them is as follows:
+    <itemizedlist>
+	    <listitem><para>Initially, run with automatic protection enabled (the default), but
+	    turn on explicit reporting by defining <code>uima.report_fs_update_corrupts_index</code>.</para></listitem>
+	    <listitem><para>For each reported instance, examine your code to see if you can restructure it to
+	    do the updates before adding the FS to the indices.  Where you cannot, surround the code doing 
+	    these updates with one of the protectIndices() forms, try / finally or Runnable (see the CAS Javadocs).
+	    </para></listitem>
+	    <listitem><para>Once nothing more is reported, for maximum performance, keep any protections 
+	    you installed in the previous step, and disable the reporting and runtime checking by setting 
+	    <code>uima.protect_indices_from_key_updates</code> to false and removing any 
+	    <code>uima.report_fs_update_corrupts_index</code> JVM property.</para></listitem>
+    </itemizedlist></para>
+  </section>
+  
+  <section id="ugr.ref.config.property-table">
+    <title>Properties Table</title>
+      
     <informaltable frame="all" rowsep="1" colsep="1">
      <tgroup cols="3">
        <colspec colnum="1" colname="Title" colwidth="1*"/>
@@ -79,19 +101,45 @@ under the License.
          </row>
          
          <row>
-           <entry><para>Illegal index-key Feature update</para></entry>
+           <entry><para>Report Illegal Index-key Feature Updates</para></entry>
            
-           <entry><para><code>uima.check_fs_update_corrupts_index</code> (default is not to check)</para>
-           
-                  <para>See <ulink url="https://issues.apache.org/jira/browse/UIMA-4059"/>.
+           <entry><para><code>uima.report_fs_update_corrupts_index</code> (default is not to report)</para>
+                      
+                  <para>See <ulink url="https://issues.apache.org/jira/browse/UIMA-4059">UIMA-4059</ulink>.
                         Updating Features which are used in Set and Sorted
-                        indices as "keys" is illegal, if the Feature Structure (FS)
+                        indices as "keys" may corrupt the indices, if the Feature Structure (FS)
                         has been added to the indices.  To update these, you must first
-                        remove the FS from the index, then do the updates, and then
-                        add it back.  A new (by default disabled) capability
-                        checks for this.  It may be somewhat expensive in
-                        space and time, so is disabled by default, but can be
-                        enabled to check your application.</para></entry>
+                        completely remove the FS from the indexes in all views, then do the updates, and then
+                        add it back.  UIMA now checks for this (unless specifically disabled, see below).
+                        If this property is defined, UIMA also logs a WARN message for each occurrence
+                        that is not wrapped in an explicit protectIndices call (see the CAS Javadocs).</para>
+                   
+                   <para>Specifying this property also forces uima.protect_indices_from_key_updates to true
+                         even if it was set to false (see below).</para>
+                         
+                   <para>Typically, you run with this property defined, and use the report to change
+                        your code either to avoid the problem or to wrap the updates in a protectIndices
+                        call (see the CAS and JCas chapters of this reference manual for examples of user
+                        code doing this). Then, for high performance, run with the protection turned off
+                        (see below).</para></entry>
+                        
+           <entry><para>2.7.0</para></entry>
+         </row>
+
+         <row>
+           <entry><para>Protect Indices from Key Updates</para></entry>
+           
+           <entry><para><code>uima.protect_indices_from_key_updates</code> (default is true)</para>
+                      
+                  <para>See <ulink url="https://issues.apache.org/jira/browse/UIMA-4135">UIMA-4135</ulink>.
+                        After you have fixed all issues identified by the above report,
+                        set this to false to skip this check, which may slightly speed up 
+                        runs.</para>
+           </entry>
+                        
            <entry><para>2.7.0</para></entry>
          </row>
 
@@ -101,7 +149,7 @@ under the License.
            <entry><para><code>uima.allow_duplicate_add_to_indices</code></para>
            
                   <para>See <ulink url="https://issues.apache.org/jira/browse/UIMA-4059"/>
-                        and <ulink url="https://issues.apache.org/jira/browse/UIMA3399"/>.
+                        and <ulink url="https://issues.apache.org/jira/browse/UIMA-3399"/>.
                         As of version 2.7.0, adding a particular Feature Structure
                         to the indices more than once is ignored.  The old behavior
                         may be restored by this property.</para></entry>
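The interplay of the two new properties can be summarized as a small decision function. This is a hypothetical sketch, not how UIMA reads its settings: the class and field names are invented, and treating any defined value of the report property as "on" is an assumption based on the table's wording (defining it turns reporting on, removing it turns reporting off).

```java
import java.util.Properties;

// Hypothetical holder for the two index-protection settings in the table.
final class IndexProtectionConfig {
  static final String REPORT = "uima.report_fs_update_corrupts_index";
  static final String PROTECT = "uima.protect_indices_from_key_updates";

  final boolean report;
  final boolean protect;

  private IndexProtectionConfig(boolean report, boolean protect) {
    this.report = report;
    this.protect = protect;
  }

  // Assumptions for this sketch: reporting is on whenever REPORT is defined
  // (any value), PROTECT defaults to true, and reporting forces protection on.
  static IndexProtectionConfig from(Properties props) {
    boolean report = props.getProperty(REPORT) != null;
    boolean protect = Boolean.parseBoolean(props.getProperty(PROTECT, "true"));
    return new IndexProtectionConfig(report, report || protect);
  }
}
```

In a real JVM this would be called as `IndexProtectionConfig.from(System.getProperties())`; starting Java with `-Duima.report_fs_update_corrupts_index` defines the property with an empty value, which this sketch counts as "reporting on".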

Modified: uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.jcas.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.jcas.xml?rev=1644834&r1=1644833&r2=1644834&view=diff
==============================================================================
--- uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.jcas.xml (original)
+++ uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.jcas.xml Fri Dec 12 05:48:40 2014
@@ -511,11 +511,42 @@ ir.getAnnotationIndex(Foo.type)      //
       
       <para>Do this after setting all features in the instance <emphasis role="bold-italic">which could be used in indexing</emphasis>, for example, in
         determining the sorting order. After indexing, do not change the values of these
-        particular features because the indexes will not be updated. If you need to change the
-        values, you must first remove the instance from the CAS indexes, change the values,
-        and then add the instance back. To remove an instance from the indexes, use the method:
+        particular features because changing these values can corrupt Set and Sorted indices. 
+        If you need to change the
+        values, you must first remove the instance from the CAS indexes in all views where it may be indexed, 
+        change the values, and then add the instance back.
+         
+              <note><para>If you've allowed duplicates of the same Feature Structure to be added to the index (which
+      was the default before version 2.7.0, and can be enabled in version 2.7.0 and later by defining the JVM
+      property <code>-Duima.allow_duplicate_add_to_indices</code>), then you have to remove 
+      <emphasis role="bold">*all*</emphasis> instances of the Feature Structure.</para></note>
+        
+        
+        The following two forms both use framework support (available as of version 2.7.0) to do all of this 
+        in an optimized manner: 
+        
+        <programlisting>// an approach using try / finally
+AutoCloseable ac = my_jcas.protectIndices();
+try {
+   ...  arbitrary user code which updates features which may be "keys" in one or more indices
+} finally {
+  ac.close();
+}
+
+// with Java 7 or later, this can be written using try-with-resources:
+
+try (AutoCloseable ac = my_jcas.protectIndices()) {
+   ...  arbitrary user code which updates features which may be "keys" in one or more indices
+}</programlisting>
+      </para>
+      <para>As an alternative to the try / finally, you can use a Runnable, as follows:
+      
+       <programlisting>// an approach using a Runnable, written in Java 8 lambda syntax
+my_jcas.protectIndices(() -> {
+  ... arbitrary user code updating "key" features, but no checked exceptions are permitted
+  });</programlisting>       
+      </para>
         
-        <programlisting>myInstance.removeFromIndexes();</programlisting></para>
       <note><para>It&apos;s OK to change feature values which are not used in determining
       sort ordering (or set membership), without removing and re-adding back to the index.
       </para></note>
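The remove / update / re-add discipline that `protectIndices()` packages up can be sketched with a `TreeSet` standing in for a sorted index. Everything here is hypothetical (`MutableSpan`, `ToySortedIndex`, `Protection`, `aboutToUpdate`) and is not UIMA's implementation; in particular, the caller has to announce each update explicitly, whereas the framework support described in this hunk does that bookkeeping for you.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical feature structure whose "begin" feature is a sort key.
final class MutableSpan {
  final String label;
  int begin;

  MutableSpan(String label, int begin) {
    this.label = label;
    this.begin = begin;
  }
}

// Hypothetical sorted index ordered by begin, ties broken by label.
final class ToySortedIndex {
  final TreeSet<MutableSpan> set = new TreeSet<>(
      Comparator.<MutableSpan>comparingInt(s -> s.begin).thenComparing(s -> s.label));

  Protection protect() {
    return new Protection();
  }

  // Re-adds, on close(), every FS that was removed through it.
  final class Protection implements AutoCloseable {
    private final List<MutableSpan> removed = new ArrayList<>();

    // Call before changing a key feature: takes fs out of the index once.
    void aboutToUpdate(MutableSpan fs) {
      if (set.remove(fs)) {
        removed.add(fs);
      }
    }

    // Declared without "throws Exception", so try-with-resources needs no catch.
    @Override
    public void close() {
      set.addAll(removed);
      removed.clear();
    }
  }
}
```

Usage, with `idx` a `ToySortedIndex` and `a` a `MutableSpan` already in it:

```java
try (ToySortedIndex.Protection p = idx.protect()) {
  p.aboutToUpdate(a);
  a.begin = 7; // key changes while a is out of the index
} // a is re-added here, in its new sort position
```

Changing `begin` on an element that is still in the `TreeSet`, by contrast, can leave `contains` unable to find it, which is the kind of index corruption this section warns about. Note also that `java.lang.AutoCloseable.close()` is declared to throw `Exception`, so code that holds the result of `protectIndices()` as a plain `AutoCloseable`, as in the snippets in this hunk, must catch or declare that exception.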

Modified: uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.xml.component_descriptor.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.xml.component_descriptor.xml?rev=1644834&r1=1644833&r2=1644834&view=diff
==============================================================================
--- uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.xml.component_descriptor.xml (original)
+++ uima/uimaj/trunk/uima-docbook-references/src/docbook/ref.xml.component_descriptor.xml Fri Dec 12 05:48:40 2014
@@ -754,6 +754,15 @@ uima.tcas.Annotation.</programlisting>
              a JVM defined property,
             "uima.allow_duplicate_add_to_indices", which (if defined whend UIMA is loaded) will restore the previous behavior.</para>
             
+            <note><para>If duplicates are allowed, then the proper way to update an indexed Feature Structure is to
+              <itemizedlist>
+                <listitem><para>remove <emphasis role="bold">*all*</emphasis> instances of the FS to be
+                  updated </para></listitem>
+                <listitem><para>update the features</para></listitem>
+                <listitem><para>re-add the Feature Structure to the indices (perhaps multiple times, depending on the
+                details of your logic).</para></listitem>
+              </itemizedlist></para></note>
+            
             <note><para>There is usually no need to explicitly declare a Bag index in your descriptor.  
               As of UIMA v2.1, if you do not declare any index for a type (or any of its 
               supertypes), a Bag index will be automatically created if an instance of that type is added to the indices.</para></note>
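Because a single remove only takes out one copy when duplicates are allowed, the first step of the note above has to loop. The sketch models this with a list; `DuplicateBag` and its methods are hypothetical, reference identity stands in for Feature Structure identity, and re-adding the FS as many times as it was found is just one possible policy (the note leaves that to your logic).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bag-like index that keeps duplicate adds of the same FS, as
// happens when uima.allow_duplicate_add_to_indices is defined.
final class DuplicateBag<T> {
  private final List<T> items = new ArrayList<>();

  void add(T fs) {
    items.add(fs);
  }

  // Removes a single occurrence, matching by identity like a FS reference.
  boolean removeOnce(T fs) {
    for (int i = 0; i < items.size(); i++) {
      if (items.get(i) == fs) {
        items.remove(i);
        return true;
      }
    }
    return false;
  }

  // Number of times fs is currently in the bag.
  int count(T fs) {
    int n = 0;
    for (T x : items) {
      if (x == fs) {
        n++;
      }
    }
    return n;
  }

  // Removes all occurrences, runs the update, then re-adds fs as many
  // times as it was present (one possible re-add policy).
  void update(T fs, Runnable updateFeatures) {
    int n = 0;
    while (removeOnce(fs)) {
      n++;
    }
    updateFeatures.run();
    for (int i = 0; i < n; i++) {
      add(fs);
    }
  }
}
```

During `updateFeatures.run()` the FS is absent from the bag entirely, which is what the note's first step requires before any key feature changes.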