Posted to commits@uima.apache.org by sc...@apache.org on 2019/11/22 15:29:22 UTC

svn commit: r1870165 - in /uima/site/trunk/uima-website: docs/doc-uimaj-cookbook.html xdocs/doc-uimaj-cookbook.xml

Author: schor
Date: Fri Nov 22 15:29:22 2019
New Revision: 1870165

URL: http://svn.apache.org/viewvc?rev=1870165&view=rev
Log:
no Jira, add a 

Added:
    uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html
    uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml

Added: uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html
URL: http://svn.apache.org/viewvc/uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html?rev=1870165&view=auto
==============================================================================
--- uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html (added)
+++ uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html Fri Nov 22 15:29:22 2019
@@ -0,0 +1,536 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "https://www.w3.org/TR/html4/loose.dtd">
+
+
+    <!-- ====================================================================== -->
+    <!-- GENERATED FILE, DO NOT EDIT, EDIT THE XML FILE IN xdocs INSTEAD! -->
+    <!-- ====================================================================== -->
+    <html>
+        <head>
+            <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
+            <style type="text/css">@import "stylesheets/base.css";</style>
+                                          <meta name="author" value="
+			Apache UIMA Team
+		">
+  <meta name="email" value="dev@uima.apache.org">
+                        
+            
+                        
+                        <title>Apache UIMA - Cookbook: addressing some typical use-cases</title>
+            
+            <!-- Begin Cookie Consent plugin by Silktide - https://silktide.com/cookieconsent -->
+            <!-- Commented out because implied consent is not compatible with GDPR -->
+            <!--
+            <script type="text/javascript">
+                window.cookieconsent_options = {"message":"This website uses cookies to ensure you get the best experience on our website","dismiss":"Got it!","learnMore":"More info","link":"https://uima.apache.org/privacy-policy.html","theme":"dark-bottom"};
+            </script>
+            
+            <script type="text/javascript" src="/cookieconsent2/cookieconsent.min.js"></script>
+            -->
+            <!-- End Cookie Consent plugin -->
+            
+            <!-- Begin Google Analytics -->
+            <!-- Commented out because GA requires consent according to GDPR -->
+            <!--
+            <script>
+              (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+              (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+              m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+              })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+            
+              ga('create', 'UA-70846351-1', 'auto');
+              ga('set', 'anonymizeIp', true);
+              ga('send', 'pageview');
+            
+            </script>
+            -->
+            <!-- End Google Analytics -->
+        </head>
+
+        <body>
+          <div class="topLogos">        
+            <table border="0" width="100%" cellspacing="0">
+                <!-- TOP IMAGE -->
+                <tr>
+                    <td align='LEFT'>
+                      <a href="index.html">
+                                    <img style="border: 1px solid black;" src="./images/UIMA_banner2tlpTm.png" alt="UIMA project logo" border="0"/>
+                            </a>
+                    </td>
+                    <td align='CENTER'>
+                          <div class="pageBanner">Cookbook: addressing some typical use-cases</div>
+                    </td>
+                    <td align='RIGHT'>
+                                  <a href="https://www.apache.org">
+        <img src="./images/asf-logo-on-white-smallTm.png" alt="Apache UIMA" border="0"/>
+      </a>
+                          </td>
+                </tr>
+            </table>
+            <hr noshade="" size="1"/>
+            </div>
+            <table border="0" width="100%" cellspacing="4">
+              <tr>
+                <td align='RIGHT' colspan="2">
+                  <form method="get" action="https://www.google.com/search">
+                    Search the site
+                    <input type="text"   name="q" size="25" maxlength="255" value="" />
+                    <input type="hidden" name="sitesearch" value="https://uima.apache.org/" />
+                    <input name="Search" value="Search Site" type="submit"/>
+                  </form>
+                </td>
+              </tr>
+              <tr> <!-- LEFT SIDE NAVIGATION -->
+                <td width="20%" valign="top">
+
+
+
+
+
+
+                   <!-- regular menu -->
+                      <div class="navBar">
+                  <br/>
+            <div class="navBarItem">      <div class="navPartHeading">General</div>
+                </div>
+                <div class="navBar">
+                  <div class="navBarItem">      <a href="./index.html">Home</a>
+                </div>
+                          <div class="navBarItem">      <a href="./downloads.cgi">Downloads</a>
+                </div>
+                          <div class="navBarItem">      <a href="./documentation.html">Documentation</a>
+                </div>
+                          <div class="navBarItem">      <a href="./news.html">News</a>
+                </div>
+                          <div class="navBarItem">      <a href="./publications.html">Publications</a>
+                </div>
+                    <br style="line-height: .5em"/>
+                          <div class="navBarItem">      <a href="https://issues.apache.org/jira/browse/uima" target="_blank" rel="noopener">Issue tracker <img src="images/offsitelink.png"/></a>
+                </div>
+                          <div class="navBarItem">      <a href="https://cwiki.apache.org/confluence/display/UIMA/" target="_blank" rel="noopener">Wiki <img src="images/offsitelink.png"/></a>
+                </div>
+                    <br style="line-height: .5em"/>
+                          <div class="navBarItem">      <a href="https://cwiki.apache.org/confluence/display/UIMA/Powered+by+Apache+UIMA" target="_blank" rel="noopener">Powered By UIMA <img src="images/offsitelink.png"/></a>
+                </div>
+            </div>
+                      <br/>
+            <div class="navBarItem">      <div class="navPartHeading">Community</div>
+                </div>
+                <div class="navBar">
+                  <div class="navBarItem">      <a href="./get-involved.html">Get Involved</a>
+                </div>
+                          <div class="navBarItem">      <a href="./mail-lists.html">Mailing Lists</a>
+                </div>
+                          <div class="navBarItem">      <a href="./contribution-policy.html">Contribution Policies</a>
+                </div>
+                          <div class="navBarItem">      <a href="./faq.html">FAQ</a>
+                </div>
+                          <div class="navBarItem">      <a href="./project-guidelines.html">Project Guidelines</a>
+                </div>
+            </div>
+                      <br/>
+            <div class="navBarItem">      <div class="navPartHeading">Scaleout Frameworks</div>
+                </div>
+                <div class="navBar">
+                  <div class="navBarItem">      <a href="./doc-uimaas-what.html">UIMA-AS</a>
+                </div>
+                          <div class="navBarItem">      <a href="./doc-uimaducc-whatitam.html">UIMA-DUCC</a>
+                </div>
+                          <div class="navBarItem">      <a href="./doc-uimaducc-demo.html">..Demo Page</a>
+                </div>
+                          <div class="navBarItem">      <a href="http://uima-ducc-demo.apache.org:42133" target="_blank" rel="noopener">..Demo Live <img src="images/offsitelink.png"/></a>
+                </div>
+            </div>
+                      <br/>
+            <div class="navBarItem">      <div class="navPartHeading">Components & Tools</div>
+                </div>
+                <div class="navBar">
+                  <div class="navBarItem">      <a href="./sandbox.html#uima-addons-annotators">Annotators</a>
+                </div>
+                          <div class="navBarItem">      <a href="./toolsServers.html">Tools & Servers</a>
+                </div>
+                          <div class="navBarItem">      <a href="./sandbox.html">Addons and Sandbox</a>
+                </div>
+                          <div class="navBarItem">      <a href="./ruta.html">UIMA Ruta</a>
+                </div>
+                          <div class="navBarItem">      <a href="./uimafit.html">uimaFIT</a>
+                </div>
+                          <div class="navBarItem">      <a href="./external-resources.html">External Resources</a>
+                </div>
+            </div>
+                      <br/>
+            <div class="navBarItem">      <div class="navPartHeading">Development</div>
+                </div>
+                <div class="navBar">
+                  <div class="navBarItem">      <a href="./dev-quick.html">Quick Start: building</a>
+                </div>
+                          <div class="navBarItem">      <a href="./building-uima.html">Building from Source</a>
+                </div>
+                          <div class="navBarItem">      <a href="./one-time-setup.html">One-time setups</a>
+                </div>
+                          <div class="navBarItem">      <a href="./svn.html">Source Code</a>
+                </div>
+                          <div class="navBarItem">      <a href="./distribution.html">Creating a Distribution</a>
+                </div>
+                          <div class="navBarItem">      <a href="./release.html">Doing a UIMA release</a>
+                </div>
+                          <div class="navBarItem">      <a href="https://www.apache.org/security/committers.html" target="_blank" rel="noopener">Doing a CVE (Apache) <img src="images/offsitelink.png"/></a>
+                </div>
+                          <div class="navBarItem">      <a href="./eclipse-update-site.html">Eclipse Update Sites</a>
+                </div>
+                          <div class="navBarItem">      <a href="./git.html">GIT</a>
+                </div>
+                          <div class="navBarItem">      <a href="./codeConventions.html">Code Conventions</a>
+                </div>
+                          <div class="navBarItem">      <a href="./uima-specification.html">UIMA Specification (OASIS)</a>
+                </div>
+                          <div class="navBarItem">      <a href="./team-list.html">Project Team</a>
+                </div>
+                          <div class="navBarItem">      <a href="./maven-design.html">Maven Use</a>
+                </div>
+                          <div class="navBarItem">      <a href="./updating-website.html">Updating this Website</a>
+                </div>
+            </div>
+                      <br/>
+            <div class="navBarItem">      <div class="navPartHeading">Events and Conferences</div>
+                </div>
+                <div class="navBar">
+                  <div class="navBarItem">      <a href="./coling14.html">COLING 2014</a>
+                </div>
+                          <div class="navBarItem">      <a href="./gscl13.html">GSCL 2013</a>
+                </div>
+                          <div class="navBarItem">      <a href="./iks09.html">IKS 2009</a>
+                </div>
+                          <div class="navBarItem">      <a href="./gscl09.html">GSCL 2009</a>
+                </div>
+                          <div class="navBarItem">      <a href="./lsm09.html">LSM 2009</a>
+                </div>
+                          <div class="navBarItem">      <a href="./lrec08.html">LREC 2008</a>
+                </div>
+                          <div class="navBarItem">      <a href="./gldv07.html">GLDV 2007</a>
+                </div>
+            </div>
+                      <br/>
+            <div class="navBarItem">      <div class="navPartHeading">ASF</div>
+                </div>
+                <div class="navBar">
+                  <div class="navBarItem">      <a href="https://www.apache.org/licenses/" target="_blank" rel="noopener">License <img src="images/offsitelink.png"/></a>
+                </div>
+                          <div class="navBarItem">      <a href="https://www.apache.org/foundation/thanks.html" target="_blank" rel="noopener">ASF Sponsors <img src="images/offsitelink.png"/></a>
+                </div>
+                          <div class="navBarItem">      <a href="https://www.apache.org/foundation/sponsorship.html" target="_blank" rel="noopener">ASF Sponsorship <img src="images/offsitelink.png"/></a>
+                </div>
+                          <div class="navBarItem">      <a href="./security_report">Security</a>
+                </div>
+            </div>
+        </div>
+                </td>
+                <td width="80%" align="left" valign="top">
+                                                          <div class="sectionTable">
+      <table class="sectionTable">
+        <tr><td>
+        <a name="Working with Feature Structures"><h1><img src="images/UIMA_4sq50tightCropSolid.png"/>&nbsp;Working with Feature Structures</h1></a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="sectionBody">
+                                    <p>These apply to all kinds of Feature Structures, both Annotations and non-Annotations.</p>
+                                                      <table class="subsectionTable">
+        <tr><td>
+       
+       
+       
+          <a name="Remove all Feature Structures of a particular type">
+            <h2>Remove all Feature Structures of a particular type
+                        </h2>
+          </a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="subsectionBody">
+                                    <p>There are built-in methods to do this, over all indexes in a particular view.  There are 2 variations:
+          <ul><li>remove all including the subtypes of the type
+               <pre>myJCasView.removeAllIncludingSubtypes(Foo.type)</pre>
+          </li>
+          <li>remove all excluding the subtypes of the type
+              <pre>myJCasView.removeAllExcludingSubtypes(Foo.type)</pre></li></ul>
+        </p>
+                                                <p>Both of these are much faster than iterating over the Feature Structures; they directly clear the associated indexes.</p>
+                            </blockquote>
+        </td></tr>
+    </table>
+                                                      <table class="subsectionTable">
+        <tr><td>
+       
+       
+       
+          <a name="General suggestions: working with iterators">
+            <h2>General suggestions: working with iterators
+                        </h2>
+          </a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="subsectionBody">
+                                    <p>Many times code will iterate over all instances of a type, and only do something with a subset.
+         Frequently, the iteration can be cut short, by starting near the spot of interest and stopping as soon
+         as it can be determined that no further iteration will find interesting Annotations.</p>
+                                                <p>Example: Let's say you have a "token" annotation, and want to find the "sentence" that contains it.
+         You could write an iterator over all sentences.  
+      </p>
+                                                <h3>Stop early</h3>
+                                                <p>
+         When you find the first sentence that overlaps the token, you can use extra knowledge that you might have
+         (for example, that there is only one sentence per token) to conclude that no further iteration is needed,
+         and stop the iteration.
+       </p>
+                                                <p>Furthermore, if the token appears outside of any sentence, you can similarly stop the iteration, and return
+          an "empty" result, as soon as the test sentence begins after the token's "begin".
+          This is because, at that point, due to the sorting of the returned values, no future sentences could
+          start before or equal to the token's begin.
+       </p>
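+                                                <p>A minimal sketch of the early-stop idea in plain Java, not using the UIMA API.
+          It assumes a hypothetical <code>Span</code> class standing in for an annotation, a list of sentences sorted by ascending begin,
+          and at most one sentence per token; all names here are illustrative only.</p>

```java
import java.util.List;

final class EarlyStopSketch {
  // Hypothetical stand-in for an annotation: only begin/end offsets.
  static final class Span {
    final int begin;
    final int end;
    Span(int begin, int end) { this.begin = begin; this.end = end; }
  }

  // Returns the sentence covering the token, or null if there is none.
  // Assumes 'sentences' is sorted by ascending begin.
  static Span findCoveringSentence(List<Span> sentences, Span token) {
    for (Span s : sentences) {
      if (s.begin > token.begin) {
        return null; // sorted order: no later sentence can start at or before the token
      }
      if (s.end >= token.end) {
        return s;    // covering sentence found; assumed unique, so stop here
      }
    }
    return null;     // ran out of sentences
  }
}
```

+                                                <p>Both exits avoid visiting the remaining sentences: one on success, the other as soon as the sort order rules out a match.</p>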
+                                                <h3>Begin closer to the right spot, maybe iterate backwards</h3>
+                                                <p>But you can do better.</p>
+                                                <p>You can start the iteration, instead of at the beginning, at the position of the token, and iterate backwards.
+       Iterators have a moveTo() method which takes a feature structure argument, so you can moveTo(the-token), 
+       and then perhaps with some edge adjustment for equality, start iterating backwards, looking for the sentence at that
+       position that covers the token.  
+       </p>
+                                                <p>If you are iterating backwards, and looking for a "covering" annotation, and know the largest span for that
+       covering type, then you can stop iterating as soon as the start position you reach, + the largest span, is less than
+       the start of the annotation you're trying to cover.</p>
+                                                <p style="margin-left:1rem">This is used internally in version 3's 
+           <a target="_blank" rel="noopener" href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select.annot.subselect">select framework</a>
+       to speed up 
+       the <code>covering</code> kind of iteration.</p>
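+                                                <p>The backward scan with a largest-span bound can be sketched in plain Java as follows.
+          This is only an illustration under stated assumptions, not the framework's implementation: <code>Span</code>,
+          <code>lastStartingAtOrBefore</code> and <code>covering</code> are hypothetical names, the list is assumed sorted by ascending begin,
+          and <code>maxSpan</code> is assumed to be the largest (end - begin) of any span in the list.</p>

```java
import java.util.ArrayList;
import java.util.List;

final class BackwardScanSketch {
  // Hypothetical stand-in for an annotation: only begin/end offsets.
  static final class Span {
    final int begin;
    final int end;
    Span(int begin, int end) { this.begin = begin; this.end = end; }
  }

  // Binary search: index of the last span whose begin is <= position, or -1.
  static int lastStartingAtOrBefore(List<Span> sorted, int position) {
    int lo = 0, hi = sorted.size() - 1, found = -1;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted.get(mid).begin <= position) {
        found = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return found;
  }

  // Collects every span covering 'target', scanning backwards from the target's position.
  static List<Span> covering(List<Span> sorted, Span target, int maxSpan) {
    List<Span> result = new ArrayList<>();
    for (int i = lastStartingAtOrBefore(sorted, target.begin); i >= 0; i--) {
      Span s = sorted.get(i);
      if (s.begin + maxSpan < target.begin) {
        break; // even the longest possible span starting here ends before the target begins
      }
      if (s.end >= target.end) {
        result.add(s);
      }
    }
    return result;
  }
}
```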
+                                                <p>There are many other examples, but the principle is the same: start the iteration "close to" the right spot, 
+          perhaps moving backwards instead of forwards, and end the iteration as soon as you can logically say that
+          no more suitable feature structures would be found. </p>
+                                                <h3>Use UIMA Version 3's select framework</h3>
+                                                <p>The <a target="_blank" rel="noopener" href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a> 
+           incorporates many of the popular use cases for doing iterations that we've seen, into a Java friendly approach that
+           automatically uses optimized iterators and can produce Java Streams, as well.</p>
+                            </blockquote>
+        </td></tr>
+    </table>
+                            </blockquote>
+        </p>
+      </td></tr>
+    </table>
+                                        <div class="sectionTable">
+      <table class="sectionTable">
+        <tr><td>
+        <a name="Working with Annotations"><h1><img src="images/UIMA_4sq50tightCropSolid.png"/>&nbsp;Working with Annotations</h1></a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="sectionBody">
+                                          <ul>
+          <li><a href='#Watch out for type-priorites'>
+                  Watch out for type-priorities
+        
+                </a></li>
+          <li><a href='#Annotation containment'>
+                  Annotation containment
+        
+                </a></li>
+          <li><a href="#Adjusting an existing annotation's begin and end">
+                  Adjusting an existing annotation's begin and end
+        
+                </a></li>
+          <li><a href='#Avoid where possible, copying sets of Feature Structures'>
+                  Avoid where possible, copying sets of Feature Structures
+        
+                </a></li>
+        </ul>
+                                                  <p>
+				The CAS holds Feature Structures (FSs).  There is special support for FSs which are a subtype of Annotation;
+				these have an associated Subject of Analysis (Sofa) and <code>begin</code> and <code>end</code> offsets. 
+			</p>
+                                                <h3>Annotations are not required in all cases</h3>
+                                                <p>If your application deals with a different kind of unstructured data, for instance images, then
+			     Annotation may not be the appropriate supertype for your types, because it is designed for 
+			     things that have meaningful linear begin and end offsets.</p>
+                                                <p>You can have your feature structures inherit from TOP, or from some other appropriate supertype, other
+			     than Annotation.</p>
+                                                <h3>Making use of the built-in Annotation index</h3>
+                                                <p>Annotations are special in UIMA in that there is a "built-in" index, the AnnotationIndex, which can be used
+			   to rapidly access these in a sorted order.  The ordering is by <code>begin</code> (ascending), then by
+			   <code>end</code> (descending), and then by type-priorities.</p>
+                                                <p style="margin-left:1rem"><i>This is really a set of indexes, one for each subtype of Annotation.</i></p>
+                                                <p style="margin-left:1rem"><i>Although the index has type-priorities, in UIMA v3, the <code>select-framework</code>
+			  by default ignores these; this behavior can be overridden.</i></p>
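+                                                <p>To make the ordering concrete, here is a small plain-Java sketch of a comparator that sorts by
+			  <code>begin</code> ascending and then <code>end</code> descending. It does not use the UIMA API and leaves out the
+			  type-priority step; <code>Span</code> and <code>BEGIN_ASC_END_DESC</code> are hypothetical names used only for illustration.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class AnnotationOrderSketch {
  // Hypothetical stand-in for an annotation: only begin/end offsets.
  static final class Span {
    final int begin;
    final int end;
    Span(int begin, int end) { this.begin = begin; this.end = end; }
  }

  // begin ascending, then end descending (at the same begin, longer spans come first).
  static final Comparator<Span> BEGIN_ASC_END_DESC =
      Comparator.comparingInt((Span s) -> s.begin)
          .thenComparing(Comparator.comparingInt((Span s) -> s.end).reversed());

  // Returns a sorted copy, leaving the input list untouched.
  static List<Span> sorted(List<Span> spans) {
    List<Span> copy = new ArrayList<>(spans);
    copy.sort(BEGIN_ASC_END_DESC);
    return copy;
  }
}
```

+                                                <p>With this order, a span that encloses others starting at the same offset is visited before them.</p>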
+                                                      <table class="subsectionTable">
+        <tr><td>
+       
+       
+       
+          <a name="Watch out for type-priorites">
+            <h2>Watch out for type-priorities
+                        </h2>
+          </a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="subsectionBody">
+                                    <p>When two annotations have the same begin and end, but different types, then one comes before the other,
+			     according to type priorities.  This lets you declare, for example, that when a Sentence annotation and a 
+			     Foo annotation both cover the same span, the Sentence logically contains the Foo, and not the 
+			     other way around.</p>
+                                                <p>To make this work, you need to specify the type priorities. This is a global setting for your application.
+			     See 
+			     <a target="_blank" rel="noopener" href="http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xml.component_descriptor.aes.primitive">
+			       type priorities</a> (scroll down to find it) for how to specify this.</p>
+                                                <h3>Avoiding type priorities</h3>
+                                                <p>Often, the use of type priorities gets in the way.  With UIMA Version 3, the 
+			     <a target="_blank" rel="noopener" href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a>
+           by default ignores type priorities when doing its operations; but this can be overridden as needed.</p>
+                            </blockquote>
+        </td></tr>
+    </table>
+                                                      <table class="subsectionTable">
+        <tr><td>
+       
+       
+       
+          <a name="Annotation containment">
+            <h2>Annotation containment
+                        </h2>
+          </a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="subsectionBody">
+                                    <h3>a contains b</h3>
+                                                <ul><li>Ignoring type priorities:</li></ul>
+                                                <pre>a != null &amp;&amp; b != null &amp;&amp;       // null check
+a.getBegin() &lt;= b.getBegin() &amp;&amp; // a starts before (or equal to) b 
+a.getEnd() &gt;= b.getEnd()        // a ends after (or equal to) b</pre>
+                                                <h3>a and b overlap (have at least one char in common)</h3>
+                                                <pre>
+// null checks omitted
+if (a.getBegin() &lt;= b.getBegin()) { // if a starts before (or equal to) b
+  return a.getEnd() &gt; b.getBegin(); // then it overlaps if a's end is after b's begin
+} else {                            // otherwise, b's begin is before a's begin
+  return b.getEnd() &gt; a.getBegin(); // so it overlaps if b's end is after a's begin.
+}
+</pre>
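+                                                <p>For reference, the containment and overlap checks above can be packaged as two small methods.
+                                    This is a plain-Java sketch on a hypothetical <code>Span</code> class with <code>begin</code> and <code>end</code> fields,
+                                    not on UIMA annotation objects; it ignores type priorities, and treating a null argument as "no overlap" is an assumption.</p>

```java
final class SpanRelationsSketch {
  // Hypothetical stand-in for an annotation: only begin/end offsets.
  static final class Span {
    final int begin;
    final int end;
    Span(int begin, int end) { this.begin = begin; this.end = end; }
  }

  // True if a contains b: a starts at or before b and ends at or after b.
  static boolean contains(Span a, Span b) {
    return a != null && b != null && a.begin <= b.begin && a.end >= b.end;
  }

  // True if a and b overlap; same two-branch logic as the fragment above.
  static boolean overlaps(Span a, Span b) {
    if (a == null || b == null) {
      return false;           // assumption: a missing span overlaps nothing
    }
    if (a.begin <= b.begin) {
      return a.end > b.begin; // a starts first: overlap if a ends after b begins
    } else {
      return b.end > a.begin; // b starts first: overlap if b ends after a begins
    }
  }
}
```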
+                            </blockquote>
+        </td></tr>
+    </table>
+                                                      <table class="subsectionTable">
+        <tr><td>
+       
+       
+       
+          <a name="Adjusting an existing annotation's begin and end">
+            <h2>Adjusting an existing annotation's begin and end
+                        </h2>
+          </a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="subsectionBody">
+                                    <p>Sometimes, your code may want to adjust an annotation's begin and end values.
+			     If the annotation is not indexed, there's no issue - just change the value.
+			     But if it is indexed, it's in index(es) in a position determined by its begin and end position, so if you 
+			     change these, the item needs to be reindexed (in all the indexes holding it).  Typically, only one index
+			     (the Annotation Index for a particular CAS View) is involved, but in general, there could be multiple
+			     indexes involved.</p>
+                                                <p>If you are using UIMA version 2.7.0 or later, the UIMA 
+			     <a target="_blank" rel="noopener" href="https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.cas.updating_indexed_feature_structures">framework</a> 
+			        detects updates that would need this re-indexing, and
+			        automatically removes the Feature Structure from all involved index(es), updates the Feature, and then adds the Feature Structure back to the index(es).
+			   </p>
+                                                <p>You can improve the efficiency of this, if you are updating, say, both the begin and end values of an annotation, by
+			      doing the steps yourself, in your code:
+			     <ul><li>Remove the item from the index(es)</li>
+			         <li>Do both updates</li>
+			         <li>Add the item back into the index(es)</li></ul>
+	        More details are <a target="_blank" rel="noopener" href="https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.cas.updating_indexed_feature_structures">here</a>.
+	  		         </p>
+                                                <p>Example: if you know a particular annotation is only indexed in one view, 
+	  		    then you can update its begin and end features using
+	  		    <pre>a.<b>removeFromIndexes</b>();
+	  		    
+  a.setBegin(new_value_begin);
+  a.setEnd(new_value_end);
+  
+a.<b>addToIndexes</b>();</pre>
+This is the most efficient way to do this.
+	  		    </p>
+                                                <p>There are a couple of special forms you can use to protect indexes while you're updating features used as keys.
+			   This is useful when you're not sure what feature values might be used as keys in some index.
+			   <pre>
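+// AutoCloseable.close() is declared to throw Exception,
+// so the enclosing code must catch or declare it.
+try (AutoCloseable ac = my_cas.<b>protectIndexes</b>()) {
+   // ...  arbitrary user code which updates features 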
+   //      which may be "keys" in one or more indexes, e.g.
+   
+   a.setBegin(new_value_begin);
+   a.setEnd(new_value_end); 
+}</pre>
+or
+<pre>
+my_cas.<b>protectIndexes</b>(() -&gt; {
+   // ... arbitrary user code updating "key" features, 
+   //     but no checked exceptions are permitted
+   //     (because inside a lambda)
+   
+   a.setBegin(new_value_begin);
+   a.setEnd(new_value_end);
+   });</pre>
+   These use the framework's automatic detection mechanism, and remove Feature Structures from all involved indexes
+   if needed, but delay adding them back until the end of the protected section.
+			       </p>
+                            </blockquote>
+        </td></tr>
+    </table>
+                                                      <table class="subsectionTable">
+        <tr><td>
+       
+       
+       
+          <a name="Avoid where possible, copying sets of Feature Structures">
+            <h2>Avoid where possible, copying sets of Feature Structures
+                        </h2>
+          </a>
+      </td></tr>
+      <tr><td>
+        <blockquote class="subsectionBody">
+                                    <p>Operations which iterate over Feature Structures, and put them into a Collection or List, and then 
+			     iterate over that list to do some other operations, can often be done directly on the Feature Structures in the CAS,
+			     omitting the first copying of them into a list.
+			  </p>
+                                                <p>A frequent speedup can happen when the particular logic can detect when no further items in a (sorted) index
+			     are needed, and the iteration can be stopped early.</p>
+                                                <p>For example, you might have code which iterates over all feature structures of a particular type, and puts these into a list,
+			     and then goes through the list, picks out certain ones, and puts those into another list, which is then returned.
+			  </p>
+                                                <p>The first copying can be omitted, by moving the logic of what to include into the first iteration, and producing the second
+			     list directly.</p>
+                                                <p>In UIMA Version 3, you can make use of the <a target="_blank" rel="noopener" href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a>.
+			     It already accounts for many of the use cases where you might want to start or exit an iteration at a particular point.
+			     You can also use its ability to produce streams, and combine that with Java's <code>Stream.takeWhile</code> method (Java 9 and later), to exit a stream early.
+			  </p>
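+                                                <p>As a rough illustration of the single-pass idea with an early exit, the following plain-Java sketch
+			     (requiring Java 9 or later for <code>takeWhile</code>) filters a list sorted by begin without building an intermediate list.
+			     It does not use the UIMA API; <code>Span</code>, <code>select</code>, <code>limit</code> and <code>minLength</code> are hypothetical names chosen for the example.</p>

```java
import java.util.List;
import java.util.stream.Collectors;

final class TakeWhileSketch {
  // Hypothetical stand-in for an annotation: only begin/end offsets.
  static final class Span {
    final int begin;
    final int end;
    Span(int begin, int end) { this.begin = begin; this.end = end; }
  }

  // From spans sorted by ascending begin, keeps those that start at or before
  // 'limit' and are at least 'minLength' long, in one pass.
  static List<Span> select(List<Span> sortedByBegin, int limit, int minLength) {
    return sortedByBegin.stream()
        .takeWhile(s -> s.begin <= limit)          // stop at the first span past the limit
        .filter(s -> s.end - s.begin >= minLength) // selection logic applied in the same pass
        .collect(Collectors.toList());
  }
}
```

+                                                <p>The early exit is only valid because the input is sorted on the same key that <code>takeWhile</code> tests.</p>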
+                            </blockquote>
+        </td></tr>
+    </table>
+                            </blockquote>
+        </p>
+      </td></tr>
+    </table>
+                                  </td>
+                </tr>
+                <!-- FOOTER -->
+                <tr><td colspan="2">
+                  <hr noshade="" size="1"/>
+                </td></tr>
+                <tr><td colspan="2"> 
+                  <table class="pageFooter">
+                    <tr>
+                      <td><a href="index.html">Home</a></td>
+                      <td><a href="privacy-policy.html">Privacy Policy</a></td>
+                      <td style="font-size:75%">
+                Copyright &#169; 2006-2013, The Apache Software Foundation.<br/>
+                Apache UIMA, UIMA, the Apache UIMA logo and the Apache Feather logo are trademarks of The Apache Software Foundation.<br/>
+                All other marks mentioned may be trademarks or registered trademarks of their respective owners.
+                      </td>
+                      <td><a href="mailto:dev@uima.apache.org">Contact us</a></td>
+                    </tr>
+                  </table>                    
+                </td></tr>
+            </table>
+        </body>
+    </html>
+

Added: uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml
URL: http://svn.apache.org/viewvc/uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml?rev=1870165&view=auto
==============================================================================
--- uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml (added)
+++ uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml Fri Nov 22 15:29:22 2019
@@ -0,0 +1,252 @@
+<?xml version="1.0" encoding="ISO-8859-1"?>
+
+<!--
+	Licensed to the Apache Software Foundation (ASF) under one
+	or more contributor license agreements.  See the NOTICE file
+	distributed with this work for additional information
+	regarding copyright ownership.  The ASF licenses this file
+	to you under the Apache License, Version 2.0 (the
+	"License"); you may not use this file except in compliance
+	with the License.  You may obtain a copy of the License at
+	
+	https://www.apache.org/licenses/LICENSE-2.0
+	
+	Unless required by applicable law or agreed to in writing,
+	software distributed under the License is distributed on an
+	"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+	KIND, either express or implied.  See the License for the
+	specific language governing permissions and limitations
+	under the License.
+-->
+
+<document>
+
+	<properties>
+		<title>Cookbook: addressing some typical use-cases</title>
+		<author email="dev@uima.apache.org">
+			Apache UIMA Team
+		</author>
+	</properties>
+
+	<body>
+	
+	  <section name="Working with Feature Structures">
+	    <p>These techniques work with all kinds of Feature Structures, both Annotations and non-Annotations.</p>
+	   
+	    <subsection name="Remove all Feature Structures of a particular type">
+        <p>There are built-in methods to do this, over all indexes in a particular view.  There are 2 variations:
+          <ul><li>remove all including the subtypes of the type
+               <pre>myJCasView.removeAllIncludingSubtypes(Foo.type)</pre>
+          </li>
+          <li>remove all excluding the subtypes of the type
+              <pre>myJCasView.removeAllExcludingSubtypes(Foo.type)</pre></li></ul>
+        </p>
+        <p>Both of these are much faster than iterating over the Feature Structures; they directly clear the associated indexes.</p>
+      </subsection>
+      
+      <subsection name="General suggestions: working with iterators">
+
+            
+      <p>Many times code will iterate over all instances of a type, and only do something with a subset.
+         Frequently, the iteration can be cut short, by starting near the spot of interest and stopping as soon
+         as it can be determined that no further iteration will find interesting Annotations.</p>
+         
+      <p>Example: Let's say you have a "token" annotation, and want to find the "sentence" that contains it.
+         You could write an iterator over all sentences.  
+      </p>
+      <h3>Stop early</h3>
+        <p>
+         When you find the first sentence that overlaps the token, you can use extra knowledge that you might have,
+         such as: there's only one sentence per token, to conclude that having found it, there's no need to do any
+         further iteration, so you can stop the iteration. 
+       </p>
+       
+       <p>Furthermore, if the token appears outside of any sentence, you can similarly stop the iteration, and return
+          an "empty" result, as soon as the test sentence begins after the token's "begin".
+          This is because, at that point, due to the sorting of the returned values, no future sentences could
+          start before or equal to the token's begin.
+       </p>
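+       
+       <p>A minimal sketch of this early exit, assuming the UIMA Java API and an AnnotationIndex over the covering type 
+          (for example, a hypothetical Sentence type) that is different from the target's type. 
+          The class and method names are illustrative only and are not part of UIMA.</p>
+       <pre>import org.apache.uima.cas.text.AnnotationIndex;
+import org.apache.uima.jcas.tcas.Annotation;
+
+public final class StopEarly {
+
+  // Returns the first annotation in the index that covers the target, or null if there is none.
+  // Relies on the index order: begin ascending, then end descending.
+  static &lt;T extends Annotation&gt; T firstCovering(AnnotationIndex&lt;T&gt; index, Annotation target) {
+    for (T candidate : index) {
+      if (candidate.getBegin() &gt; target.getBegin()) {
+        return null;       // every later candidate begins even further right: stop
+      }
+      if (candidate.getEnd() &gt;= target.getEnd()) {
+        return candidate;  // begins at or before the target and ends at or after it
+      }
+    }
+    return null;           // index exhausted without finding a covering annotation
+  }
+}</pre>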
+       
+       <h3>Begin closer to the right spot, maybe iterate backwards</h3>
+       
+       <p>But you can do better.</p>
+       <p>You can start the iteration, instead of at the beginning, at the position of the token, and iterate backwards.
+       Iterators have a moveTo() method which takes a feature structure argument, so you can moveTo(the-token), 
+       and then perhaps with some edge adjustment for equality, start iterating backwards, looking for the sentence at that
+       position that covers the token.  
+       </p>
+       
+       <p>If you are iterating backwards, looking for a "covering" annotation, and you know the largest span for that
+       covering type, then you can stop iterating as soon as the begin position you reach, plus the largest span, is less than
+       the begin of the annotation you're trying to cover.</p>
+       
+       <p style="margin-left:1rem">This is used internally in version 3's 
+           <a target="_blank" rel="noopener"
+           href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select.annot.subselect">select framework</a>
+       to speed up 
+       the <code>covering</code> kind of iteration.</p>
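+       
+       <p>The backwards search might be sketched as follows. This is only an illustration under stated assumptions: 
+          the index holds a type different from the target's type, <code>maxSpan</code> is a bound you know from your own data,
+          and the class and method names are hypothetical. Candidates having exactly the same begin and end as the target may sort 
+          after the <code>moveTo</code> position, depending on type priorities; this sketch does not examine those.</p>
+       <pre>import org.apache.uima.cas.FSIterator;
+import org.apache.uima.cas.text.AnnotationIndex;
+import org.apache.uima.jcas.tcas.Annotation;
+
+public final class BackwardSearch {
+
+  // Returns a covering annotation found by iterating backwards from the target's position, or null.
+  static &lt;T extends Annotation&gt; T coveringBackwards(
+      AnnotationIndex&lt;T&gt; index, Annotation target, int maxSpan) {
+    FSIterator&lt;T&gt; it = index.iterator();
+    it.moveTo(target);            // position near the target instead of at the start
+    if (!it.isValid()) {
+      it.moveToLast();            // target sorts after every element: start from the end
+    }
+    while (it.isValid()) {
+      T candidate = it.get();
+      if (candidate.getBegin() + maxSpan &lt; target.getBegin()) {
+        return null;              // nothing further back can reach the target: stop
+      }
+      if (candidate.getBegin() &lt;= target.getBegin()
+          &amp;&amp; candidate.getEnd() &gt;= target.getEnd()) {
+        return candidate;         // candidate covers the target
+      }
+      it.moveToPrevious();
+    }
+    return null;
+  }
+}</pre>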
+       
+       <p>There are many other examples, but the principle is the same: start the iteration "close to" the right spot, 
+          perhaps moving backwards instead of forwards, and end the iteration as soon as you can logically say that
+          no more suitable feature structures would be found. </p>
+      
+       <h3>Use UIMA Version 3's select framework</h3>
+       <p>The <a target="_blank" rel="noopener"
+           href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a> 
+           incorporates many of the popular use cases for doing iterations that we've seen, into a Java friendly approach that
+           automatically uses optimized iterators and can produce Java Streams, as well.</p>    
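+       <p>As a rough sketch, assuming UIMA Version 3 and its <code>select</code>, <code>covering</code> and <code>asList</code> 
+          operations, the covering lookup discussed above could be written like this 
+          (the wrapper class and method are hypothetical):</p>
+       <pre>import java.util.List;
+import org.apache.uima.jcas.JCas;
+import org.apache.uima.jcas.tcas.Annotation;
+
+public final class SelectCovering {
+
+  // Returns all annotations of the given type that cover the target, in index order.
+  static &lt;T extends Annotation&gt; List&lt;T&gt; covering(JCas jcas, Class&lt;T&gt; type, Annotation target) {
+    return jcas.select(type).covering(target).asList();
+  }
+}</pre>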
+      </subsection>
+      
+	  </section>
+		
+		<section
+			name="Working with Annotations">
+			
+			    <subsectionToc/>
+			<p>
+				The CAS holds Feature Structures (FSs).  There is special support for FSs which are a subtype of Annotation;
+				these have an associated Subject of Analysis (Sofa) and <code>begin</code> and <code>end</code> offsets. 
+			</p>
+			
+			<h3>Annotations are not required in all cases</h3>
+			  <p>If your application deals with a different kind of unstructured data, say, for instance, images, then
+			     Annotations may not be the appropriate supertype for your types, because they're designed for 
+			     things having a linear begin / end meaningful demarcations. </p>
+			  <p>You can have your feature structures inherit from TOP, or from some other appropriate supertype, other
+			     than Annotation.</p>
+			
+			<h3>Making use of the built-in Annotation index</h3>        
+			<p>Annotations are special in UIMA in that there is a "built-in" index, the AnnotationIndex, which can be used
+			   to rapidly access these in a sorted order.  The ordering is by <code>begin</code> (ascending), then by
+			   <code>end</code> (descending), and then by type-priorities.</p>
+			<p style="margin-left:1rem"><i>This is really a set of indexes, one for each subtype of Annotation.</i></p>
+			<p style="margin-left:1rem"><i>Although the index has type-priorities, in UIMA v3, the <code>select-framework</code>
+			  by default ignores these; this behavior can be overridden.</i></p>  
+			  
+			<subsection name="Watch out for type-priorites">
+			  <p>When 2 annotations have the same begin and end, but different types, then one comes before the other,
+			     according to type priorities.  This lets you declare, for example, that when a Sentence annotation and a 
+			     Foo annotation both cover the same span, the Sentence logically contains the Foo, and not the 
+			     other way around.</p>
+			     
+			  <p>To make this work, you need to specify the type priorities. This is a global setting for your application.
+			     See 
+			     <a target="_blank" rel="noopener"
+			       href="http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xml.component_descriptor.aes.primitive">
+			       type priorities</a> (scroll down to find it) for how to specify this.</p>
+			       
+			  <h3>Avoiding type priorities</h3>
+			  <p>Often, the use of type priorities gets in the way.  With UIMA Version 3, the 
+			     <a target="_blank" rel="noopener"
+           href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a>
+           by default ignores type priorities when doing its operations; but this can be overridden as needed.</p>        
+			</subsection>  
+     
+      <subsection name="Annotation containment">
+        <h3>a contains b</h3>
+        <ul><li>Ignoring type priorities:</li></ul>
+        <pre>a != null &amp;&amp; b != null &amp;&amp;       // null check
+a.getBegin() &lt;= b.getBegin() &amp;&amp; // a starts before (or equal to) b 
+a.getEnd() &gt;= b.getEnd()        // a ends after (or equal to) b</pre>
+        
+        <h3>a and b overlap (have at least one char in common)</h3>
+        <pre>
+                                    // ((omitted) check for non-null)
+if (a.getBegin() &lt;= b.getBegin()) { // if a starts before (or equal to) b
+  return a.getEnd() &gt; b.getBegin(); // then it overlaps if a's end is after b's begin
+} else {                            // otherwise, b's begin is before a's begin
+  return b.getEnd() &gt; a.getBegin(); // so it overlaps if b's end is after a's begin.
+}</pre>    
+			</subsection>
+			
+			
+			<subsection name="Adjusting an existing annotation's begin and end">
+			  <p>Sometimes, your code may want to adjust an annotation's begin and end values.
+			     If the annotation is not indexed, there's no issue - just change the value.
+			     But if it is indexed, it's in index(es) in a position determined by its begin and end position, so if you 
+			     change these, the item needs to be reindexed (in all the indexes holding it).  Typically, only one index
+			     (the Annotation Index for a particular CAS View) is involved, but in general, there could be multiple
+			     indexes involved.</p>
+			     
+			  <p>If you are using UIMA version 2.7.0 or later, the UIMA 
+			     <a target="_blank" rel="noopener"
+			        href="https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.cas.updating_indexed_feature_structures">framework</a> 
+			        detects updates that would need this re-indexing, and
+			        automatically removes the Feature Structure from all involved index(es), updates the Feature, and then adds the Feature Structure back to the index(es).
+			   </p> 
+			   
+			   <p>You can improve the efficiency of this, if you are updating, say, both the begin and end value of an annotation, by
+			      doing the removal and re-adding yourself, in your code, so they happen once rather than once per updated feature:
+			     <ul><li>Removing the item from the index(es)</li>
+			         <li>Doing both updates</li>
+			         <li>Adding the item back into the index(es)</li></ul>
+	        More details <a target="_blank" rel="noopener"
+              href="https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.cas.updating_indexed_feature_structures">here</a>.
+	  		         </p> 
+	  		         
+	  		 <p>Example: if you know a particular annotation is only indexed in one view, 
+	  		    then you can update its begin and end features using
+	  		    <pre>a.<b>removeFromIndexes</b>();
+	  		    
+  a.setBegin(new_value_begin);
+  a.setEnd(new_value_end);
+  
+a.<b>addToIndexes</b>();</pre>
+This is the most efficient way to do this.
+	  		    </p>         
+ 			         
+			   <p>There are a couple of special forms you can use to protect indexes while you're updating features used as keys.
+			   This is useful when you're not sure what feature values might be used as keys in some index.
+			   <pre>
+try (AutoCloseable ac = my_cas.<b>protectIndexes</b>()) {
+   // ...  arbitrary user code which updates features 
+   //      which may be "keys" in one or more indexes, e.g.
+   
+   a.setBegin(new_value_begin);
+   a.setEnd(new_value_end); 
+}</pre>
+or
+<pre>
+my_cas.<b>protectIndexes</b>(() -> {
+   // ... arbitrary user code updating "key" features, 
+   //     but no checked exceptions are permitted
+   //     (because inside a lambda)
+   
+   a.setBegin(new_value_begin);
+   a.setEnd(new_value_end);
+   });</pre>
+   These use the framework's automatic detection mechanism, and remove Feature Structures from all involved indexes
+   if needed, but delay adding them back until the end of the protected section.
+			       </p>      
+			      
+			</subsection>
+			
+			<subsection name="Avoid where possible, copying sets of Feature Structures">
+			
+			  <p>Operations which iterate over Feature Structures, and put them into a Collection or List, and then 
+			     iterate over that list to do some other operations, can often be done directly on the Feature Structures in the CAS,
+			     omitting the first copying of them into a list.
+			  </p>
+			  
+			  <p>A frequent speedup can happen when the particular logic can detect when no further items in a (sorted) index
+			     are needed, and the iteration can be stopped early.</p>
+			  
+			  <p>For example, you might have code which iterates over all feature structures of a particular type and puts these into a list,
+			     then goes through the list, picks out certain ones, and puts those into another list, which is then returned.
+			  </p>
+			  
+			  <p>The first copying can be omitted, by moving the logic of what to include into the first iteration, and producing the second
+			     list directly.</p>
+			     
+			  <p>In UIMA Version 3, you can make use of the <a target="_blank" rel="noopener"
+           href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a>.
+			     It already accounts for many of the use cases where you might want to start or exit an iteration at a particular point.
+			     You can also use its ability to produce streams, and combine that with Java's takeWhile method (available in Java 9 and later), to exit a stream early.
+			  </p>     
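+			  
+			  <p>A minimal sketch of producing the result list directly, assuming UIMA Version 3, Java 9 or later, and that the 
+			     object returned by <code>select</code> can be used as a <code>java.util.stream.Stream</code>. 
+			     The class, method, and parameter names are hypothetical.</p>
+			  <pre>import java.util.List;
+import java.util.function.Predicate;
+import java.util.stream.Collectors;
+import org.apache.uima.jcas.JCas;
+import org.apache.uima.jcas.tcas.Annotation;
+
+public final class SelectDirectly {
+
+  // Collects the wanted annotations that begin before 'limit', with no intermediate list.
+  static &lt;T extends Annotation&gt; List&lt;T&gt; wantedBefore(
+      JCas jcas, Class&lt;T&gt; type, int limit, Predicate&lt;? super T&gt; wanted) {
+    return jcas.select(type)
+        .takeWhile(a -&gt; a.getBegin() &lt; limit)  // index is sorted by begin: exit early
+        .filter(wanted)                           // the "what to include" logic
+        .collect(Collectors.toList());
+  }
+}</pre>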
+			     
+			</subsection>
+		</section>
+	</body>
+
+</document>
+