Posted to commits@uima.apache.org by sc...@apache.org on 2019/11/22 15:29:22 UTC
svn commit: r1870165 - in /uima/site/trunk/uima-website:
docs/doc-uimaj-cookbook.html xdocs/doc-uimaj-cookbook.xml
Author: schor
Date: Fri Nov 22 15:29:22 2019
New Revision: 1870165
URL: http://svn.apache.org/viewvc?rev=1870165&view=rev
Log:
no Jira, add a
Added:
uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html
uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml
Added: uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html
URL: http://svn.apache.org/viewvc/uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html?rev=1870165&view=auto
==============================================================================
--- uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html (added)
+++ uima/site/trunk/uima-website/docs/doc-uimaj-cookbook.html Fri Nov 22 15:29:22 2019
@@ -0,0 +1,536 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "https://www.w3.org/TR/html4/loose.dtd">
+
+
+ <!-- ====================================================================== -->
+ <!-- GENERATED FILE, DO NOT EDIT, EDIT THE XML FILE IN xdocs INSTEAD! -->
+ <!-- ====================================================================== -->
+ <html>
+ <head>
+ <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
+ <style type="text/css">@import "stylesheets/base.css";</style>
+ <meta name="author" value="
+ Apache UIMA Team
+ ">
+ <meta name="email" value="dev@uima.apache.org">
+
+
+
+ <title>Apache UIMA - Cookbook: addressing some typical use-cases</title>
+
+ <!-- Begin Cookie Consent plugin by Silktide - https://silktide.com/cookieconsent -->
+ <!-- Commented out because implied consent is not compatible with GDPR -->
+ <!--
+ <script type="text/javascript">
+ window.cookieconsent_options = {"message":"This website uses cookies to ensure you get the best experience on our website","dismiss":"Got it!","learnMore":"More info","link":"https://uima.apache.org/privacy-policy.html","theme":"dark-bottom"};
+ </script>
+
+ <script type="text/javascript" src="/cookieconsent2/cookieconsent.min.js"></script>
+ -->
+ <!-- End Cookie Consent plugin -->
+
+ <!-- Begin Google Analytics -->
+ <!-- Commented out because GA requires consent according to GDPR -->
+ <!--
+ <script>
+ (function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
+ (i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
+ m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
+ })(window,document,'script','//www.google-analytics.com/analytics.js','ga');
+
+ ga('create', 'UA-70846351-1', 'auto');
+ ga('set', 'anonymizeIp', true);
+ ga('send', 'pageview');
+
+ </script>
+ -->
+ <!-- End Google Analytics -->
+ </head>
+
+ <body>
+ <div class="topLogos">
+ <table border="0" width="100%" cellspacing="0">
+ <!-- TOP IMAGE -->
+ <tr>
+ <td align='LEFT'>
+ <a href="index.html">
+ <img style="border: 1px solid black;" src="./images/UIMA_banner2tlpTm.png" alt="UIMA project logo" border="0"/>
+ </a>
+ </td>
+ <td align='CENTER'>
+ <div class="pageBanner">Cookbook: addressing some typical use-cases</div>
+ </td>
+ <td align='RIGHT'>
+ <a href="https://www.apache.org">
+ <img src="./images/asf-logo-on-white-smallTm.png" alt="Apache UIMA" border="0"/>
+ </a>
+ </td>
+ </tr>
+ </table>
+ <hr noshade="" size="1"/>
+ </div>
+ <table border="0" width="100%" cellspacing="4">
+ <tr>
+ <td align='RIGHT' colspan="2">
+ <form method="get" action="https://www.google.com/search">
+ Search the site
+ <input type="text" name="q" size="25" maxlength="255" value="" />
+ <input type="hidden" name="sitesearch" value="https://uima.apache.org/" />
+ <input name="Search" value="Search Site" type="submit"/>
+ </form>
+ </td>
+ </tr>
+ <tr> <!-- LEFT SIDE NAVIGATION -->
+ <td width="20%" valign="top">
+
+
+
+
+
+
+ <!-- regular menu -->
+ <div class="navBar">
+ <br/>
+ <div class="navBarItem"> <div class="navPartHeading">General</div>
+ </div>
+ <div class="navBar">
+ <div class="navBarItem"> <a href="./index.html">Home</a>
+ </div>
+ <div class="navBarItem"> <a href="./downloads.cgi">Downloads</a>
+ </div>
+ <div class="navBarItem"> <a href="./documentation.html">Documentation</a>
+ </div>
+ <div class="navBarItem"> <a href="./news.html">News</a>
+ </div>
+ <div class="navBarItem"> <a href="./publications.html">Publications</a>
+ </div>
+ <br style="line-height: .5em"/>
+ <div class="navBarItem"> <a href="https://issues.apache.org/jira/browse/uima" target="_blank" rel="noopener">Issue tracker <img src="images/offsitelink.png"/></a>
+ </div>
+ <div class="navBarItem"> <a href="https://cwiki.apache.org/confluence/display/UIMA/" target="_blank" rel="noopener">Wiki <img src="images/offsitelink.png"/></a>
+ </div>
+ <br style="line-height: .5em"/>
+ <div class="navBarItem"> <a href="https://cwiki.apache.org/confluence/display/UIMA/Powered+by+Apache+UIMA" target="_blank" rel="noopener">Powered By UIMA <img src="images/offsitelink.png"/></a>
+ </div>
+ </div>
+ <br/>
+ <div class="navBarItem"> <div class="navPartHeading">Community</div>
+ </div>
+ <div class="navBar">
+ <div class="navBarItem"> <a href="./get-involved.html">Get Involved</a>
+ </div>
+ <div class="navBarItem"> <a href="./mail-lists.html">Mailing Lists</a>
+ </div>
+ <div class="navBarItem"> <a href="./contribution-policy.html">Contribution Policies</a>
+ </div>
+ <div class="navBarItem"> <a href="./faq.html">FAQ</a>
+ </div>
+ <div class="navBarItem"> <a href="./project-guidelines.html">Project Guidelines</a>
+ </div>
+ </div>
+ <br/>
+ <div class="navBarItem"> <div class="navPartHeading">Scaleout Frameworks</div>
+ </div>
+ <div class="navBar">
+ <div class="navBarItem"> <a href="./doc-uimaas-what.html">UIMA-AS</a>
+ </div>
+ <div class="navBarItem"> <a href="./doc-uimaducc-whatitam.html">UIMA-DUCC</a>
+ </div>
+ <div class="navBarItem"> <a href="./doc-uimaducc-demo.html">..Demo Page</a>
+ </div>
+ <div class="navBarItem"> <a href="http://uima-ducc-demo.apache.org:42133" target="_blank" rel="noopener">..Demo Live <img src="images/offsitelink.png"/></a>
+ </div>
+ </div>
+ <br/>
+ <div class="navBarItem"> <div class="navPartHeading">Components & Tools</div>
+ </div>
+ <div class="navBar">
+ <div class="navBarItem"> <a href="./sandbox.html#uima-addons-annotators">Annotators</a>
+ </div>
+ <div class="navBarItem"> <a href="./toolsServers.html">Tools & Servers</a>
+ </div>
+ <div class="navBarItem"> <a href="./sandbox.html">Addons and Sandbox</a>
+ </div>
+ <div class="navBarItem"> <a href="./ruta.html">UIMA Ruta</a>
+ </div>
+ <div class="navBarItem"> <a href="./uimafit.html">uimaFIT</a>
+ </div>
+ <div class="navBarItem"> <a href="./external-resources.html">External Resources</a>
+ </div>
+ </div>
+ <br/>
+ <div class="navBarItem"> <div class="navPartHeading">Development</div>
+ </div>
+ <div class="navBar">
+ <div class="navBarItem"> <a href="./dev-quick.html">Quick Start: building</a>
+ </div>
+ <div class="navBarItem"> <a href="./building-uima.html">Building from Source</a>
+ </div>
+ <div class="navBarItem"> <a href="./one-time-setup.html">One-time setups</a>
+ </div>
+ <div class="navBarItem"> <a href="./svn.html">Source Code</a>
+ </div>
+ <div class="navBarItem"> <a href="./distribution.html">Creating a Distribution</a>
+ </div>
+ <div class="navBarItem"> <a href="./release.html">Doing a UIMA release</a>
+ </div>
+ <div class="navBarItem"> <a href="https://www.apache.org/security/committers.html" target="_blank" rel="noopener">Doing a CVE (Apache) <img src="images/offsitelink.png"/></a>
+ </div>
+ <div class="navBarItem"> <a href="./eclipse-update-site.html">Eclipse Update Sites</a>
+ </div>
+ <div class="navBarItem"> <a href="./git.html">GIT</a>
+ </div>
+ <div class="navBarItem"> <a href="./codeConventions.html">Code Conventions</a>
+ </div>
+ <div class="navBarItem"> <a href="./uima-specification.html">UIMA Specification (OASIS)</a>
+ </div>
+ <div class="navBarItem"> <a href="./team-list.html">Project Team</a>
+ </div>
+ <div class="navBarItem"> <a href="./maven-design.html">Maven Use</a>
+ </div>
+ <div class="navBarItem"> <a href="./updating-website.html">Updating this Website</a>
+ </div>
+ </div>
+ <br/>
+ <div class="navBarItem"> <div class="navPartHeading">Events and Conferences</div>
+ </div>
+ <div class="navBar">
+ <div class="navBarItem"> <a href="./coling14.html">COLING 2014</a>
+ </div>
+ <div class="navBarItem"> <a href="./gscl13.html">GSCL 2013</a>
+ </div>
+ <div class="navBarItem"> <a href="./iks09.html">IKS 2009</a>
+ </div>
+ <div class="navBarItem"> <a href="./gscl09.html">GSCL 2009</a>
+ </div>
+ <div class="navBarItem"> <a href="./lsm09.html">LSM 2009</a>
+ </div>
+ <div class="navBarItem"> <a href="./lrec08.html">LREC 2008</a>
+ </div>
+ <div class="navBarItem"> <a href="./gldv07.html">GLDV 2007</a>
+ </div>
+ </div>
+ <br/>
+ <div class="navBarItem"> <div class="navPartHeading">ASF</div>
+ </div>
+ <div class="navBar">
+ <div class="navBarItem"> <a href="https://www.apache.org/licenses/" target="_blank" rel="noopener">License <img src="images/offsitelink.png"/></a>
+ </div>
+ <div class="navBarItem"> <a href="https://www.apache.org/foundation/thanks.html" target="_blank" rel="noopener">ASF Sponsors <img src="images/offsitelink.png"/></a>
+ </div>
+ <div class="navBarItem"> <a href="https://www.apache.org/foundation/sponsorship.html" target="_blank" rel="noopener">ASF Sponsorship <img src="images/offsitelink.png"/></a>
+ </div>
+ <div class="navBarItem"> <a href="./security_report">Security</a>
+ </div>
+ </div>
+ </div>
+ </td>
+ <td width="80%" align="left" valign="top">
+ <div class="sectionTable">
+ <table class="sectionTable">
+ <tr><td>
+ <a name="Working with Feature Structures"><h1><img src="images/UIMA_4sq50tightCropSolid.png"/> Working with Feature Structures</h1></a>
+ </td></tr>
+ <tr><td>
+ <blockquote class="sectionBody">
+ <p>These techniques work with all kinds of Feature Structures, both Annotations and non-Annotations.</p>
+ <table class="subsectionTable">
+ <tr><td>
+
+
+
+ <a name="Remove all Feature Structures of a particular type">
+ <h2>Remove all Feature Structures of a particular type
+ </h2>
+ </a>
+ </td></tr>
+ <tr><td>
+ <blockquote class="subsectionBody">
+ <p>There are built-in methods to do this, over all indexes in a particular view. There are 2 variations:
+ <ul><li>remove all including the subtypes of the type
+ <pre>myJCasView.removeAllIncludingSubtypes(Foo.type)</pre>
+ </li>
+ <li>remove all excluding the subtypes of the type
+ <pre>myJCasView.removeAllExcludingSubtypes(Foo.type)</pre></li></ul>
+ </p>
+ <p>Both of these are much faster than iterating over the Feature Structures; they directly clear the associated indexes.</p>
+ </blockquote>
+ </td></tr>
+ </table>
+ <table class="subsectionTable">
+ <tr><td>
+
+
+
+ <a name="General suggestions: working with iterators">
+ <h2>General suggestions: working with iterators
+ </h2>
+ </a>
+ </td></tr>
+ <tr><td>
+ <blockquote class="subsectionBody">
+ <p>Many times code will iterate over all instances of a type, and only do something with a subset.
+ Frequently, the iteration can be cut short, by starting near the spot of interest and stopping as soon
+ as it can be determined that no further iteration will find interesting Annotations.</p>
+ <p>Example: Let's say you have a "token" annotation, and want to find the "sentence" that contains it.
+ You could write an iterator over all sentences.
+ </p>
+ <h3>Stop early</h3>
+ <p>
+ When you find the first sentence that overlaps the token, you can use extra knowledge that you might have,
+ such as there being only one sentence per token, to conclude that no further iteration is needed,
+ and stop the iteration right there.
+ </p>
+ <p>Furthermore, if the token appears outside of any sentence, you can similarly stop the iteration, and return
+ an "empty" result, as soon as the sentence being tested begins after the token's <code>begin</code>.
+ This is because, at that point, due to the sorting of the returned values, no later sentence could
+ start at or before the token's begin.
+ </p>
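+ <p>The two stopping rules above can be sketched in plain Java. This is only an illustration, not UIMA code:
+ <code>Span</code> and <code>coveringSentence</code> are hypothetical stand-ins for annotations and for your own lookup
+ method, and the sketch assumes the sentences arrive sorted by begin and do not overlap each other.</p>

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the cookbook's token/sentence example; not a UIMA API.
final class EarlyStopSketch {

    // Minimal substitute for an annotation: only begin/end offsets.
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Assumes sortedSentences is ordered by begin (ascending) and that
    // sentences do not overlap, so at most one sentence covers the token.
    static Optional<Span> coveringSentence(List<Span> sortedSentences, Span token) {
        for (Span sentence : sortedSentences) {
            if (sentence.begin > token.begin) {
                // Every later sentence begins even further right: stop, nothing found.
                return Optional.empty();
            }
            if (sentence.end >= token.end) {
                // Only one sentence per token, so the first hit ends the iteration.
                return Optional.of(sentence);
            }
        }
        return Optional.empty();
    }
}
```

+ <p>With a real CAS the loop would run over an annotation index iterator instead of a <code>List</code>; the two early
+ exits stay the same.</p>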
+ <h3>Begin closer to the right spot, maybe iterate backwards</h3>
+ <p>But you can do better.</p>
+ <p>You can start the iteration, instead of at the beginning, at the position of the token, and iterate backwards.
+ Iterators have a moveTo() method which takes a feature structure argument, so you can moveTo(the-token),
+ and then perhaps with some edge adjustment for equality, start iterating backwards, looking for the sentence at that
+ position that covers the token.
+ </p>
+ <p>If you are iterating backwards, looking for a "covering" annotation, and you know the largest span for that
+ covering type, then you can stop iterating as soon as the begin position you reach, plus the largest span, is less than
+ the begin of the annotation you're trying to cover.</p>
+ <p style="margin-left:1rem">This is used internally in version 3's
+ <a target="_blank" rel="noopener" href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select.annot.subselect">select framework</a>
+ to speed up
+ the <code>covering</code> kind of iteration.</p>
+ <p>There are many other examples, but the principle is the same: start the iteration "close to" the right spot,
+ perhaps moving backwards instead of forwards, and end the iteration as soon as you can logically say that
+ no more suitable feature structures would be found. </p>
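+ <p>A minimal sketch of the "start close, iterate backwards, stop on the largest-span bound" idea, again in plain Java
+ rather than against the UIMA API. The names <code>Span</code>, <code>covering</code> and <code>maxSpan</code> are
+ assumptions made for this illustration; a binary search plays the role that <code>moveTo()</code> plays on a real
+ iterator.</p>

```java
import java.util.List;
import java.util.Optional;

// Hypothetical illustration of a backwards scan with a largest-span cutoff; not a UIMA API.
final class BackwardScanSketch {

    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // sortedByBegin must be ordered by begin (ascending);
    // maxSpan is the known largest length of any candidate span.
    static Optional<Span> covering(List<Span> sortedByBegin, Span target, int maxSpan) {
        // Binary search for the last candidate whose begin is <= target.begin.
        int lo = 0;
        int hi = sortedByBegin.size() - 1;
        int start = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (sortedByBegin.get(mid).begin <= target.begin) {
                start = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        // Walk backwards from that position.
        for (int i = start; i >= 0; i--) {
            Span candidate = sortedByBegin.get(i);
            if (candidate.begin + maxSpan < target.begin) {
                // Candidates further left begin even earlier and cannot reach the target.
                break;
            }
            if (candidate.end >= target.end) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

+ <p>The sketch returns the covering span that begins closest to the target; whether that is the one you want depends on
+ your type system.</p>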
+ <h3>Use UIMA Version 3's select framework</h3>
+ <p>The <a target="_blank" rel="noopener" href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a>
+ incorporates many of the popular use cases for doing iterations that we've seen, into a Java friendly approach that
+ automatically uses optimized iterators and can produce Java Streams, as well.</p>
+ </blockquote>
+ </td></tr>
+ </table>
+ </blockquote>
+ </p>
+ </td></tr>
+ </table>
+ <div class="sectionTable">
+ <table class="sectionTable">
+ <tr><td>
+ <a name="Working with Annotations"><h1><img src="images/UIMA_4sq50tightCropSolid.png"/> Working with Annotations</h1></a>
+ </td></tr>
+ <tr><td>
+ <blockquote class="sectionBody">
+ <ul>
+ <li><a href='#Watch out for type-priorites'>
+ Watch out for type-priorities
+
+ </a></li>
+ <li><a href='#Annotation containment'>
+ Annotation containment
+
+ </a></li>
+ <li><a href='#Adjusting an existing annotation's begin and end'>
+ Adjusting an existing annotation's begin and end
+
+ </a></li>
+ <li><a href='#Avoid where possible, copying sets of Feature Structures'>
+ Avoid where possible, copying sets of Feature Structures
+
+ </a></li>
+ </ul>
+ <p>
+ The CAS holds Feature Structures (FSs). There is special support for FSs which are a subtype of Annotation;
+ these have an associated Subject of Analysis (Sofa) and <code>begin</code> and <code>end</code> offsets.
+ </p>
+ <h3>Annotations are not required in all cases</h3>
+ <p>If your application deals with a different kind of unstructured data, say, for instance, images, then
+ Annotations may not be the appropriate supertype for your types, because they're designed for
+ things having a linear begin / end meaningful demarcations. </p>
+ <p>You can have your feature structures inherit from TOP, or from some other appropriate supertype, other
+ than Annotation.</p>
+ <h3>Making use of the built-in Annotation index</h3>
+ <p>Annotations are special in UIMA in that there is a "built-in" index, the AnnotationIndex, which can be used
+ to rapidly access these in a sorted order. The ordering is by <code>begin</code> (ascending), then by
+ <code>end</code> (descending), and then by type-priorities.</p>
+ <p style="margin-left:1rem"><i>This is really a set of indexes, one for each subtype of Annotation.</i></p>
+ <p style="margin-left:1rem"><i>Although the index has type-priorities, in UIMA v3, the <code>select-framework</code>
+ by default ignores these; this behavior can be overridden.</i></p>
+ <table class="subsectionTable">
+ <tr><td>
+
+
+
+ <a name="Watch out for type-priorites">
+ <h2>Watch out for type-priorities
+ </h2>
+ </a>
+ </td></tr>
+ <tr><td>
+ <blockquote class="subsectionBody">
+ <p>When two annotations have the same begin and end, but different types, then one comes before the other,
+ according to type priorities. This lets you declare, for example, that when a Sentence annotation and a
+ Foo annotation both cover the same span, the Sentence logically contains the Foo, and not the
+ other way around.</p>
+ <p>To make this work, you need to specify the type priorities. This is a global setting for your application.
+ See
+ <a target="_blank" rel="noopener" href="http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xml.component_descriptor.aes.primitive">
+ type priorities</a> (scroll down to find it) for how to specify this.</p>
+ <h3>Avoiding type priorities</h3>
+ <p>Often, the use of type priorities gets in the way. With UIMA Version 3, the
+ <a target="_blank" rel="noopener" href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a>
+ by default ignores type priorities when doing its operations; but this can be overridden as needed.</p>
+ </blockquote>
+ </td></tr>
+ </table>
+ <table class="subsectionTable">
+ <tr><td>
+
+
+
+ <a name="Annotation containment">
+ <h2>Annotation containment
+ </h2>
+ </a>
+ </td></tr>
+ <tr><td>
+ <blockquote class="subsectionBody">
+ <h3>a contains b</h3>
+ <ul><li>Ignoring type priorities:</li></ul>
+ <pre>a != null && b != null && // null check
+a.getBegin() <= b.getBegin() && // a starts before (or equal to) b
+a.getEnd() >= b.getEnd() // a ends after (or equal to) b</pre>
+ <h3>a and b overlap (have at least one char in common)</h3>
+ <pre>
+ // (null checks omitted)
+if (a.getBegin() <= b.getBegin()) { // if a starts before (or equal to) b
+ return a.getEnd() > b.getBegin(); // then it overlaps if a's end is after b's begin
+} else { // otherwise, b's begin is before a's begin
+ return b.getEnd() > a.getBegin(); // so it overlaps if b's end is after a's begin.
+}
+</pre>
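+ <p>For experimenting with the two predicates outside of a CAS, they can be wrapped as static methods. The sketch below
+ uses a hypothetical <code>Span</code> class with public <code>begin</code> and <code>end</code> fields in place of a
+ UIMA annotation; with real annotations you would call <code>getBegin()</code> and <code>getEnd()</code> as shown above.</p>

```java
// Hypothetical helper class; Span stands in for a UIMA annotation.
final class SpanRelations {

    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // True if a contains b, ignoring type priorities.
    static boolean contains(Span a, Span b) {
        return a != null && b != null
            && a.begin <= b.begin   // a starts before (or equal to) b
            && a.end >= b.end;      // a ends after (or equal to) b
    }

    // True if a and b have at least one character in common.
    static boolean overlaps(Span a, Span b) {
        if (a == null || b == null) {
            return false;
        }
        if (a.begin <= b.begin) {
            return a.end > b.begin; // a starts first: overlap if a ends after b begins
        }
        return b.end > a.begin;     // b starts first: overlap if b ends after a begins
    }
}
```

+ <p>Note that two spans which merely touch, such as [0,5] and [5,8], share no character and so do not overlap under
+ this definition.</p>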
+ </blockquote>
+ </td></tr>
+ </table>
+ <table class="subsectionTable">
+ <tr><td>
+
+
+
+ <a name="Adjusting an existing annotation's begin and end">
+ <h2>Adjusting an existing annotation's begin and end
+ </h2>
+ </a>
+ </td></tr>
+ <tr><td>
+ <blockquote class="subsectionBody">
+ <p>Sometimes, your code may want to adjust an annotation's begin and end values.
+ If the annotation is not indexed, there's no issue - just change the value.
+ But if it is indexed, it's in index(es) in a position determined by its begin and end position, so if you
+ change these, the item needs to be reindexed (in all the indexes holding it). Typically, only one index
+ (the Annotation Index for a particular CAS View) is involved, but in general, there could be multiple
+ indexes involved.</p>
+ <p>If you are using UIMA version 2.7.0 or later, the UIMA
+ <a target="_blank" rel="noopener" href="https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.cas.updating_indexed_feature_structures">framework</a>
+ detects updates that would need this re-indexing, and
+ automatically removes the Feature Structure from all involved index(es), updates the Feature, and then adds the Feature Structure back to the index(es).
+ </p>
+ <p>If you are updating, say, both the begin and end values of an annotation, you can make this more efficient by
+ doing the steps yourself, in your code:
+ <ul><li>Remove the item from the index(es)</li>
+ <li>Do both updates</li>
+ <li>Add the item back into the index(es)</li></ul>
+ More details <a target="_blank" rel="noopener" href="https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.cas.updating_indexed_feature_structures">here</a>.
+ </p>
+ <p>Example: if you know a particular annotation is only indexed in one view,
+ then you can update its begin and end features using
+ <pre>a.<b>removeFsFromIndexes</b>();
+
+ a.setBegin(new_value_begin);
+ a.setEnd(new_value_end);
+
+a.<b>addToIndexes</b>();</pre>
+This is the most efficient way to do this.
+ </p>
+ <p>There are a couple of special forms you can use to protect indexes while you're updating features used as keys.
+ This is useful when you're not sure what feature values might be used as keys in some index.
+ <pre>
+try (AutoCloseable ac = my_cas.<b>protectIndexes</b>()) {
+ // ... arbitrary user code which updates features
+ // which may be "keys" in one or more indexes, e.g.
+
+ a.setBegin(new_value_begin);
+ a.setEnd(new_value_end);
+}</pre>
+or
+<pre>
+my_cas.<b>protectIndexes</b>(() -> {
+ // ... arbitrary user code updating "key" features,
+ // but no checked exceptions are permitted
+ // (because inside a lambda)
+
+ a.setBegin(new_value_begin);
+ a.setEnd(new_value_end);
+ });</pre>
+ These use the framework's automatic detection mechanism, and remove Feature Structures from all involved indexes
+ if needed, but delay adding them back until the end of the protected section.
+ </p>
+ </blockquote>
+ </td></tr>
+ </table>
+ <table class="subsectionTable">
+ <tr><td>
+
+
+
+ <a name="Avoid where possible, copying sets of Feature Structures">
+ <h2>Avoid where possible, copying sets of Feature Structures
+ </h2>
+ </a>
+ </td></tr>
+ <tr><td>
+ <blockquote class="subsectionBody">
+ <p>Operations which iterate over Feature Structures, and put them into a Collection or List, and then
+ iterate over that list to do some other operations, can often be done directly on the Feature Structures in the CAS,
+ omitting the first copying of them into a list.
+ </p>
+ <p>A frequent speedup can happen when the particular logic can detect when no further items in a (sorted) index
+ are needed, and the iteration can be stopped early.</p>
+ <p>For example, you might have code which iterates over all feature structures of a particular type, and puts these into a list,
+ and then goes through the list, picks out certain ones, and puts those into another list, which is then returned.
+ </p>
+ <p>The first copying can be omitted, by moving the logic of what to include into the first iteration, and producing the second
+ list directly.</p>
+ <p>In UIMA Version 3, you can make use of the <a target="_blank" rel="noopener" href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a>.
+ It already accounts for many of the use cases where you might want to start an iteration at a particular spot or exit it early.
+ You can also use its ability to produce streams, and combine that with Java's takeWhile method, to exit a stream early.
+ </p>
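+ <p>As a rough illustration of the <code>takeWhile</code> idea, the sketch below stops consuming a sorted stream at the
+ first element that fails the test, and collects directly into the result list without an intermediate copy. It uses a
+ hypothetical <code>Span</code> class and a plain Java stream (Java 9 or later); in UIMA Version 3 the stream would
+ presumably come from the select framework instead.</p>

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration of exiting a sorted stream early; not a UIMA API.
final class TakeWhileSketch {

    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Assumes the stream is ordered by begin (ascending). takeWhile stops at
    // the first span beginning at or after limit, so later spans are never examined.
    static List<Span> beginningBefore(Stream<Span> sortedByBegin, int limit) {
        return sortedByBegin
            .takeWhile(span -> span.begin < limit)
            .collect(Collectors.toList());
    }
}
```

+ <p>Unlike <code>filter</code>, which would test every element, <code>takeWhile</code> relies on the sort order to end
+ the iteration early, which is the same principle as stopping an index iterator by hand.</p>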
+ </blockquote>
+ </td></tr>
+ </table>
+ </blockquote>
+ </p>
+ </td></tr>
+ </table>
+ </td>
+ </tr>
+ <!-- FOOTER -->
+ <tr><td colspan="2">
+ <hr noshade="" size="1"/>
+ </td></tr>
+ <tr><td colspan="2">
+ <table class="pageFooter">
+ <tr>
+ <td><a href="index.html">Home</a></td>
+ <td><a href="privacy-policy.html">Privacy Policy</a></td>
+ <td style="font-size:75%">
+ Copyright © 2006-2013, The Apache Software Foundation.<br/>
+ Apache UIMA, UIMA, the Apache UIMA logo and the Apache Feather logo are trademarks of The Apache Software Foundation.<br/>
+ All other marks mentioned may be trademarks or registered trademarks of their respective owners.
+ </td>
+ <td><a href="mailto:dev@uima.apache.org">Contact us</a></td>
+ </tr>
+ </table>
+ </td></tr>
+ </table>
+ </body>
+ </html>
+
Added: uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml
URL: http://svn.apache.org/viewvc/uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml?rev=1870165&view=auto
==============================================================================
--- uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml (added)
+++ uima/site/trunk/uima-website/xdocs/doc-uimaj-cookbook.xml Fri Nov 22 15:29:22 2019
@@ -0,0 +1,252 @@
+<?xml version="1.0" encoding="ISO-8859-1"?>
+
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ https://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+<document>
+
+ <properties>
+ <title>Cookbook: addressing some typical use-cases</title>
+ <author email="dev@uima.apache.org">
+ Apache UIMA Team
+ </author>
+ </properties>
+
+ <body>
+
+ <section name="Working with Feature Structures">
+ <p>These techniques work with all kinds of Feature Structures, both Annotations and non-Annotations.</p>
+
+ <subsection name="Remove all Feature Structures of a particular type">
+ <p>There are built-in methods to do this, over all indexes in a particular view. There are 2 variations:
+ <ul><li>remove all including the subtypes of the type
+ <pre>myJCasView.removeAllIncludingSubtypes(Foo.type)</pre>
+ </li>
+ <li>remove all excluding the subtypes of the type
+ <pre>myJCasView.removeAllExcludingSubtypes(Foo.type)</pre></li></ul>
+ </p>
+ <p>Both of these are much faster than iterating over the Feature Structures; they directly clear the associated indexes.</p>
+ </subsection>
+
+ <subsection name="General suggestions: working with iterators">
+
+
+ <p>Many times code will iterate over all instances of a type, and only do something with a subset.
+ Frequently, the iteration can be cut short, by starting near the spot of interest and stopping as soon
+ as it can be determined that no further iteration will find interesting Annotations.</p>
+
+ <p>Example: Let's say you have a "token" annotation, and want to find the "sentence" that contains it.
+ You could write an iterator over all sentences.
+ </p>
+ <h3>Stop early</h3>
+ <p>
+ When you find the first sentence that overlaps the token, you can use extra knowledge that you might have,
+ such as there being only one sentence per token, to conclude that no further iteration is needed,
+ and stop the iteration right there.
+ </p>
+
+ <p>Furthermore, if the token appears outside of any sentence, you can similarly stop the iteration, and return
+ an "empty" result, as soon as the sentence being tested begins after the token's <code>begin</code>.
+ This is because, at that point, due to the sorting of the returned values, no later sentence could
+ start at or before the token's begin.
+ </p>
+
+ <h3>Begin closer to the right spot, maybe iterate backwards</h3>
+
+ <p>But you can do better.</p>
+ <p>You can start the iteration, instead of at the beginning, at the position of the token, and iterate backwards.
+ Iterators have a moveTo() method which takes a feature structure argument, so you can moveTo(the-token),
+ and then perhaps with some edge adjustment for equality, start iterating backwards, looking for the sentence at that
+ position that covers the token.
+ </p>
+
+ <p>If you are iterating backwards, looking for a "covering" annotation, and you know the largest span for that
+ covering type, then you can stop iterating as soon as the begin position you reach, plus the largest span, is less than
+ the begin of the annotation you're trying to cover.</p>
+
+ <p style="margin-left:1rem">This is used internally in version 3's
+ <a target="_blank" rel="noopener"
+ href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select.annot.subselect">select framework</a>
+ to speed up
+ the <code>covering</code> kind of iteration.</p>
+
+ <p>There are many other examples, but the principle is the same: start the iteration "close to" the right spot,
+ perhaps moving backwards instead of forwards, and end the iteration as soon as you can logically say that
+ no more suitable feature structures would be found. </p>
+
+ <h3>Use UIMA Version 3's select framework</h3>
+ <p>The <a target="_blank" rel="noopener"
+ href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a>
+ incorporates many of the popular use cases for doing iterations that we've seen, into a Java friendly approach that
+ automatically uses optimized iterators and can produce Java Streams, as well.</p>
+ </subsection>
+
+ </section>
+
+ <section
+ name="Working with Annotations">
+
+ <subsectionToc/>
+ <p>
+ The CAS holds Feature Structures (FSs). There is special support for FSs which are a subtype of Annotation;
+ these have an associated Subject of Analysis (Sofa) and <code>begin</code> and <code>end</code> offsets.
+ </p>
+
+ <h3>Annotations are not required in all cases</h3>
+ <p>If your application deals with a different kind of unstructured data, say, for instance, images, then
+ Annotations may not be the appropriate supertype for your types, because they're designed for
+ things having a linear begin / end meaningful demarcations. </p>
+ <p>You can have your feature structures inherit from TOP, or from some other appropriate supertype, other
+ than Annotation.</p>
+
+ <h3>Making use of the built-in Annotation index</h3>
+ <p>Annotations are special in UIMA in that there is a "built-in" index, the AnnotationIndex, which can be used
+ to rapidly access these in a sorted order. The ordering is by <code>begin</code> (ascending), then by
+ <code>end</code> (descending), and then by type-priorities.</p>
+ <p style="margin-left:1rem"><i>This is really a set of indexes, one for each subtype of Annotation.</i></p>
+ <p style="margin-left:1rem"><i>Although the index has type-priorities, in UIMA v3, the <code>select-framework</code>
+ by default ignores these; this behavior can be overridden.</i></p>
+
+ <subsection name="Watch out for type-priorities">
+ <p>When two annotations have the same begin and end, but different types, then one comes before the other,
+ according to type priorities. This lets you declare, for example, that when a Sentence annotation and a
+ Foo annotation both cover the same span, the Sentence logically contains the Foo, and not the
+ other way around.</p>
+
+ <p>To make this work, you need to specify the type priorities. This is a global setting for your application.
+ See
+ <a target="_blank" rel="noopener"
+ href="http://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xml.component_descriptor.aes.primitive">
+ type priorities</a> (scroll down to find it) for how to specify this.</p>
+
+ <h3>Avoiding type priorities</h3>
+ <p>Often, the use of type priorities gets in the way. With UIMA Version 3, the
+ <a target="_blank" rel="noopener"
+ href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a>
+ by default ignores type priorities when doing its operations; but this can be overridden as needed.</p>
+ </subsection>
+
+ <subsection name="Annotation containment">
+ <h3>a contains b</h3>
+ <ul><li>Ignoring type priorities:</li></ul>
+ <pre>a != null && b != null && // null check
+a.getBegin() <= b.getBegin() && // a starts before (or equal to) b
+a.getEnd() >= b.getEnd() // a ends after (or equal to) b</pre>
+
+ <h3>a and b overlap (have at least one char in common)</h3>
+ <pre>
+ // (null checks omitted)
+if (a.getBegin() <= b.getBegin()) { // if a starts before (or equal to) b
+ return a.getEnd() > b.getBegin(); // then it overlaps if a's end is after b's begin
+} else { // otherwise, b's begin is before a's begin
+ return b.getEnd() > a.getBegin(); // so it overlaps if b's end is after a's begin.
+}
+</pre>
+ </subsection>
+
+
+ <subsection name="Adjusting an existing annotation's begin and end">
+ <p>Sometimes your code needs to adjust an annotation's begin and end values.
+ If the annotation is not indexed, there is no issue: just change the values.
+ But if it is indexed, its position in each index is determined by its begin and end, so if you
+ change these, the annotation needs to be reindexed (in all the indexes holding it). Typically, only one index
+ (the Annotation Index for a particular CAS View) is involved, but in general, there could be multiple
+ indexes involved.</p>
+
+ <p>If you are using UIMA version 2.7.0 or later, the UIMA
+ <a target="_blank" rel="noopener"
+ href="https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.cas.updating_indexed_feature_structures">framework</a>
+ detects updates that would need this re-indexing, and
+ automatically removes the Feature Structure from all involved index(es), updates the Feature, and then adds the Feature Structure back to the index(es).
+ </p>
+
+ <p>If you are updating more than one key feature, say both the begin and end values of an annotation, you can
+ improve the efficiency of this by doing the steps yourself in your code, so that the removal and re-adding happen once
+ rather than once per update:
+ <ul><li>Remove the item from the index(es)</li>
+ <li>Do both updates</li>
+ <li>Add the item back into the index(es)</li></ul>
+ More details are <a target="_blank" rel="noopener"
+ href="https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.cas.updating_indexed_feature_structures">here</a>.
+ </p>
+
+ <p>Example: if you know a particular annotation is only indexed in one view,
+ then you can update its begin and end features using
+ <pre>a.<b>removeFsFromIndexes</b>();
+
+ a.setBegin(new_value_begin);
+ a.setEnd(new_value_end);
+
+a.<b>addToIndexes</b>();</pre>
+This is the most efficient way to do this.
+ </p>
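+ <p>Why the order of these steps matters can be illustrated with a plain java.util.TreeSet standing in for a
+ sorted index. This is only an analogy under that assumption, not how UIMA implements its indexes, and the names
+ ReindexSketch, Span, newIndex and move are hypothetical.</p>

```java
import java.util.Comparator;
import java.util.TreeSet;

class ReindexSketch {
    // Hypothetical mutable stand-in for an indexed annotation; not a UIMA class.
    static final class Span {
        int begin;
        int end;
        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Sort keys of the stand-in index: begin ascending, then end descending.
    static final Comparator<Span> BY_POSITION = (x, y) ->
        x.begin != y.begin
            ? Integer.compare(x.begin, y.begin)
            : Integer.compare(y.end, x.end);

    static TreeSet<Span> newIndex() {
        return new TreeSet<>(BY_POSITION);
    }

    // Remove while the old keys are still in place, change both keys,
    // then add back: one re-index for two updates.
    static void move(TreeSet<Span> index, Span s, int newBegin, int newEnd) {
        boolean wasIndexed = index.remove(s);
        s.begin = newBegin;
        s.end = newEnd;
        if (wasIndexed) {
            index.add(s);
        }
    }

    public static void main(String[] args) {
        TreeSet<Span> index = newIndex();
        Span a = new Span(0, 5);
        Span b = new Span(10, 15);
        index.add(a);
        index.add(b);
        move(index, a, 20, 25);
        System.out.println("b is now first: " + (index.first() == b));
        System.out.println("a still found: " + index.contains(a));
    }
}
```

+ <p>Changing the keys while the item is still in the sorted set would leave it stored at a position that no
+ longer matches its keys, so later lookups and iteration order could be wrong.</p>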
+
+ <p>There are a couple of special forms you can use to protect indexes while you're updating features used as keys.
+ This is useful when you're not sure what feature values might be used as keys in some index.
+ <pre>
+try (AutoCloseable ac = my_cas.<b>protectIndexes</b>()) {
+ // ... arbitrary user code which updates features
+ // which may be "keys" in one or more indexes, e.g.
+
+ a.setBegin(new_value_begin);
+ a.setEnd(new_value_end);
+}</pre>
+or
+<pre>
+my_cas.<b>protectIndexes</b>(() -> {
+ // ... arbitrary user code updating "key" features,
+ // but no checked exceptions are permitted
+ // (because inside a lambda)
+
+ a.setBegin(new_value_begin);
+ a.setEnd(new_value_end);
+ });</pre>
+ These use the framework's automatic detection mechanism, which removes Feature Structures from all involved indexes
+ if needed, but delays adding them back until the end of the protected section.
+ </p>
+
+ </subsection>
+
+ <subsection name="Avoid copying sets of Feature Structures where possible">
+
+ <p>Operations which iterate over Feature Structures, and put them into a Collection or List, and then
+ iterate over that list to do some other operations, can often be done directly on the Feature Structures in the CAS,
+ omitting the first copying of them into a list.
+ </p>
+
+ <p>A frequent speedup can happen when the particular logic can detect when no further items in a (sorted) index
+ are needed, and the iteration can be stopped early.</p>
+
+ <p>For example, you might have code which iterates over all feature structures of a particular type and puts them into a list,
+ and then goes through the list, picks out certain ones, and puts those into another list, which is then returned.
+ </p>
+
+ <p>The first copying can be omitted, by moving the logic of what to include into the first iteration, and producing the second
+ list directly.</p>
+
+ <p>In UIMA Version 3, you can make use of the <a target="_blank" rel="noopener"
+ href="http://uima.apache.org/d/uimaj-current/version_3_users_guide.html#uv3.select">select framework</a>.
+ It already accounts for many of the use-cases where you might want to start or exit an iteration at a particular point.
+ You can also use its ability to produce streams, combined with Java's takeWhile method (Java 9 and later), to exit a stream early.
+ </p>
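+ <p>A minimal sketch of the single-pass, early-exit idea using only the Java standard library. A plain List
+ sorted by begin stands in for a sorted index; in UIMA Version 3 the stream would instead come from the select
+ framework. The names EarlyExitSketch, Span and longSpansBefore, and the selection rule itself, are hypothetical.</p>

```java
import java.util.List;
import java.util.stream.Collectors;

class EarlyExitSketch {
    // Hypothetical stand-in for an annotation; not a UIMA class.
    static final class Span {
        final int begin;
        final int end;
        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    // Input is assumed to be sorted by begin, ascending.
    // Collects spans of at least minLength that begin before limit,
    // without building an intermediate list of all spans.
    static List<Span> longSpansBefore(List<Span> sortedByBegin, int limit, int minLength) {
        return sortedByBegin.stream()
            // Stop at the first span beginning at or after limit;
            // because of the sort order, no later span can qualify.
            .takeWhile(s -> s.begin < limit)
            // Selection logic applied in the same pass.
            .filter(s -> s.end - s.begin >= minLength)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Span> spans = List.of(
            new Span(0, 2), new Span(1, 9), new Span(5, 6),
            new Span(8, 20), new Span(30, 50));
        for (Span s : longSpansBefore(spans, 10, 5)) {
            System.out.println(s.begin + ".." + s.end);
        }
    }
}
```

+ <p>The early exit is only valid because the input is sorted on the same key that the takeWhile condition
+ tests; on unsorted input, a plain filter would be needed instead.</p>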
+
+ </subsection>
+ </section>
+ </body>
+
+</document>
+