Posted to commits@uima.apache.org by sc...@apache.org on 2011/08/15 03:23:03 UTC

svn commit: r1157694 - /uima/uimaj/trunk/uima-docbook-tutorials-and-users-guides/src/docbook/annotator_analysis_engine_guide.xml

Author: schor
Date: Mon Aug 15 01:23:03 2011
New Revision: 1157694

URL: http://svn.apache.org/viewvc?rev=1157694&view=rev
Log:
[UIMA-2212] update slightly the docs for Result Spec - trying to make it a bit more clear.

Modified:
    uima/uimaj/trunk/uima-docbook-tutorials-and-users-guides/src/docbook/annotator_analysis_engine_guide.xml

Modified: uima/uimaj/trunk/uima-docbook-tutorials-and-users-guides/src/docbook/annotator_analysis_engine_guide.xml
URL: http://svn.apache.org/viewvc/uima/uimaj/trunk/uima-docbook-tutorials-and-users-guides/src/docbook/annotator_analysis_engine_guide.xml?rev=1157694&r1=1157693&r2=1157694&view=diff
==============================================================================
--- uima/uimaj/trunk/uima-docbook-tutorials-and-users-guides/src/docbook/annotator_analysis_engine_guide.xml (original)
+++ uima/uimaj/trunk/uima-docbook-tutorials-and-users-guides/src/docbook/annotator_analysis_engine_guide.xml Mon Aug 15 01:23:03 2011
@@ -2015,8 +2015,28 @@ catch (PatternSyntaxException e) {
     <section id="ugr.tug.aae.result_specification_setting">
       <title>Result Specifications</title>
       
+      <para>Annotators are often written to do a lot of computation and produce many different outputs.
+      For example, a tokenizer can, in addition to identifying tokens, look them up in dictionaries, create 
+      lemma forms (dropping suffixes and prefixes), etc.  Result Specifications provide a way to dynamically
+      specify what results are desired for a particular CAS being processed.</para>
+      
+      <para>It is up to the annotator writer to take advantage of the result specification; using it is optional.
+      If it is used, the annotator code checks whether a particular output is wanted by asking the result specification
+      whether it contains a specific Type and/or Feature.  If it does, then the annotator produces that type/feature; if not,
+      it skips the computations for producing that type/feature.</para>
+      
+      <para>A query of the Result Specification may 
+      include the language.  A typical use case:  The CAS contains a document written in some language, and some
+      upstream Annotator has discovered what this language is.  
+      The current Annotator extracts the previously discovered language specification from the CAS and 
+      then includes it when querying the Result Specification.  The exact method of encoding 
+      language specifications in the CAS is left up to annotator developers; however,
+      the framework provides a commonly used type for this: the org.apache.uima.tcas.DocumentAnnotation
+      type.</para>
+      
       <para>The Result Specification is passed to the annotator instance by calling its
-        setResultSpecificaiton method. When called, the default implementation saves the
+        setResultSpecification method (this call is typically made by the framework, based on Capability specifications). 
+        When called, the default implementation saves the
         result specification in an instance variable of the Annotator instance, which can be
         accessed by the annotator using the protected
         <literal>getResultSpecification()</literal> method.</para>
@@ -2032,27 +2052,28 @@ catch (PatternSyntaxException e) {
       specifiable in Capability Specifications; examples include "en" for English, "en-uk" for
       British English, etc.  There is also a language type, "x-unspecified", which is presumed
       if no language specification(s) are given.</para>
-      
-      <para>Result Specifications can be queryed by the Annotator code, and the query may 
-      include the language.  If it doesn't include the language, it is treated as if the 
+           
+      <para>If a query of the Result Specification doesn't include a language, it is treated as if the 
       language "x-unspecified" was specified.  Language matching is hierarchically defaulted,
-      in one direction: if a query asks about a type T for language "en-uk", it will match
-        for languages "en-uk", "en", or "x-unspecified".  However the reverse is not true:
-        If the query asks about a type T for language "x-unspecified", then it only 
-        matches Result Specifications with no language (or "x-unspecified", which is equivalent).
+      in one direction: if a query includes the language "en-uk", meaning that the document
+      being processed is in that language, it will match
+        Result Specifications whose languages are "en-uk", "en", or "x-unspecified".  In other words, if the 
+        Result Specifications say to produce output if the actual document's language
+        is en-uk, or en, or x-unspecified, then having the actual document's language be
+        en-uk would "match" any of these Result Specifications. However, the reverse is not true:
+        if the query asks about producing output when the actual document's language is "x-unspecified", 
+        then it would not match if the Result Specification said to produce output only if the 
+        actual document is en-uk or en;  the Result Specification would need to say to 
+        produce output for "x-unspecified".
         </para>
       
-      <para>
-      The effect of this is that if the Result Specification indicates it wants output
+      <para>If the Result Specification indicates it wants output
       produced for "en-uk", but the annotator is given a language which is unknown, 
         or one that is known, but isn't "en-uk", then the query (using the language 
-        of the document) will 
-      return false.   This is true even if the language is "en".  
+        of the document) will return false.   This is true even if the language is "en".  
         However, if the Result Specification indicates it wants output for "en", 
-      and the query is for "en-uk" (presumably because that's the language of the document
-      and the annotator can handle that especially well), then the query will return true.
-    </para> 
-      
+      and the query is for a document whose language is "en-uk", then the query will return true.
+    </para>      
       
       <para>Sometimes you can specify the Result Specification; othertimes, you cannot
         (for instance, inside a Collection Processing Engine, you cannot). When you cannot
@@ -2114,7 +2135,8 @@ catch (PatternSyntaxException e) {
           <emphasis>all</emphasis> input types and features of
           <emphasis>all</emphasis> component AnalysisEngines within that aggregate. This forms the
           complete set of types and features that any component of the aggregate might need to
-          produce. This derived Result Specification is then passed to the
+          produce. This derived Result Specification is then intersected with the 
+          delegate's output capabilities, and the result is passed to the
           <code>AnalysisEngine.setResultSpecification(ResultSpecification)</code>
           of each component AnalysisEngine. In the case of nested aggregates, this procedure
           is applied recursively.</para>
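
Illustrative note (not part of the commit): the one-directional language matching described in the revised text can be sketched in plain Java. This is a minimal sketch written from the wording of the documentation above, not UIMA's actual implementation; the class LanguageMatchSketch and its methods matches and langs are hypothetical names introduced only for this illustration, and the sketch assumes that an entry with no languages behaves like "x-unspecified", as the text states.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of the matching rule described in the docs;
// it does not use or reproduce any UIMA class.
public class LanguageMatchSketch {
    static final String UNSPECIFIED = "x-unspecified";

    // Convenience helper (hypothetical) to build a set of language codes.
    static Set<String> langs(String... codes) {
        return new HashSet<>(Arrays.asList(codes));
    }

    // Returns true if a query made with the document's language matches a
    // Result Specification entry that lists specLanguages.
    static boolean matches(Set<String> specLanguages, String documentLanguage) {
        // A query without a language is treated as "x-unspecified".
        String lang = (documentLanguage == null)
                ? UNSPECIFIED
                : documentLanguage.toLowerCase();

        // An entry with no languages, or with "x-unspecified", matches any query.
        if (specLanguages.isEmpty() || specLanguages.contains(UNSPECIFIED)) {
            return true;
        }

        // Otherwise walk from the specific language to its more general
        // forms, for example "en-uk" and then "en".
        while (true) {
            if (specLanguages.contains(lang)) {
                return true;
            }
            int dash = lang.lastIndexOf('-');
            if (dash < 0) {
                return false;
            }
            lang = lang.substring(0, dash);
        }
    }

    public static void main(String[] args) {
        System.out.println(matches(langs("en"), "en-uk"));    // spec "en", document "en-uk"
        System.out.println(matches(langs("en-uk"), "en"));    // spec "en-uk", document "en"
        System.out.println(matches(langs("en"), null));       // spec "en", no language in query
    }
}
```

Under these assumptions, a specification entry for "en" accepts a query for "en-uk", while an entry for "en-uk" rejects queries for "en" or for an unknown language, which is the asymmetry the revised paragraphs describe.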