Posted to dev@uima.apache.org by Marshall Schor <ms...@schor.com> on 2006/11/25 17:11:52 UTC

Result specification - update needed

I need to write up the version 2 tutorial and user's guide for Results 
Specification.  The current write up is inaccurate, I think.  I started 
to change it to fit the new API where it is not passed in as a 
parameter, but there are more things that need fixing.

Could Adam and/or Thilo take a look at this write up and fix it up?  
(see below):

-Marshall

<section id="ugr.tug.aae.result_specification_setting">
      <title>Result Specification Setting</title>
     
      <para>The Result Specification is passed to the annotator instance 
by calling its
        setResultSpecification method. When called, the default 
implementation saves the
        result specification in an instance variable of the Annotator 
instance.</para>
     
      <para>A results specification is a list of output types and / or 
type:feature
        specifications, which are expected to be <quote>output</quote> 
from the
        annotator. Annotators may use this to optimize their operations, 
when possible, for
        those cases where only particular outputs are wanted. The 
interface to the Result
        Specification object (see the JavaDocs) allows querying both 
types and particular
        features of types.</para>
     
      <para>Sometimes you can specify the Result Specification; 
other times, you cannot
        (for instance, inside a Collection Processing Engine, you 
cannot). When you cannot
        specify it, or choose not to specify it (for example, using the 
form of the
        process(...) call on an Analysis Engine that doesn&apos;t 
include the Result
        Specification), a <quote>Default</quote> Result Specification is 
used.</para>
     
    </section>
   
    <section><title>Default ResultSpecification</title>
     
      <para>The default Result Specification is taken from the 
Engine&apos;s output
        Capability Specification. Remember that a Capability 
Specification has both
        inputs and outputs, can specify types and / or features, and 
there can be more than one
        Capability Set. If there is more than one set, the logical union 
of these sets is used.
        The default Result Specification is exactly what&apos;s included 
in the output
        Capability Specification.</para>
     
    </section>
   
    <section><title>Passing Result Specifications to Annotators</title>
     
      <para>If you are not using aggregation or collection processing, 
but instead are
        instantiating your own primitive analysis engines and calling 
their process
        methods, you can pass whatever Result Specification is 
appropriate in your call to
        process(CAS, ResultSpecification). For primitive engines, 
whatever you pass in is
        passed along as the value of the 2nd argument in the 
annotator&apos;s process()
        method. If you use the form of the call without the Result 
Specification, the default
        Result Specification is created and passed, as above.</para>
    </section>
   
    <section><title>Aggregates</title>
     
      <para>For aggregate engines, the value passed to the primitive 
annotator code depends
        on the kind of flow.</para>
    </section>
   
    <section><title>Fixed Flow</title>
     
      <para>For FixedFlow, any ResultSpecification passed into the 
aggregate is ignored,
        and instead, each primitive annotator is passed a result spec 
that corresponds to the
        union of its output capability specifications at the primitive 
descriptor level. If
        no output capability specification is given, the annotator will 
still be called, but
        the result specification will be empty.</para>
     
    </section>
   
    <section><title>CapabilityLanguageFlow</title>
      <para>For CapabilityLanguageFlow, each annotator is passed a 
ResultSpecification
        that is the intersection of the primitive annotator&apos;s 
output Capability
        Specification with the ResultSpecification passed to the 
aggregate. If this
        intersection is null (the annotator does not produce any type or 
feature included in
        the ResultSpecification), the annotator will not be called at 
all.</para>
     
      <para>Therefore, if using the CapabilityLanguageFlow, if you want 
to supply a custom
        ResultSpecification for the aggregate it must include any 
intermediate types that
        need to be produced internally in the flow, or else things will 
not work
        properly.</para>
    </section>
   
    <section><title>Special rule for skipping Analysis Engines</title>
     
      <para>When using the CapabilityLanguageFlow, an annotator will 
also be skipped if
        all of its outputs are in the output capability of some 
annotator(s) that has (have)
        executed previously in the flow. The concept here is that if all 
of an
        annotator&apos;s output types have already been produced, that 
annotator will not
        be called.</para>
     
      <para>For an Aggregate, each annotator is passed a Result 
Specification that is the
        intersection of the set of types mentioned in its output with 
the Result
        Specification passed to the aggregate. If this intersection is 
null (the annotator
        does not produce any type included in the ResultSpecification), 
the annotator will
        not be called at all.</para>
     
      <para>Therefore, if using the CapabilityLanguageFlow, if you want 
to supply a custom
        ResultSpecification for the aggregate it must include any 
intermediate types that
        need to be produced, or else things will not work properly.</para>
    </section>
   
    <section><title>Collection Processing Engines</title>
     
      <para>The Default Result Specification is always used for all 
components of a
        Collection Processing Engine.</para>
     
    </section>

Re: Result specification - update needed

Posted by Adam Lally <al...@alum.rpi.edu>.
On 11/25/06, Marshall Schor <ms...@schor.com> wrote:
> I need to write up the version 2 tutorial and user's guide for Results
> Specification.  The current write up is inaccurate, I think.  I started
> to change it to fit the new API where it is not passed in as a
> parameter, but there are more things that need fixing.
>
> Could Adam and/or Thilo take a look at this write up and fix it up?
> (see below):
> <snip/>

Yes, this needed an overhaul.  Result Specification handling in
aggregates no longer has anything to do with the type of flow.  Here's
my suggested documentation (note I used <code/> tags for monospace
font as in HTML, I have no idea if that's right for docbook):

<section id="ugr.tug.aae.result_specification_setting">
	<title>Result Specification Setting</title>
	
	<para>The Result Specification is passed to the annotator instance by
calling its
		setResultSpecification method. When called, the default
implementation saves the
		result specification in an instance variable of the Annotator
instance, which can be
		accessed by the annotator using the protected
		<code>getResultSpecification()</code> method.</para>
	
	<para>A Result Specification is a list of output types and / or type:feature
		names, which are expected to be
		<quote>output</quote> from the annotator. Annotators may use this to optimize
		their operations, when possible, for those cases where only
particular outputs are
		wanted. The interface to the Result Specification object (see the
JavaDocs) allows
		querying both types and particular features of types.</para>
	
	<para>Sometimes you can specify the Result Specification; other times,
you cannot (for
		instance, inside a Collection Processing Engine, you cannot). When you cannot
		specify it, or choose not to specify it (for example, using the form of the
		process(...) call on an Analysis Engine that doesn&apos;t include the Result
		Specification), a
		<quote>Default</quote> Result Specification is used.</para>
	
</section>
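
To illustrate the kind of optimization described above, here is a minimal sketch. It does not use the UIMA API: a plain Set<String> of type and type:feature names stands in for the Result Specification object, and the class, method, and type names are hypothetical. See the JavaDocs for the real query interface.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative model only: a Set<String> of type and type:feature names
// stands in for the Result Specification object; this is not the UIMA API.
class ResultSpecCheckSketch {

    // Hypothetical names, used only for this example.
    static final String TOKEN = "example.Token";
    static final String TOKEN_LEMMA = "example.Token:lemma";

    // Returns the outputs the annotator would bother to compute.
    static List<String> outputsToCompute(Set<String> resultSpec) {
        List<String> outputs = new ArrayList<>();
        if (resultSpec.contains(TOKEN)) {
            outputs.add(TOKEN);
            // The lemma feature is assumed costly, so compute it only on request.
            if (resultSpec.contains(TOKEN_LEMMA)) {
                outputs.add(TOKEN_LEMMA);
            }
        }
        return outputs;
    }

    public static void main(String[] args) {
        Set<String> tokensOnly = new LinkedHashSet<>(Arrays.asList(TOKEN));
        System.out.println(outputsToCompute(tokensOnly)); // prints [example.Token]
    }
}
```

The point is only that the annotator consults the specification before doing optional work; an empty specification here results in no outputs at all.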

<section>
	<title>Default ResultSpecification</title>
	
	<para>The default Result Specification is taken from the Engine&apos;s output
		Capability Specification. Remember that a Capability Specification has both
		inputs and outputs, can specify types and / or features, and there
can be more than one
		Capability Set. If there is more than one set, the logical union of
these sets is used.
		The default Result Specification is exactly what&apos;s included in the output
		Capability Specification.</para>
	
</section>
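
The "logical union" rule above can be sketched as follows. This is a model under stated assumptions, not framework code: each Capability Set's outputs are represented as a plain Set<String> of type and type:feature names, and all names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative model only: plain strings stand in for UIMA type and
// type:feature names, and plain sets stand in for Capability Sets.
class DefaultResultSpecSketch {

    // Returns the logical union of the outputs of all capability sets.
    static Set<String> defaultResultSpec(List<Set<String>> outputCapabilitySets) {
        Set<String> union = new LinkedHashSet<>();
        for (Set<String> outputs : outputCapabilitySets) {
            union.addAll(outputs);
        }
        return union;
    }

    public static void main(String[] args) {
        // Two hypothetical capability sets that overlap on example.Token.
        Set<String> first = new LinkedHashSet<>(
                Arrays.asList("example.Token", "example.Sentence"));
        Set<String> second = new LinkedHashSet<>(
                Arrays.asList("example.Token", "example.Token:lemma"));
        System.out.println(defaultResultSpec(Arrays.asList(first, second)));
    }
}
```

Only outputs are combined; the inputs listed in the Capability Sets play no part in the default Result Specification.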

<section>
	<title>Passing Result Specifications to Analysis Engines</title>
	
	<para>If you are not using a Collection Processing Engine, you can
specify a Result Specification
		for your AnalysisEngine(s) by calling the
<code>AnalysisEngine.setResultSpecification(ResultSpecification)</code>
		method.</para>
	<para>It is also possible to pass a Result Specification on each call to
		<code>AnalysisEngine.process(CAS, ResultSpecification)</code>.
However, this is not recommended
		if your Result Specification will stay constant across multiple
calls to <code>process</code>.
		In that case it will be more efficient to call
<code>AnalysisEngine.setResultSpecification(ResultSpecification)</code>
		only when the Result Specification changes.</para>
	<para>		
		For primitive Analysis Engines, whatever Result Specification you pass in is
		passed along to the annotator's
<code>setResultSpecification(ResultSpecification)</code>
		method.  For aggregate Analysis Engines, see below.</para>
</section>

<section>
	<title>Aggregates</title>
	
	<para>For aggregate engines, the Result Specification passed to the
		<code>AnalysisEngine.setResultSpecification(ResultSpecification)</code>
method is intended
		to specify the set of output types/features that the aggregate
should produce.  This is not
		necessarily equivalent to the set of output types/features that each
annotator should produce.
		For example, an annotator may need to produce an intermediate type
that is then consumed
		by a downstream annotator, even though that intermediate type is not
part of the Result
		Specification.</para>
	<para>To handle this situation, when
<code>AnalysisEngine.setResultSpecification(ResultSpecification)</code>
is called on
		an aggregate, the framework computes the union of the passed Result
Specification with the set of
		<emphasis>all</emphasis> input types and features of <emphasis>all</emphasis>
component AnalysisEngines within that
		aggregate.  This forms the complete set of types and features that
any component of the aggregate
		might need to produce.  This derived Result Specification is then
passed to the
		<code>AnalysisEngine.setResultSpecification(ResultSpecification)</code>
of each component AnalysisEngine.
		In the case of nested aggregates, this procedure is applied
recursively.</para>
</section>
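
The derivation described above for aggregates can be sketched roughly like this. It is a model of the set arithmetic only, not the framework's implementation: sets of strings stand in for Result Specifications and for component input capabilities, all names are hypothetical, and the recursion for nested aggregates is not modeled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative model only: sets of type / type:feature names stand in for
// Result Specifications and for the inputs declared by each component.
class AggregateResultSpecSketch {

    // Union of the requested outputs with every input of every component.
    static Set<String> derive(Set<String> requested,
                              List<Set<String>> componentInputs) {
        Set<String> derived = new LinkedHashSet<>(requested);
        for (Set<String> inputs : componentInputs) {
            derived.addAll(inputs);
        }
        return derived;
    }

    public static void main(String[] args) {
        // A hypothetical tokenizer needs no inputs; a hypothetical entity
        // annotator consumes the intermediate type example.Token.
        Set<String> tokenizerInputs = Collections.emptySet();
        Set<String> entityInputs =
                new LinkedHashSet<>(Arrays.asList("example.Token"));

        // The caller asks the aggregate only for example.Person.
        Set<String> requested =
                new LinkedHashSet<>(Arrays.asList("example.Person"));

        // prints [example.Person, example.Token]
        System.out.println(
                derive(requested, Arrays.asList(tokenizerInputs, entityInputs)));
    }
}
```

In this example the intermediate type example.Token ends up in the derived specification even though the caller never asked for it, which is what lets the tokenizer know it still has to produce tokens.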

<section>
	<title>Collection Processing Engines</title>
	
	<para>The Default Result Specification is always used for all
components of a Collection
		Processing Engine.</para>
</section>

<!--
	This no longer belongs as part of the discussion of Result Specifications.
	The CapabilityLanguageFlow now skips annotators on the basis of their complete
	capabilities, it does not take the Result Specification into account.
	Result Specifications are no longer the concern of the Flow
Controller, since this
	was deemed to be too great a complexity without enough benefit.
	
<section>
	<title>Special rule for skipping Analysis Engines</title>
	
	<para>When using the CapabilityLanguageFlow, an annotator will
also be skipped if all
		of its outputs are in the output capability of some annotator(s)
that has (have)
		executed previously in the flow. The concept here is that if all of an
		annotator&apos;s output types have already been produced, that
annotator will not
		be called.</para>
	
	<para>For an Aggregate, each annotator is passed a Result
Specification that is the
		intersection of the set of types mentioned in its output with the Result
		Specification passed to the aggregate. If this intersection is null
(the annotator
		does not produce any type included in the ResultSpecification), the
annotator will
		not be called at all.</para>
	
	<para>Therefore, if using the CapabilityLanguageFlow, if you want to
supply a custom
		ResultSpecification for the aggregate it must include any
intermediate types that
		need to be produced, or else things will not work properly.</para>
</section>
-->