
Posted to user@uima.apache.org by "LeHouillier, Frank D." <Fr...@gd-ais.com> on 2008/05/29 14:50:54 UTC

Question about Capabilities

If I have a UIMA type system with a supertype FS and a subtype FS and an
Analysis engine whose capabilities only list the supertype, should the
subtype still be accessible? As of 2.2.1, the Feature Structure appears
to keep its most specific type.  If I have an aggregate that specifies
the supertype as an input, is the subtype also available?  What about
features of the subtype but not of the supertype?  When I'm running with
such a type system the features of the subtype are apparently carried in
the CAS, even though I no longer need it.  Is this the expected
behaviour? 

 

Thanks,

Frank

RE: Question about Capabilities

Posted by "LeHouillier, Frank D." <Fr...@gd-ais.com>.

Thanks for the clarification Marshall.  I did run some test cases and
they all confirm your description.

Frank

-----Original Message-----
From: Marshall Schor [mailto:msa@schor.com] 
Sent: Thursday, May 29, 2008 12:40 PM
To: uima-user@incubator.apache.org
Subject: Re: Question about Capabilities

I don't think that the serialization step does any filtering (please let
me know if you have a test case showing that it does) other than not
outputting types which are not indexed and which cannot be reached from
types which are indexed.  The viewer does have a filter. The Document
Analyzer code constructs a list of types to display from all the types,
by filtering them against the set of types designated as "outputs".  This
is not filtering done by the core framework; it is rather just something
this particular application (the Document Analyzer) decided to do.  The
filter includes all subtypes of types in the Output capabilities.

-Marshall

Re: Question about Capabilities

Posted by Marshall Schor <ms...@schor.com>.

LeHouillier, Frank D. wrote:
> Sorry for not being clear.  I was unclear from the documentation as to
> whether the intent of the capabilities section of the Analysis Engine
> Descriptors was to provide external guarantees provided by the framework
> on the input/outputs to the annotator code or to provide a means for the
> annotator code to determine internally which FS types to view and/or
> produce (via the ResultSpecification).
Currently, the input/output capabilities are not used by the framework 
for providing external guarantees - they are used primarily for 
configuring default ResultSpecifications and for the built-in Capability 
Language Flow (flow controller).  Because the input/output specs are 
available metadata, application code and tooling can make use of these - 
and that's something that the DocumentAnalyzer does. 

Regarding the result specifications - an application can explicitly 
provide result specifications when calling the analysis engine process 
method; if that's not done, then the framework constructs a default 
Result Specification, from input/output capabilities.  See 
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.aae.result_specification_setting
for more details.
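A minimal sketch of the rule Marshall describes: a result specification passed explicitly by the application wins; otherwise a default is derived from the descriptor's capabilities. It uses plain Java records, not the UIMA API. The names Capabilities, ResultSpec and effectiveSpec are hypothetical, and building the default from the declared outputs alone is an assumption of this sketch.

```java
import java.util.LinkedHashSet;
import java.util.Optional;
import java.util.Set;

public class ResultSpecSketch {
    // Hypothetical stand-in for the capabilities section of an AE descriptor.
    record Capabilities(Set<String> inputs, Set<String> outputs) {}

    // Hypothetical stand-in for a result specification: a set of type names.
    record ResultSpec(Set<String> types) {
        boolean containsType(String typeName) {
            return types.contains(typeName);
        }
    }

    // Use the caller's result spec if one was supplied; otherwise build a
    // default from the declared outputs (an assumption of this sketch).
    static ResultSpec effectiveSpec(Optional<ResultSpec> explicit, Capabilities caps) {
        return explicit.orElseGet(() -> new ResultSpec(new LinkedHashSet<>(caps.outputs())));
    }

    public static void main(String[] args) {
        Capabilities caps = new Capabilities(Set.of("Token"), Set.of("Vehicle", "Car"));
        ResultSpec byDefault = effectiveSpec(Optional.empty(), caps);
        ResultSpec explicit = effectiveSpec(Optional.of(new ResultSpec(Set.of("Car"))), caps);
        System.out.println(byDefault.containsType("Vehicle")); // true
        System.out.println(explicit.containsType("Vehicle"));  // false
    }
}
```

An annotator would consult the effective specification to decide what to produce; nothing here restricts what other components can read from the CAS.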
>  
>
> So take the case of somebody using a primitive Analysis Engine with the
> Document Analyzer.  
Well, the Document Analyzer is a special case :-)  It looks at the 
output capabilities (any application can do this) and won't display 
types that are present in the CAS it's given to display at the end, if 
those types are not specified to be "output".  So, the appearance from 
the result viewer is that those types are not there, but the truth is 
that they are in the CAS (if the Analysis Engine generated them and 
indexed them) but are not displayed.
> If they leave a type out of the AEDescriptor
> Capabilities section, then the type is not serialized into the xmi file
> or viewable in the Document Analyzer viewer, thus some "filtering" of
> the output at least, is taking place.  
The Document Analyzer implementation (think of this as a particular
application implementation) takes an Analysis Engine Descriptor (a
primitive or an aggregate), runs it on an input or set of inputs,
serializes the resulting CASes to a results directory, and then calls a
viewer to display these, passing that viewer a special set of types to
display which it explicitly constructs from the Output Capabilities
specification metadata of the Analysis Engine it was given to run.

I don't think that the serialization step does any filtering (please let
me know if you have a test case showing that it does) other than not
outputting types which are not indexed and which cannot be reached from
types which are indexed.  The viewer does have a filter. The Document
Analyzer code constructs a list of types to display from all the types,
by filtering them against the set of types designated as "outputs".  This
is not filtering done by the core framework; it is rather just something
this particular application (the Document Analyzer) decided to do.  The
filter includes all subtypes of types in the Output capabilities.
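To make the serialization rule concrete, here is a small graph-walk sketch, assuming a feature structure can be modeled as a type name plus references to other feature structures. The FS class and toSerialize method are hypothetical; this is not the UIMA XMI serializer, only an illustration that what gets written depends on indexing and reachability, not on the declared capabilities.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class ReachableSketch {
    // Hypothetical feature structure: a type name plus references held in features.
    static final class FS {
        final String type;
        final List<FS> refs = new ArrayList<>();
        FS(String type) { this.type = type; }
    }

    // Keep every FS that is indexed or reachable from an indexed FS by
    // following references; everything else is left out. Handles cycles.
    static Set<FS> toSerialize(Collection<FS> indexed) {
        Set<FS> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<FS> work = new ArrayDeque<>(indexed);
        while (!work.isEmpty()) {
            FS fs = work.pop();
            if (seen.add(fs)) {
                work.addAll(fs.refs);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        FS car = new FS("Car");
        FS owner = new FS("Person");     // not indexed, but referenced by car
        FS orphan = new FS("Submarine"); // neither indexed nor referenced
        car.refs.add(owner);
        Set<FS> out = toSerialize(List.of(car));
        System.out.println(out.contains(owner));  // true
        System.out.println(out.contains(orphan)); // false
    }
}
```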
> Now suppose they have a type
> system with Annotation type of Vehicle with subtypes Car, Submarine,
> etc. but they only want to see in the Document Analyzer what was
> annotated as a Vehicle.  My first instinct was that they could leave out
> the types for Car, Submarine, etc., and only include Vehicle as an
> output in the capabilities section and all of the annotations would be
> serialized not as Car, Submarine, but as Vehicle and thus, when they
> looked at the xmi file through the document analyzer a nice Vehicle type
> would be viewable.  This isn't the case, instead the person gets the
> subtypes, highlighted separately.
Yes.  I looked at the code for DocumentAnalyzer, and it explicitly 
includes subtypes.  See the source for DocumentAnalyzer, line 1194.
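A toy version of the display filter described above, assuming the type system is a map from each type to its direct supertype (type names taken from Frank's Vehicle example). It is not the DocumentAnalyzer source; typesToDisplay and subsumes are hypothetical names. In a real UIMA application the subtype check would more likely go through the CAS type system's subsumes method.

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

public class OutputTypeFilter {
    // Hypothetical type system: each type maps to its direct supertype.
    static final Map<String, String> SUPERTYPE = Map.of(
            "Annotation", "TOP",
            "Vehicle", "Annotation",
            "Car", "Vehicle",
            "Submarine", "Vehicle",
            "Token", "Annotation");

    // True if 'type' equals 'ancestor' or inherits from it.
    static boolean subsumes(String ancestor, String type) {
        for (String t = type; t != null; t = SUPERTYPE.get(t)) {
            if (t.equals(ancestor)) return true;
        }
        return false;
    }

    // Keep every type that is an output type or a subtype of one.
    static SortedSet<String> typesToDisplay(Collection<String> allTypes, Set<String> outputs) {
        SortedSet<String> shown = new TreeSet<>();
        for (String type : allTypes) {
            for (String out : outputs) {
                if (subsumes(out, type)) {
                    shown.add(type);
                    break;
                }
            }
        }
        return shown;
    }

    public static void main(String[] args) {
        List<String> inCas = List.of("Token", "Vehicle", "Car", "Submarine");
        // prints [Car, Submarine, Vehicle]; Token is in the CAS but not shown
        System.out.println(typesToDisplay(inCas, Set.of("Vehicle")));
    }
}
```

Declaring only Vehicle as an output still lists Car and Submarine as separate entries, which matches the behavior Frank observed.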
>   My guess is that the "filtering"
> behavior is a result of an implementation of the Document Analyzer
> rather than something enforced by the framework, but I wasn't sure.
>   
Yes, that is correct. 

-Marshall

RE: Question about Capabilities

Posted by "LeHouillier, Frank D." <Fr...@gd-ais.com>.

Sorry for not being clear.  I was unclear from the documentation as to
whether the intent of the capabilities section of the Analysis Engine
Descriptors was to provide external guarantees provided by the framework
on the input/outputs to the annotator code or to provide a means for the
annotator code to determine internally which FS types to view and/or
produce (via the ResultSpecification).

So take the case of somebody using a primitive Analysis Engine with the
Document Analyzer.  If they leave a type out of the AEDescriptor
Capabilities section, then the type is not serialized into the xmi file
or viewable in the Document Analyzer viewer, thus some "filtering" of
the output at least, is taking place.  Now suppose they have a type
system with Annotation type of Vehicle with subtypes Car, Submarine,
etc. but they only want to see in the Document Analyzer what was
annotated as a Vehicle.  My first instinct was that they could leave out
the types for Car, Submarine, etc., and only include Vehicle as an
output in the capabilities section and all of the annotations would be
serialized not as Car, Submarine, but as Vehicle and thus, when they
looked at the xmi file through the document analyzer a nice Vehicle type
would be viewable.  This isn't the case, instead the person gets the
subtypes, highlighted separately.  My guess is that the "filtering"
behavior is a result of an implementation of the Document Analyzer
rather than something enforced by the framework, but I wasn't sure.

-----Original Message-----
From: Marshall Schor [mailto:msa@schor.com] 
Sent: Thursday, May 29, 2008 9:52 AM
To: uima-user@incubator.apache.org
Subject: Re: Question about Capabilities

Hi Frank -

Can you clarify your questions a bit more?

Does your question concern ResultSpecifications and how these are set 
(by default) from Capabilities?  Or something else?

I'm not sure what you mean by asking if subtypes are "available"; there
is no "filtering" of the CAS based on input specification - all
instances of all types in the CAS are "available" to all annotators.
There is some interaction with the Capability Language Flow mechanism - 
are you using this?

-Marshall


Re: Question about Capabilities

Posted by Marshall Schor <ms...@schor.com>.

Hi Frank -

Can you clarify your questions a bit more?

Does your question concern ResultSpecifications and how these are set 
(by default) from Capabilities?  Or something else?

I'm not sure what you mean by asking if subtypes are "available";  there 
is no "filtering" of the CAS based on input specification - all 
instances of all types in the CAS are "available" to all annotators.   
There is some interaction with the Capability Language Flow mechanism - 
are you using this?

-Marshall
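Frank's observation that a feature structure keeps its most specific type can be pictured with ordinary Java inheritance, which is roughly how JCas cover classes relate (the generated class for a subtype extends the class for its supertype). The classes below are plain Java stand-ins, not generated JCas code, and the make feature is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MostSpecificTypeSketch {
    // Plain Java stand-ins shaped like JCas cover classes: Car extends Vehicle.
    static class Vehicle {
        final int begin;
        final int end;
        Vehicle(int begin, int end) { this.begin = begin; this.end = end; }
    }

    static class Car extends Vehicle {
        final String make; // hypothetical feature defined only on the subtype
        Car(int begin, int end, String make) { super(begin, end); this.make = make; }
    }

    // Describe an object read through the supertype; its runtime class and
    // any subtype-only feature are still there.
    static String describe(Vehicle v) {
        String detail = (v instanceof Car c) ? " make=" + c.make : "";
        return v.getClass().getSimpleName() + " [" + v.begin + "," + v.end + "]" + detail;
    }

    public static void main(String[] args) {
        List<Vehicle> vehicles = new ArrayList<>(); // stands in for a Vehicle index
        vehicles.add(new Car(0, 6, "Toyota"));
        vehicles.add(new Vehicle(10, 17));
        for (Vehicle v : vehicles) {
            System.out.println(describe(v));
        }
    }
}
```

Reading the objects through the supertype does not strip subtype-only features, which is why those features are still carried in the CAS.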
