Posted to user@uima.apache.org by "LeHouillier, Frank D." <Fr...@gd-ais.com> on 2008/05/29 14:50:54 UTC
Question about Capabilities
If I have a UIMA type system with a supertype FS and a subtype FS and an
Analysis Engine whose capabilities only list the supertype, should the
subtype still be accessible? As of 2.2.1, the Feature Structure appears
to keep its most specific type. If I have an aggregate that specifies
the supertype as an input, is the subtype also available? What about
features of the subtype but not of the supertype? When I'm running with
such a type system the features of the subtype are apparently carried in
the CAS, even though I no longer need them. Is this the expected
behaviour?
Thanks,
Frank
RE: Question about Capabilities
Posted by "LeHouillier, Frank D." <Fr...@gd-ais.com>.
Thanks for the clarification Marshall. I did run some test cases and
they all confirm your description.
Frank
Re: Question about Capabilities
Posted by Marshall Schor <ms...@schor.com>.
LeHouillier, Frank D. wrote:
> Sorry for not being clear. I was unclear from the documentation as to
> whether the intent of the capabilities section of the Analysis Engine
> Descriptors was to provide external guarantees provided by the framework
> on the input/outputs to the annotator code or to provide a means for the
> annotator code to determine internally which FS types to view and/or
> produce (via the ResultSpecification).
Currently, the input/output capabilities are not used by the framework
for providing external guarantees - they are used primarily for
configuring default ResultSpecifications and for the built-in Capability
Language Flow (flow controller). Because the input/output specs are
available metadata, application code and tooling can make use of these -
and that's something that the DocumentAnalyzer does.
Regarding the result specifications - an application can explicitly
provide result specifications when calling the analysis engine process
method; if that's not done, then the framework constructs a default
Result Specification, from input/output capabilities. See
http://incubator.apache.org/uima/downloads/releaseDocs/2.2.2-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.aae.result_specification_setting
for more details.
>
>
> So take the case of somebody using a primitive Analysis Engine with the
> Document Analyzer.
Well, the Document Analyzer is a special case :-) It looks at the
output capabilities (any application can do this) and won't display
types that are present in the CAS it's given to display at the end, if
those types are not specified to be "output". So, the appearance from
the result viewer is that those types are not there, but the truth is
that they are in the CAS (if the Analysis Engine generated them and
indexed them) but are not displayed.
> If they leave a type out of the AEDescriptor
> Capabilities section then the type is not serialized into the xmi file
> and viewable by the Document Analyzer viewer, thus some "filtering" of
> the output at least, is taking place.
The Document Analyzer implementation (think of this as a particular
application implementation) takes an Analysis Engine Descriptor (a
primitive or an aggregate), runs it on an input or set of inputs,
serializes the resulting CASes to a results directory, and then calls a
viewer to display these, passing that viewer a special set of types to
display which it explicitly constructs from the Output Capabilities
specification metadata of the Analysis Engine it was given to run.
I don't think that the serialization step does any filtering (please let
me know if you have a test case showing that it does) other than not
outputting types which are not indexed and which cannot be reached from
types which are indexed. The viewer does have a filter. The Document
Analyzer code constructs the list of types to display by filtering the
full set of types down to those designated as "outputs". This
is not core framework filtering; rather, it is just something this
particular application (the Document Analyzer) decided to do. The
filter includes all subtypes of types in the Output capabilities.
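The serialization rule described above (only indexed feature structures,
plus whatever they reference, get written out) can be pictured as a
reachability walk. What follows is a minimal sketch in plain Java, not
UIMA code: the class ReachabilitySketch, the method serialized, and the
string-named feature structures are all hypothetical names made up for
illustration, and the real serializer may differ in detail.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java model only (hypothetical): feature structures are strings,
// and references between them are entries in a map.
class ReachabilitySketch {

    // Returns the indexed feature structures plus everything reachable
    // from them by following references.
    static Set<String> serialized(Set<String> indexed, Map<String, List<String>> refs) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>(indexed);
        while (!todo.isEmpty()) {
            String fs = todo.pop();
            if (seen.add(fs)) {
                // First visit: queue everything this feature structure points to.
                for (String target : refs.getOrDefault(fs, Collections.<String>emptyList())) {
                    todo.push(target);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // car1 is indexed and references engine1; orphan1 is neither
        // indexed nor referenced, so it is left out.
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("car1", Arrays.asList("engine1"));
        refs.put("orphan1", Collections.<String>emptyList());

        Set<String> indexed = Collections.singleton("car1");
        System.out.println(serialized(indexed, refs)); // prints [car1, engine1]
    }
}
```

In this model, leaving a type out of the capabilities changes nothing;
only indexing and references decide what is written.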
> Now suppose they have a type
> system with Annotation type of Vehicle with subtypes Car, Submarine,
> etc. but they only want to see in the Document Analyzer what was
> annotated as a Vehicle. My first instinct was that they could leave out
> the types for Car, Submarine, etc., and only include Vehicle as an
> output in the capabilities section and all of the annotations would be
> serialized not as Car, Submarine, but as Vehicle and thus, when they
> looked at the xmi file through the document analyzer a nice Vehicle type
> would be viewable. This isn't the case, instead the person gets the
> subtypes, highlighted separately.
Yes. I looked at the code for DocumentAnalyzer, and it explicitly
includes subtypes. See the source for DocumentAnalyzer, line 1194.
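To make the subtype behaviour concrete, here is a minimal sketch of such
a display filter using the Vehicle / Car / Submarine example. It is
plain Java and does not reproduce the DocumentAnalyzer source: the class
SubtypeFilterSketch, its methods, the parent map, and the extra Person
type are all hypothetical, chosen only for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java model only (hypothetical): none of these names come from the UIMA API.
class SubtypeFilterSketch {

    // True if 'type' equals 'ancestor' or inherits from it via the parent map.
    static boolean subsumes(Map<String, String> parentOf, String ancestor, String type) {
        for (String t = type; t != null; t = parentOf.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Types a viewer would show: every declared output type plus all of its subtypes.
    static Set<String> displayedTypes(Map<String, String> parentOf,
                                      Set<String> allTypes,
                                      Set<String> outputs) {
        Set<String> shown = new TreeSet<>();
        for (String type : allTypes) {
            for (String out : outputs) {
                if (subsumes(parentOf, out, type)) {
                    shown.add(type);
                }
            }
        }
        return shown;
    }

    public static void main(String[] args) {
        // child -> parent; Annotation is the root of this toy hierarchy.
        Map<String, String> parentOf = new LinkedHashMap<>();
        parentOf.put("Car", "Vehicle");
        parentOf.put("Submarine", "Vehicle");
        parentOf.put("Vehicle", "Annotation");
        parentOf.put("Person", "Annotation");

        Set<String> allTypes = new LinkedHashSet<>(parentOf.keySet());
        allTypes.add("Annotation");

        // Only Vehicle is listed as an output, yet its subtypes are shown too.
        Set<String> outputs = Collections.singleton("Vehicle");
        System.out.println(displayedTypes(parentOf, allTypes, outputs));
        // prints [Car, Submarine, Vehicle]
    }
}
```

Listing only Vehicle as an output therefore still shows Car and
Submarine as separate types, which matches what was observed in the
viewer.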
> My guess is that the "filtering"
> behavior is a result of an implementation of the Document Analyzer
> rather than something enforced by the framework, but I wasn't sure.
>
Yes, that is correct.
-Marshall
RE: Question about Capabilities
Posted by "LeHouillier, Frank D." <Fr...@gd-ais.com>.
Sorry for not being clear. I was unclear from the documentation as to
whether the intent of the capabilities section of the Analysis Engine
Descriptors was to provide external guarantees provided by the framework
on the input/outputs to the annotator code or to provide a means for the
annotator code to determine internally which FS types to view and/or
produce (via the ResultSpecification).
So take the case of somebody using a primitive Analysis Engine with the
Document Analyzer. If they leave a type out of the AEDescriptor
Capabilities section then the type is not serialized into the xmi file
and viewable by the Document Analyzer viewer, thus some "filtering" of
the output at least, is taking place. Now suppose they have a type
system with Annotation type of Vehicle with subtypes Car, Submarine,
etc. but they only want to see in the Document Analyzer what was
annotated as a Vehicle. My first instinct was that they could leave out
the types for Car, Submarine, etc., and only include Vehicle as an
output in the capabilities section and all of the annotations would be
serialized not as Car, Submarine, but as Vehicle and thus, when they
looked at the xmi file through the document analyzer a nice Vehicle type
would be viewable. This isn't the case, instead the person gets the
subtypes, highlighted separately. My guess is that the "filtering"
behavior is a result of an implementation of the Document Analyzer
rather than something enforced by the framework, but I wasn't sure.
Re: Question about Capabilities
Posted by Marshall Schor <ms...@schor.com>.
Hi Frank -
Can you clarify your questions a bit more?
Does your question concern ResultSpecifications and how these are set
(by default) from Capabilities? Or something else?
I'm not sure what you mean by asking if subtypes are "available"; there
is no "filtering" of the CAS based on input specification - all
instances of all types in the CAS are "available" to all annotators.
There is some interaction with the Capability Language Flow mechanism -
are you using this?
-Marshall