Posted to dev@uima.apache.org by Michael Baessler <mb...@michael-baessler.de> on 2007/12/18 14:55:43 UTC
capabilityLangugaeFlow - computeResultSpec
Hi,
I received a report that computing the result spec for the
capabilityLanguageFlow takes too much time.
I looked at the code and found something interesting, though I may be
wrong about it.
When looking at the ASB_impl.java at processUntilNextOutputCas() I found
the following:
    // check if we have to set result spec, to support capability language flow
    if (nextStep instanceof SimpleStepWithResultSpec) {
      ResultSpecification rs =
          ((SimpleStepWithResultSpec) nextStep).getResultSpecification();
      if (rs != null) {
        nextAe.setResultSpecification(rs);
      }
    }
    // invoke next AE in flow
    CasIterator casIter = null;
    CAS outputCas = null; // used if the AE we call outputs a new CAS
    try {
      casIter = nextAe.processAndOutputNewCASes(cas);
When a capabilityLanguageFlow is used, the ResultSpecs for the engines in
the flow are precomputed if possible. The code above takes this
precomputed ResultSpec from the flow node and sets it on the current AE.
Going deeper into
    casIter = nextAe.processAndOutputNewCASes(cas);
I found the following in the callAnalysisComponentProcess() method of
the PrimitiveAnalysisEngine_impl.java class:
    if (mResultSpecChanged || mLastTypeSystem != view.getTypeSystem()) {
      mLastTypeSystem = view.getTypeSystem();
      mCurrentResultSpecification.compile(mLastTypeSystem);
      // the actual ResultSpec we send to the component is formed by
      // looking at this primitive AE's declared output types and eliminating
      // any that are not in mCurrentResultSpecification.
      ResultSpecification analysisComponentResultSpec =
          computeAnalysisComponentResultSpec(
              mCurrentResultSpecification,
              getAnalysisEngineMetaData().getCapabilities());
      // compile result spec - necessary to get type subsumption to work properly
      analysisComponentResultSpec.compile(mLastTypeSystem);
      mAnalysisComponent.setResultSpecification(analysisComponentResultSpec);
      mResultSpecChanged = false;
    }
Whenever the ResultSpec has changed, it is recomputed. But the ResultSpec
is marked as changed every time setResultSpecification() is called.
So what does this mean? The first code fragment in this email shows how
the ResultSpec is taken from the flow controller and set on the AE, so
the result spec counts as changed. The second code fragment shows what is
executed when the ResultSpec has changed and how it is recomputed.
This means that the ResultSpec is recomputed each time process is
called. I don't think this is necessary.
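To illustrate the point, here is a minimal sketch of one way the repeated
recomputation could be avoided. It is not the UIMA code: Spec and Engine are
made-up stand-ins, and the identity check assumes that the precomputed spec
objects handed out by the flow are never modified in place. If a shared
object is updated in place, this check alone would not be enough.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins for illustration only; these are not UIMA classes.
final class ResultSpecCachingSketch {

  /** Minimal stand-in for a result specification: just a set of type names. */
  static final class Spec {
    final Set<String> types = new HashSet<>();

    Spec(String... names) {
      for (String n : names) {
        types.add(n);
      }
    }
  }

  /** Stand-in for a primitive AE that tracks whether its spec changed. */
  static final class Engine {
    private Spec current;
    private boolean changed;
    int recomputeCount = 0; // counts how often the expensive path runs

    /** Marks the spec as changed only if a different instance arrives. */
    void setResultSpecification(Spec rs) {
      if (rs != current) {
        current = rs;
        changed = true;
      }
    }

    /** Mirrors the "if (mResultSpecChanged ...)" guard from the excerpt. */
    void process() {
      if (changed) {
        recomputeCount++; // the compile/intersect work would happen here
        changed = false;
      }
    }
  }

  /** Sets the same precomputed spec before each of n process calls. */
  static int recomputesFor(int n) {
    Spec precomputed = new Spec("Token", "Sentence");
    Engine ae = new Engine();
    for (int i = 0; i < n; i++) {
      ae.setResultSpecification(precomputed);
      ae.process();
    }
    return ae.recomputeCount;
  }

  public static void main(String[] args) {
    System.out.println(recomputesFor(100)); // prints 1
  }
}
```

With the unconditional "changed = true" that the current code seems to have,
the same loop would recompute 100 times instead of once.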
Beyond that, it seems to me that the ResultSpec
mCurrentResultSpecification
and the computed ResultSpec
analysisComponentResultSpec
have the same content.
Opinions? Did I miss something?
-- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
Marshall Schor wrote:
> The Capability Language Flow for an aggregate is computed in
> CapabilityLanguageFlowController.computeFlowTable.
>
> This starts with the aggregates output capabilities, and figures out a
> flow for each language, that produces all the outputs.
>
> Should this computation also include in the set of needed outputs,
> inputs that downstream annotators need from upstream ones? That part
> seems to be missing in this computation?
>
> Here's an example:
>
> An aggregate G has delegates A & B. If B needs A to produce some
> type T for some language, but T is not among G's outputs, but
> something that B produces is among G's output, the flow controller
> would need to tell A to produce T so that B could produce the desired
> output at the aggregate level.
>
> -Marshall
Adding the input capabilities automatically is fine with me.
-- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
The Capability Language Flow for an aggregate is computed in
CapabilityLanguageFlowController.computeFlowTable.
This starts with the aggregate's output capabilities, and figures out a
flow for each language that produces all the outputs.
Should this computation also include, in the set of needed outputs, the
inputs that downstream annotators need from upstream ones? That part
seems to be missing in this computation.
Here's an example:
An aggregate G has delegates A & B. If B needs A to produce some type
T for some language, but T is not among G's outputs, but something that
B produces is among G's output, the flow controller would need to tell A
to produce T so that B could produce the desired output at the
aggregate level.
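The G/A/B example can be sketched roughly as follows. This is only an
illustration under simplifying assumptions: the Delegate class and the
resultSpecs method are hypothetical, languages are ignored, and the real
computeFlowTable may work quite differently. The idea is to walk the flow
backwards and add each contributing delegate's inputs to the set of needed
types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; names are invented and languages are ignored.
final class NeededOutputsSketch {

  /** Made-up delegate description: declared input and output type names. */
  static final class Delegate {
    final String name;
    final Set<String> inputs;
    final Set<String> outputs;

    Delegate(String name, Set<String> inputs, Set<String> outputs) {
      this.name = name;
      this.inputs = inputs;
      this.outputs = outputs;
    }
  }

  static Set<String> setOf(String... names) {
    return new LinkedHashSet<>(Arrays.asList(names));
  }

  /**
   * Returns one result spec per delegate, in flow order. Walking backwards,
   * a delegate is asked for those of its outputs that are still needed, and
   * if it contributes anything, its own inputs become needed upstream.
   */
  static List<Set<String>> resultSpecs(List<Delegate> flow,
                                       Set<String> aggregateOutputs) {
    Set<String> needed = new LinkedHashSet<>(aggregateOutputs);
    List<Set<String>> specs = new ArrayList<>();
    for (int i = flow.size() - 1; i >= 0; i--) {
      Delegate d = flow.get(i);
      Set<String> spec = new LinkedHashSet<>(d.outputs);
      spec.retainAll(needed);
      specs.add(spec);
      if (!spec.isEmpty()) {
        needed.addAll(d.inputs);
      }
    }
    Collections.reverse(specs);
    return specs;
  }

  /** Aggregate G outputs only U; B needs T from A in order to produce U. */
  static List<Set<String>> example() {
    Delegate a = new Delegate("A", setOf(), setOf("T"));
    Delegate b = new Delegate("B", setOf("T"), setOf("U"));
    return resultSpecs(Arrays.asList(a, b), setOf("U"));
  }

  public static void main(String[] args) {
    System.out.println(example()); // prints [[T], [U]]
  }
}
```

Without the needed.addAll(d.inputs) step, A would get an empty result spec
here, which is the gap described above.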
-Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
OK, will do.
-- Michael
Marshall Schor wrote:
> Easy to see- just trace the test case... -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
So in the older version of the capabilityLanguageFlow the inputs were
not recognized. But I think it is not bad that these are added
automatically, since the flow can't work if they are missing!
-- Michael
Marshall Schor wrote:
> I did this trace. Here's how it works now, without calling this.
>
> The process(cas, result-spec) call goes to
> AggregateAnalysisEngine_Impl which calls setResultSpecification on the
> AEEngine_impl object, which
> 1) clones the result-spec object
> 2) adds capabilities to it from the *inputs* of all components of this
> aggregate
> 3) uses this one cloned object as the result spec passed down to each
> component.
>
> Before going further - Michael - a question: isn't this
> union-with-all-inputs-behavior something you didn't want for
> capability language flow?
>
> Maybe it doesn't matter in that the use of capability language flow is
> not done in the real application use cases by passing the result spec
> in the top level call to the process method of the analysis engine?
>
> -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
I did this trace. Here's how it works now, without calling this.
The process(cas, result-spec) call goes to AggregateAnalysisEngine_Impl
which calls setResultSpecification on the AEEngine_impl object, which
1) clones the result-spec object
2) adds capabilities to it from the *inputs* of all components of this
aggregate
3) uses this one cloned object as the result spec passed down to each
component.
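As a rough illustration of the three steps above (not the actual UIMA
implementation; effectiveSpec and the use of plain string sets are
assumptions made for this sketch):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; type names stand in for real result-spec entries.
final class UnionWithInputsSketch {

  /** Clones the requested spec and unions in every component's inputs. */
  static Set<String> effectiveSpec(Set<String> requested,
                                   List<Set<String>> componentInputs) {
    Set<String> cloned = new TreeSet<>(requested);   // 1) clone
    for (Set<String> inputs : componentInputs) {
      cloned.addAll(inputs);                         // 2) add all inputs
    }
    return cloned;                                   // 3) passed to each component
  }

  /** The caller asks only for U; one component declares T as an input. */
  static Set<String> example() {
    Set<String> requested = new TreeSet<>(Arrays.asList("U"));
    List<Set<String>> inputs = new ArrayList<>();
    inputs.add(new TreeSet<>());
    inputs.add(new TreeSet<>(Arrays.asList("T")));
    return effectiveSpec(requested, inputs);
  }

  public static void main(String[] args) {
    System.out.println(example()); // prints [T, U]
  }
}
```

Note that every component receives the same enlarged set, whether or not it
is the one that needs or produces T.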
Before going further - Michael - a question: isn't this
union-with-all-inputs-behavior something you didn't want for capability
language flow?
Maybe it doesn't matter in that the use of capability language flow is
not done in the real application use cases by passing the result spec in
the top level call to the process method of the analysis engine?
-Marshall
Marshall Schor wrote:
> Easy to see- just trace the test case... -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
Here's the trace of how this works, when run from a top level
process(cas) call:
1) the call goes to the AnalysisEngine_Impl process method, which calls
processAndOutputNewCASes in the same object. This calls the ASB_impl
process method, which creates a new AggregateCasIterator(aCAS). This
constructor calls computeFlow on the ...asb.impl.FlowControllerContainer
object. This calls the particular flow controller's computeFlow
method. In this case, the flowController is the
CapabilityLanguageFlowController. Since this is a new CAS coming in to the
aggregate, the computeFlow method makes a new
CapabilityLanguageFlowObject (passing in the pre-computed Flow Table).
So that's how it uses this constructor, in the case where no specific
result spec is passed.
-Marshall
Marshall Schor wrote:
> Easy to see- just trace the test case... -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
Easy to see- just trace the test case... -Marshall
Michael Baessler wrote:
> But it would still be interesting why this is never needed and how it
> works now.
>
> -- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
But it would still be interesting why this is never needed and how it
works now.
-- Michael
Marshall Schor wrote:
> OK. This would confirm that the other constructor is no longer
> needed, since the test that passes a result-spec arg in the process
> method no longer calls that.
>
> Thanks. -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
OK. This would confirm that the other constructor is no longer needed,
since the test that passes a result-spec arg in the process method no
longer calls that.
Thanks. -Marshall
Michael Baessler wrote:
> When looking at the tests for the capability language flow I see both
> tests one with the result spec argument in the process() method and
> one without.
> In older UIMA versions, when using the debugger I see that both
> constructors are used there.
>
> -- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
When looking at the tests for the capability language flow I see both
tests: one with the result spec argument in the process() method and one
without.
In older UIMA versions, when using the debugger I see that both
constructors are used there.
-- Michael
Marshall Schor wrote:
> Thanks. I'll see about comparing the older method with the current
> method, to verify this. -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
Thanks. I'll see about comparing the older method with the current
method, to verify this. -Marshall
Michael Baessler wrote:
> In older UIMA versions the CapabilityLanguageFlowObject(List
> aNodeList, ResultSpecification resultSpec) constructor was used when
> the result was set by an application using the process method with the
> resultSpec argument. In the current version it seems that only the
> version with the precomputed FlowTable is used. But I can't say if
> that is correct or not since I don't know the details about the
> ResultSpec restructuring (maybe only Adam knows). But you are right,
> if this constructor isn't necessary both, the code and the
> constructor, can be removed.
>
> Seems that the architecture has changed here. :-)
>
> -- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Adam Lally <al...@alum.rpi.edu>.
On Jan 23, 2008 8:06 AM, Michael Baessler <mb...@michael-baessler.de> wrote:
> In older UIMA versions the CapabilityLanguageFlowObject(List aNodeList,
> ResultSpecification resultSpec) constructor was used when the result
> was set by an application using the process method with the resultSpec
> argument. In the current version it seems that only the version with the
> precomputed FlowTable is used. But I can't say if that is correct or not
> since I don't know the details about the ResultSpec restructuring (maybe
> only Adam knows).
I think no one knows exactly. This area of the code grew somewhat
organically to address requirements over time. I don't think I ever
fully understood how CapabilityLanguageFlow was implemented. When
I was adding the custom flow controller in v2.0, I did my best to port
whatever behavior was there and make sure all the test cases passed.
It turned out we were missing some important test cases though, and
that's how we came around to adding the SimpleStepWithResultSpec class
in order to replicate the old behavior. I think the key thing is to
make sure we have the right test cases in place to be sure we're
preserving backward compatibility, and then I'm all for having
Marshall clean up the code so it makes more sense.
-Adam
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
In older UIMA versions the CapabilityLanguageFlowObject(List aNodeList,
ResultSpecification resultSpec) constructor was used when the result spec
was set by an application using the process method with the resultSpec
argument. In the current version it seems that only the version with the
precomputed FlowTable is used. But I can't say if that is correct or not
since I don't know the details about the ResultSpec restructuring (maybe
only Adam knows). But you are right: if this constructor isn't necessary,
both the code and the constructor can be removed.
Seems that the architecture has changed here. :-)
-- Michael
Marshall Schor wrote:
> If this is removed or if it is never called, then there is a section
> of the logic in CapabilityLanguageFlowObject which is never used,
> because mNodeList == null:
>
>
> if (mNodeList != null) {
> // 80 or lines of code elided
> }
>
> Can this logic be removed?
>
> -Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
If this is removed or if it is never called, then there is a section of
the logic in CapabilityLanguageFlowObject which is never used, because
mNodeList == null:
    if (mNodeList != null) {
      // 80 or so lines of code elided
    }
Can this logic be removed?
-Marshall
Marshall Schor wrote:
> The class CapabilityLanguageFlowObject has 2 defined constructors, but
> one is never used/referenced:
> CapabilityLanguageFlowObject(List aNodeList, ResultSpecification
> resultSpec)
>
> Can this be removed?
>
> -Marshall
>
>
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
The class CapabilityLanguageFlowObject has 2 defined constructors, but
one is never used/referenced:
CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec)
Can this be removed?
-Marshall
Re: capabilityLangugaeFlow - computeResultSpec - Question on ResultSpecification
Posted by Marshall Schor <ms...@schor.com>.
I'll fix the Javadocs to correspond to what the code does. This will
have the result that
addResultFeature(1-feature, languages) will *add* to the existing
languages, while
addResultFeature(1-feature) will *replace* all existing languages
with x-unspecified.
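A small sketch of the add versus replace behavior described here. The class
below is a made-up stand-in and not the real ResultSpecification; it only
models a map from feature name to a set of languages:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for illustration; not the UIMA ResultSpecification.
final class LanguageSpecSketch {
  static final String UNSPECIFIED_LANGUAGE = "x-unspecified";

  private final Map<String, Set<String>> languagesByFeature = new HashMap<>();

  /** Adds the given languages to those already recorded for the feature. */
  void addResultFeature(String feature, String[] languages) {
    languagesByFeature
        .computeIfAbsent(feature, key -> new TreeSet<>())
        .addAll(Arrays.asList(languages));
  }

  /** Replaces any recorded languages with just x-unspecified. */
  void addResultFeature(String feature) {
    Set<String> unspecifiedOnly = new TreeSet<>();
    unspecifiedOnly.add(UNSPECIFIED_LANGUAGE);
    languagesByFeature.put(feature, unspecifiedOnly);
  }

  Set<String> languagesFor(String feature) {
    return languagesByFeature.getOrDefault(feature, Collections.emptySet());
  }

  public static void main(String[] args) {
    LanguageSpecSketch spec = new LanguageSpecSketch();
    spec.addResultFeature("my.Type:feat", new String[] {"en"});
    spec.addResultFeature("my.Type:feat", new String[] {"de"});
    System.out.println(spec.languagesFor("my.Type:feat")); // prints [de, en]
    spec.addResultFeature("my.Type:feat");
    System.out.println(spec.languagesFor("my.Type:feat")); // prints [x-unspecified]
  }
}
```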
-Marshall
Marshall Schor wrote:
> I'm doing a redesign for the result spec area to improve performance.
>
> The basic idea is to put a hasBeenChanged flag into the result spec
> object, and use it being "false" to enable users to avoid recomputing
> things.
> Why not use "equal"? Because a single result spec object is shared
> among multiple users, and when updated, the object is updated in place
> (so there is no other object to compare it to).
> Looking at the ResultSpec object - it has a hashMap that stores the
> Types and Features (TypeOrFeature objects) as the keys; the values are
> hashSets holding languages for which these types and features are in
> the result spec. (There is a special hash set having just the entry
> of the default language = UNSPECIFIED_LANGUAGE = "x-unspecified").
> I'm going to try and make the default language hash set a constant,
> and create just one instance of it - this should improve performance,
> especially when languages are not being used.
>
> There are 2 kinds of methods to add types/features to a result spec:
> ones with language(s) and ones without.
> The ones without reset any language spec associated with the type or
> feature(s) to the UNSPECIFIED_LANGUAGE.
>
> The ones with a language, sometimes "replace" the language
> associated with the type/feature, and other times, they "add" the
> language (assuming the type/feature is already an entry in the
> hashMap of types and features).
>
> methods which are replacing any existing languages:
>
>   setResultTypesAndFeatures(array of TypeOrFeature)             << repl with x-unspecified language
>   setResultTypesAndFeatures(array of TypeOrFeature, languages)  << repl with languages
>   addResultTypeOrFeature(1-TypeOrFeature)                       << repl with x-unspecified language
>   addResultTypeOrFeature(1-TypeOrFeature, languages)            << repl with languages
>   addResultType(String, boolean)                                << repl with x-unspecified language
>   addResultFeature(1-feature, languages)                        << repl with languages
>
> methods which are adding to existing languages:
>
> addResultType(1-type, boolean, languages) adds languages
> addResultFeature(1-feature) << adds x-unspecified
>
> The "set..." method essentially clears the result spec and sets it
> with completely new information, so it is reasonable that it replaces
> any existing language information.
>
> The addResult methods, when used to add a type or feature which
> already present, are inconsistent - with one method adding, and the
> others, replacing. This behavior is documented in the JavaDocs for the
> class.
>
> The JavaDocs have the behavior for adding a Feature by name reversed
> with the behavior for adding a Type by name. In one case, including
> the language is treated as a replace, in the other as an add. This
> seems likely a bug in the Javadocs. The code for the addResultFeature
> is reversed from the Javadocs: the code will "add" languages if
> specified, but "replaces" (with the x-unspecified) if languages are
> not specified in the method call.
>
> Does anyone know what the "correct" behavior of these methods is
> supposed to be?
>
> -Marshall
>
>
>
>
>
>
Re: capabilityLangugaeFlow - computeResultSpec - Question on ResultSpecification
Posted by Marshall Schor <ms...@schor.com>.
I'm doing a redesign for the result spec area to improve performance.
The basic idea is to put a hasBeenChanged flag into the result spec
object, and use it being "false" to enable users to avoid recomputing
things.
Why not use "equal" ? because a single result spec object is shared
among multiple users, and when updated, the object is updated in place
(so there is no other object to compare it to).
Looking at the ResultSpec object - it has a hashMap that stores the
Types and Features (TypeOrFeature objects) as the keys; the values are
hashSets holding languages for which these types and features are in the
result spec. (There is a special hash set having just the entry of the
default language = UNSPECIFIED_LANGUAGE = "x-unspecified").
I'm going to try and make the default language hash set a constant, and
create just one instance of it - this should improve performance,
especially when languages are not being used.
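A minimal sketch of the two ideas above (a change flag on a spec that is updated in place, and one shared default-language set). The class and method names here are hypothetical and are not taken from ResultSpecification_impl; who is responsible for clearing the flag when several users share one object is left open.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the real UIMA class.
final class ChangeTrackingSpec {
  static final String UNSPECIFIED_LANGUAGE = "x-unspecified";

  // One shared, immutable set for the common "no language given" case.
  static final Set<String> DEFAULT_LANGUAGES =
      Collections.singleton(UNSPECIFIED_LANGUAGE);

  // Key: type or feature name; value: languages it is wanted for.
  private final Map<String, Set<String>> languagesByName = new HashMap<>();
  private boolean changed = false;

  // Sets the entry to the shared default set; flags a change only if it differed.
  void addDefault(String name) {
    Set<String> previous = languagesByName.put(name, DEFAULT_LANGUAGES);
    if (previous != DEFAULT_LANGUAGES) {
      changed = true;
    }
  }

  // Sets the entry to explicit languages; flags a change only if the set differs.
  void addWithLanguages(String name, String... languages) {
    Set<String> langs = new HashSet<>(Arrays.asList(languages));
    Set<String> previous = languagesByName.put(name, langs);
    if (!langs.equals(previous)) {
      changed = true;
    }
  }

  boolean hasBeenChanged() {
    return changed;
  }

  // Called by a consumer once it has recomputed whatever depends on the spec.
  void clearChanged() {
    changed = false;
  }
}
```

A consumer would check hasBeenChanged() before recomputing and skip the work when it returns false.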
There are 2 kinds of methods to add types/features to a result spec:
ones with language(s) and ones without.
The ones without reset any language spec associated with the type or
feature(s) to the UNSPECIFIED_LANGUAGE.
The ones with a language, sometimes "replace" the language
associated with the type/feature, and other times, they "add" the
language (assuming the type/feature is already an entry in the
hashMap of types and features).
Methods which replace any existing languages:
  setResultTypesAndFeatures(array of TypeOrFeature)             << repl with x-unspecified language
  setResultTypesAndFeatures(array of TypeOrFeature, languages)  << repl with languages
  addResultTypeOrFeature(1-TypeOrFeature)                       << repl with x-unspecified language
  addResultTypeOrFeature(1-TypeOrFeature, languages)            << repl with languages
  addResultType(String, boolean)                                << repl with x-unspecified language
  addResultFeature(1-feature, languages)                        << repl with languages
Methods which add to existing languages:
  addResultType(1-type, boolean, languages)                     << adds languages
  addResultFeature(1-feature)                                   << adds x-unspecified
The "set..." method essentially clears the result spec and sets it with
completely new information, so it is reasonable that it replaces any
existing language information.
The addResult methods, when used to add a type or feature which is
already present, are inconsistent: one method adds, while the others
replace. This behavior is documented in the JavaDocs for the class.
The JavaDocs have the behavior for adding a Feature by name reversed
with the behavior for adding a Type by name. In one case, including the
language is treated as a replace, in the other as an add. This seems
likely a bug in the Javadocs. The code for the addResultFeature is
reversed from the Javadocs: the code will "add" languages if specified,
but "replaces" (with the x-unspecified) if languages are not specified
in the method call.
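To make the "add" versus "replace" distinction concrete, here is a small sketch using a plain map of language sets. The class and method names are made up for illustration; this is not the ResultSpecification API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of the two update semantics.
final class LanguageSetDemo {
  private final Map<String, Set<String>> languagesByName = new HashMap<>();

  // "Replace": any languages already stored for the name are discarded.
  void replaceLanguages(String name, String... languages) {
    languagesByName.put(name, new TreeSet<>(Arrays.asList(languages)));
  }

  // "Add": the new languages are merged into whatever is already stored.
  void addLanguages(String name, String... languages) {
    languagesByName
        .computeIfAbsent(name, k -> new TreeSet<>())
        .addAll(Arrays.asList(languages));
  }

  Set<String> languagesOf(String name) {
    return languagesByName.getOrDefault(name, Collections.emptySet());
  }
}
```

Starting from "en", an add of "de" leaves both languages, while a replace with "x-unspecified" leaves only "x-unspecified".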
Does anyone know what the "correct" behavior of these methods is
supposed to be?
-Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
Ok, I didn't follow that... so fine with me to do the change!
-- Michael
Marshall Schor wrote:
> I may have missed something - I don't see what would need to be added
> to the ResultSpecification class. The method
> hasOutputTypeOrFeature(...) is always called with doFuzzySearch==
> true, which is how the containsType or containsFeature methods operate
> (always) in the Result Specification class.
>
> Is there some other difference I'm missing?
>
> -Marshall
>
> Michael Baessler wrote:
>> Marshall Schor wrote:
>>> Can I replace the class CapabilityContainer with the much more
>>> efficient (now) ResultSpecification class?
>>>
>>> It seems to me they do the almost same thing, and the
>>> ResultSpecification may be handling the corner cases better.
>>>
>>> Is there some subtle difference I'm missing? It would be nice to
>>> eliminate a class -
>>> smaller code base => less maintenance effort in the future :-)
>>>
>>> -Marshall
>> Yes, if it is possible to add the missing functionality to the
>> ResultSpecification class, fine with me.
>> For example the important method -
>> hasOutputTypeOrFeature(ouputCapabilitie, documentLanguage,
>> doFuzzySearch) is currently
>> not available at the ResultSpecification class.
>>
>> -- Michael
>>
>>
>
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
I may have missed something - I don't see what would need to be added to
the ResultSpecification class. The method hasOutputTypeOrFeature(...)
is always called with doFuzzySearch == true, which is how the
containsType or containsFeature methods operate (always) in the
ResultSpecification class.
Is there some other difference I'm missing?
-Marshall
Michael Baessler wrote:
> Marshall Schor wrote:
>> Can I replace the class CapabilityContainer with the much more
>> efficient (now) ResultSpecification class?
>>
>> It seems to me they do the almost same thing, and the
>> ResultSpecification may be handling the corner cases better.
>>
>> Is there some subtle difference I'm missing? It would be nice to
>> eliminate a class -
>> smaller code base => less maintenance effort in the future :-)
>>
>> -Marshall
> Yes, if it is possible to add the missing functionality to the
> ResultSpecification class, fine with me.
> For example the important method -
> hasOutputTypeOrFeature(ouputCapabilitie, documentLanguage,
> doFuzzySearch) is currently
> not available at the ResultSpecification class.
>
> -- Michael
>
>
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
Marshall Schor wrote:
> Can I replace the class CapabilityContainer with the much more
> efficient (now) ResultSpecification class?
>
> It seems to me they do the almost same thing, and the
> ResultSpecification may be handling the corner cases better.
>
> Is there some subtle difference I'm missing? It would be nice to
> eliminate a class -
> smaller code base => less maintenance effort in the future :-)
>
> -Marshall
Yes, if it is possible to add the missing functionality to the
ResultSpecification class, fine with me.
For example, the important method
hasOutputTypeOrFeature(ouputCapabilitie, documentLanguage,
doFuzzySearch) is currently not available in the ResultSpecification
class.
-- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
Yes, that is correct!
- Michael
Marshall Schor wrote:
> While experimenting with this approach, I found some tests wouldn't
> run. (By the way, the test cases are great - they have been a great
> help :-) ).
>
> Here's a case I'm want to be sure I understand:
>
> Let's suppose that the aggregate says it produces type Foo with
> language x-unspecified.
>
> Let's suppose there are 2 annotators in the flow: the first one
> produces Foo with language "en", the 2nd one produces Foo with
> language "x-unspecified". A flow given language "x-unspecified"
> should run the 2nd annotator, skipping the first one. (This is how it
> works now).
>
> =======
>
> Here's another similar case, using the other language subsumption
> between "en-us" and "en".
>
> Let's suppose that the aggregate says it produces type Foo with
> language "en".
>
> Let's suppose there are 2 annotators in the flow: the first one
> produces Foo with language "en-us", the 2nd one produces Foo with
> language "en". A flow given language "en" should run the 2nd
> annotator, skipping the first one. (This is how it works now, I think).
>
> With this explanation, I see there is a modification to the result
> spec's containsType/Feature method with a language argument needed for
> this use. Currently, the ResultSpecification matching works like this:
>     Language arg       RsltSpc       Matches
>     "en"               "en-us"       no
>     "en-us"            "en"          yes
>     "x-unspecified"    *any*         yes   <<< behavior needs to be different
>     "en"               "x-unsp.."    yes
>
> Is this correct?
>
> -Marshall
>
> Marshall Schor wrote:
>> Can I replace the class CapabilityContainer with the much more
>> efficient (now) ResultSpecification class?
>>
>> It seems to me they do the almost same thing, and the
>> ResultSpecification may be handling the corner cases better.
>>
>> Is there some subtle difference I'm missing? It would be nice to
>> eliminate a class -
>> smaller code base => less maintenance effort in the future :-)
>>
>> -Marshall
>>
>>
>
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
I went back and checked the Javadocs for the ResultSpecification, prior
to my reworking of it. I think I treated the x-unspecified slightly
wrong, and if I had done it right, then the anomaly noted in the
previous note (below) would not be there.
The previous Javadocs all say that the setters for a typeOrFeature
without a language argument, are equivalent to passing in the
x-unspecified language. The method containsType/Feature(foo,
"x-unspecified") should be made to return true only if the Result
specification for this contained x-unspecified. It might not, if, for
instance, the setting for Foo was only for languages "en" and "de".
A consequence of making it work this way is the following:
containsType(foo, "x-unspecified") will return "false" if foo is in
the result spec
only for particular languages.
and the containsType(foo) <<< no language argument
would also return "false", if foo is in the result spec
only for particular languages.
I plan to correct the treatment of x-unspecified along these lines, to
work as described above.
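A sketch of the intended lookup, assuming the languages recorded for one type are held in a set. The names are hypothetical, and the prefix rule ("en-us" falls back to "en") is a simplification of the real language handling.

```java
import java.util.Set;

// Hypothetical illustration of the corrected x-unspecified treatment.
final class ProposedContains {
  static final String UNSPECIFIED = "x-unspecified";

  // specLanguages: the languages recorded in the result spec for one type.
  static boolean containsType(Set<String> specLanguages, String queryLanguage) {
    if (specLanguages.contains(UNSPECIFIED)) {
      return true; // the type is wanted regardless of language
    }
    if (UNSPECIFIED.equals(queryLanguage)) {
      return false; // the type is in the spec only for particular languages
    }
    if (specLanguages.contains(queryLanguage)) {
      return true; // exact language match
    }
    // "en-us" is covered by an entry for "en", but not the other way round.
    int dash = queryLanguage.indexOf('-');
    return dash > 0 && specLanguages.contains(queryLanguage.substring(0, dash));
  }

  // No language argument: equivalent to asking for x-unspecified.
  static boolean containsType(Set<String> specLanguages) {
    return containsType(specLanguages, UNSPECIFIED);
  }
}
```

With Foo recorded only for "en" and "de", both containsType(langs, "x-unspecified") and containsType(langs) return false under this sketch.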
Please post any concerns/objections :-)
-Marshall
Marshall Schor wrote:
> While experimenting with this approach, I found some tests wouldn't
> run. (By the way, the test cases are great - they have been a great
> help :-) ).
>
> Here's a case I'm want to be sure I understand:
>
> Let's suppose that the aggregate says it produces type Foo with
> language x-unspecified.
>
> Let's suppose there are 2 annotators in the flow: the first one
> produces Foo with language "en", the 2nd one produces Foo with
> language "x-unspecified". A flow given language "x-unspecified"
> should run the 2nd annotator, skipping the first one. (This is how it
> works now).
>
> =======
>
> Here's another similar case, using the other language subsumption
> between "en-us" and "en".
>
> Let's suppose that the aggregate says it produces type Foo with
> language "en".
>
> Let's suppose there are 2 annotators in the flow: the first one
> produces Foo with language "en-us", the 2nd one produces Foo with
> language "en". A flow given language "en" should run the 2nd
> annotator, skipping the first one. (This is how it works now, I think).
>
> With this explanation, I see there is a modification to the result
> spec's containsType/Feature method with a language argument needed for
> this use. Currently, the ResultSpecification matching works like this:
>     Language arg       RsltSpc       Matches
>     "en"               "en-us"       no
>     "en-us"            "en"          yes
>     "x-unspecified"    *any*         yes   <<< behavior needs to be different
>     "en"               "x-unsp.."    yes
>
> Is this correct?
>
> -Marshall
>
> Marshall Schor wrote:
>> Can I replace the class CapabilityContainer with the much more
>> efficient (now) ResultSpecification class?
>>
>> It seems to me they do the almost same thing, and the
>> ResultSpecification may be handling the corner cases better.
>>
>> Is there some subtle difference I'm missing? It would be nice to
>> eliminate a class -
>> smaller code base => less maintenance effort in the future :-)
>>
>> -Marshall
>>
>>
>
>
>
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
While experimenting with this approach, I found some tests wouldn't
run. (By the way, the test cases are great - they have been a great
help :-) ).
Here's a case I want to be sure I understand:
Let's suppose that the aggregate says it produces type Foo with language
x-unspecified.
Let's suppose there are 2 annotators in the flow: the first one
produces Foo with language "en", the 2nd one produces Foo with language
"x-unspecified". A flow given language "x-unspecified" should run the
2nd annotator, skipping the first one. (This is how it works now).
=======
Here's another similar case, using the other language subsumption
between "en-us" and "en".
Let's suppose that the aggregate says it produces type Foo with language
"en".
Let's suppose there are 2 annotators in the flow: the first one
produces Foo with language "en-us", the 2nd one produces Foo with
language "en". A flow given language "en" should run the 2nd annotator,
skipping the first one. (This is how it works now, I think).
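The two cases above can be sketched as a first-match search over the annotators in flow order. Everything here is hypothetical and simplified: each annotator is represented only by the language it declares for Foo, and an annotator declaring "x-unspecified" is assumed to handle any document language.

```java
import java.util.List;

// Hypothetical illustration; not the CapabilityLanguageFlow implementation.
final class FlowLanguageDemo {
  static final String UNSPECIFIED = "x-unspecified";

  // True if an annotator declaring Foo for declaredLanguage
  // should run for a document in documentLanguage.
  static boolean handles(String declaredLanguage, String documentLanguage) {
    return declaredLanguage.equals(UNSPECIFIED)
        || declaredLanguage.equals(documentLanguage)
        || documentLanguage.startsWith(declaredLanguage + "-");
  }

  // Index of the first annotator (in flow order) that handles the
  // document language, or -1 if none does.
  static int firstHandling(List<String> declaredLanguages, String documentLanguage) {
    for (int i = 0; i < declaredLanguages.size(); i++) {
      if (handles(declaredLanguages.get(i), documentLanguage)) {
        return i;
      }
    }
    return -1;
  }
}
```

For the flow ["en", "x-unspecified"] and document language "x-unspecified", the sketch picks index 1; for ["en-us", "en"] and document language "en", it also picks index 1, skipping the first annotator in both cases.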
With this explanation, I see there is a modification to the result
spec's containsType/Feature method with a language argument needed for
this use.
Currently, the ResultSpecification matching works like this:
    Language arg       RsltSpc       Matches
    "en"               "en-us"       no
    "en-us"            "en"          yes
    "x-unspecified"    *any*         yes   <<< behavior needs to be different
    "en"               "x-unsp.."    yes
Is this correct?
-Marshall
Marshall Schor wrote:
> Can I replace the class CapabilityContainer with the much more
> efficient (now) ResultSpecification class?
>
> It seems to me they do the almost same thing, and the
> ResultSpecification may be handling the corner cases better.
>
> Is there some subtle difference I'm missing? It would be nice to
> eliminate a class -
> smaller code base => less maintenance effort in the future :-)
>
> -Marshall
>
>
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
Can I replace the class CapabilityContainer with the much more efficient
(now) ResultSpecification class?
It seems to me they do almost the same thing, and the
ResultSpecification may be handling the corner cases better.
Is there some subtle difference I'm missing? It would be nice to
eliminate a class -
smaller code base => less maintenance effort in the future :-)
-Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
Yes, I think so. This test dumps the result spec for each AE to a file
to check if it was computed correctly.
The computation of the result spec is done during the initialization of
the aggregate AE when the capability language flow is created.
The precomputed result spec can later be used in the document
processing, but this is currently not used. It is recomputed each time.
For my simple performance test I removed the second computation that
is done during runtime processing
( PrimitiveAnalysisEngine_impl.java: protected ResultSpecification
computeAnalysisComponentResultSpec() ). So the original computed result
spec is used.
But we cannot remove this code completely since it can happen that a
result spec is provided by the application and it must be recomputed
dynamically.
-- Michael
Marshall Schor wrote:
> Michael -
>
> I'm confused about how this test is setup. The test descriptor this
> code uses loads an aggregate, and then runs a process method which
> ends up calling some dummy process method called
> SequencerTestAnnotator. This process method dumps (to a file) the
> result spec. Is that the case you're running?
>
> How do you turn on and off the (re)computation of the result spec?
>
> -Marshall
>
> Michael Baessler wrote:
>> Michael Baessler wrote:
>>> Adam Lally wrote:
>>>> On Jan 7, 2008 6:56 AM, Michael Baessler <mb...@michael-baessler.de>
>>>> wrote:
>>>>
>>>>> I tried to figure out how the ResultSpecification handling in
>>>>> uima-core
>>>>> works with all side effects to check how it can be done
>>>>> to detect when a ResultSpec has changed. Unfortunately I was not able
>>>>> to, there are to much open questions where I don't know
>>>>> exactly if it is right in any case ... :-(
>>>>>
>>>>> Adam can you please look at this issue?
>>>>>
>>>>>
>>>>
>>>> I can try to take a look, but I don't have a lot of time. Do you have
>>>> a test case for this, where you expect I would see a significant
>>>> performance improvement if I fix this?
>>>>
>>> Sorry I have to performance test case. I checked my assumption using
>>> the debugger.
>>>
>>> I used the following main() with a loop over the process call to
>>> check if the result spec is recomputed each time.
>>> The descriptor is the same as used in the capabilityLanguageFlow
>>> test case of the uimaj-core project.
>>> Maybe a sysout helps to detect if the unnecessary calls are done or
>>> not.
>>>
>>> Maybe when iterating more than 10 times will give you performance
>>> numbers before and after. Maybe adding additional capabilities
>>> that must be analyzed will increase the time used to compute the
>>> result spec. I will look at this tomorrow.
>>>
>>> public static void main(String[] args) {
>>>
>>> AnalysisEngine ae = null;
>>> try {
>>>
>>> String desc = "SequencerCapabilityLanguageAggregateES.xml";
>>>
>>> XMLInputSource in = new
>>> XMLInputSource(JUnitExtension.getFile(desc));
>>> ResourceSpecifier specifier = UIMAFramework.getXMLParser()
>>> .parseResourceSpecifier(in);
>>> ae = UIMAFramework.produceAnalysisEngine(specifier, null,
>>> null);
>>> CAS cas = ae.newCAS();
>>> String text = "Hello world!";
>>> cas.setDocumentText(text);
>>> cas.setDocumentLanguage("en");
>>> for (int i = 0; i < 10; i++) {
>>> ae.process(cas);
>>> }
>>> } catch (Exception ex) {
>>> ex.printStackTrace();
>>> }
>>> }
>>>
>>> -- Michael
>> When setting the loop counter to 1000 I have 6000ms without
>> recomputing the result spec and
>> 27000ms when recomputing the result spec. I think this should be
>> sufficient for testing.
>>
>> -- Michael
>>
>>
>
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
Michael -
I'm confused about how this test is set up. The test descriptor this
code uses loads an aggregate, and then runs a process method which ends
up calling some dummy process method called SequencerTestAnnotator.
This process method dumps (to a file) the result spec. Is that the case
you're running?
How do you turn on and off the (re)computation of the result spec?
-Marshall
Michael Baessler wrote:
> Michael Baessler wrote:
>> Adam Lally wrote:
>>> On Jan 7, 2008 6:56 AM, Michael Baessler <mb...@michael-baessler.de>
>>> wrote:
>>>
>>>> I tried to figure out how the ResultSpecification handling in
>>>> uima-core
>>>> works with all side effects to check how it can be done
>>>> to detect when a ResultSpec has changed. Unfortunately I was not able
>>>> to, there are to much open questions where I don't know
>>>> exactly if it is right in any case ... :-(
>>>>
>>>> Adam can you please look at this issue?
>>>>
>>>>
>>>
>>> I can try to take a look, but I don't have a lot of time. Do you have
>>> a test case for this, where you expect I would see a significant
>>> performance improvement if I fix this?
>>>
>> Sorry I have to performance test case. I checked my assumption using
>> the debugger.
>>
>> I used the following main() with a loop over the process call to
>> check if the result spec is recomputed each time.
>> The descriptor is the same as used in the capabilityLanguageFlow test
>> case of the uimaj-core project.
>> Maybe a sysout helps to detect if the unnecessary calls are done or not.
>>
>> Maybe when iterating more than 10 times will give you performance
>> numbers before and after. Maybe adding additional capabilities
>> that must be analyzed will increase the time used to compute the
>> result spec. I will look at this tomorrow.
>>
>> public static void main(String[] args) {
>>
>> AnalysisEngine ae = null;
>> try {
>>
>> String desc = "SequencerCapabilityLanguageAggregateES.xml";
>>
>> XMLInputSource in = new
>> XMLInputSource(JUnitExtension.getFile(desc));
>> ResourceSpecifier specifier = UIMAFramework.getXMLParser()
>> .parseResourceSpecifier(in);
>> ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);
>> CAS cas = ae.newCAS();
>> String text = "Hello world!";
>> cas.setDocumentText(text);
>> cas.setDocumentLanguage("en");
>> for (int i = 0; i < 10; i++) {
>> ae.process(cas);
>> }
>> } catch (Exception ex) {
>> ex.printStackTrace();
>> }
>> }
>>
>> -- Michael
> When setting the loop counter to 1000 I have 6000ms without
> recomputing the result spec and
> 27000ms when recomputing the result spec. I think this should be
> sufficient for testing.
>
> -- Michael
>
>
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
Marshall Schor wrote:
> I think my change is ready for code review. I kept all the
> idiosyncratic behavior of the old code, so users should not notice any
> difference. All the tests run, and test case above runs at the 6000ms
> range.
> There are 3 areas changed:
> 1) ResultSpecification_impl is restructured for speed and smaller
> memory footprint
> 2) The "compiling" of this is deferred till the latest possible point;
> operations that can be done with the uncompiled form are done that way.
> 3) The code in the CapabilityLanguageFlow where it returns a next step
> now caches the result spec by component key, and only sends it down if
> it is different from what this controller sent the last time in
> invoked this component in the flow.
> This test depends on the precomputed result specs kept in the mTable
> variable being constant - which I believe they are (once they are
> computed) - but Michael -can you confirm this?
Yes the mTable variable contains the precomputed result specs for
sequence engines. These result specs are constant and do not change
during the processing. The computation is done based on the output types
of the aggregate that defines the capabilityLanguageFlow. If the result
spec is passed in by the process method, the precomputed mTable cannot
be used, since the results that should be produced may then differ from
the aggregate output types.
> With this change, the code in the framework to "intersect" the result
> spec with a component's output capabilities, by language, is not
> redone on every call, but only when the language changes. That code
> (to do the intersection) is running faster, in any case, due to the
> restructuring.
>
> Because this is a big change it would be good to do a code review of
> some kind - any thoughts on how to do this?
I hoped that Adam could look at this, since he knows the code best from
my point of view. All the capabilityLanguageFlow related items have
already been discussed on the list in detail, and I think we now also
have some good tests for this.
If the code is checked in I can run again my performance tests to check
the performance improvements.
Opinions?
-- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
Michael Baessler wrote:
> Michael Baessler wrote:
>> Adam Lally wrote:
>>> On Jan 7, 2008 6:56 AM, Michael Baessler <mb...@michael-baessler.de>
>>> wrote:
>>>
>>>> I tried to figure out how the ResultSpecification handling in
>>>> uima-core
>>>> works with all side effects to check how it can be done
>>>> to detect when a ResultSpec has changed. Unfortunately I was not able
>>>> to, there are to much open questions where I don't know
>>>> exactly if it is right in any case ... :-(
>>>>
>>>> Adam can you please look at this issue?
>>>>
>>>>
>>>
>>> I can try to take a look, but I don't have a lot of time. Do you have
>>> a test case for this, where you expect I would see a significant
>>> performance improvement if I fix this?
>>>
>> Sorry I have to performance test case. I checked my assumption using
>> the debugger.
>>
>> I used the following main() with a loop over the process call to
>> check if the result spec is recomputed each time.
>> The descriptor is the same as used in the capabilityLanguageFlow test
>> case of the uimaj-core project.
>> Maybe a sysout helps to detect if the unnecessary calls are done or not.
>>
>> Maybe when iterating more than 10 times will give you performance
>> numbers before and after. Maybe adding additional capabilities
>> that must be analyzed will increase the time used to compute the
>> result spec. I will look at this tomorrow.
>>
>> public static void main(String[] args) {
>>
>> AnalysisEngine ae = null;
>> try {
>>
>> String desc = "SequencerCapabilityLanguageAggregateES.xml";
>>
>> XMLInputSource in = new
>> XMLInputSource(JUnitExtension.getFile(desc));
>> ResourceSpecifier specifier = UIMAFramework.getXMLParser()
>> .parseResourceSpecifier(in);
>> ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);
>> CAS cas = ae.newCAS();
>> String text = "Hello world!";
>> cas.setDocumentText(text);
>> cas.setDocumentLanguage("en");
>> for (int i = 0; i < 10; i++) {
>> ae.process(cas);
>> }
>> } catch (Exception ex) {
>> ex.printStackTrace();
>> }
>> }
>>
>> -- Michael
> When setting the loop counter to 1000 I have 6000ms without
> recomputing the result spec and
> 27000ms when recomputing the result spec. I think this should be
> sufficient for testing.
I think my change is ready for code review. I kept all the
idiosyncratic behavior of the old code, so users should not notice any
difference. All the tests run, and test case above runs at the 6000ms
range.
There are 3 areas changed:
1) ResultSpecification_impl is restructured for speed and smaller memory
footprint
2) The "compiling" of this is deferred till the latest possible point;
operations that can be done with the uncompiled form are done that way.
3) The code in the CapabilityLanguageFlow where it returns a next step
now caches the result spec by component key, and only sends it down if
it is different from what this controller sent the last time it invoked
this component in the flow.
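Point 3 can be sketched roughly as follows. The class name, the generic parameter, and the choice of returning null for "nothing to send" are assumptions for illustration, not the actual code; the identity comparison only works if the precomputed specs stay constant once computed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of caching the last result spec sent per component key.
final class LastSentCache<S> {
  private final Map<String, S> lastSentByKey = new HashMap<>();

  // Returns the spec to send to the component, or null when the very same
  // precomputed instance was already sent on the previous invocation.
  S specToSend(String componentKey, S precomputed) {
    S last = lastSentByKey.put(componentKey, precomputed);
    return last == precomputed ? null : precomputed;
  }
}
```

Returning null fits the caller quoted at the start of this thread, which only calls setResultSpecification when the step's result spec is non-null.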
This test depends on the precomputed result specs kept in the mTable
variable being constant - which I believe they are (once they are
computed) - but Michael - can you confirm this?
With this change, the code in the framework to "intersect" the result
spec with a component's output capabilities, by language, is not redone
on every call, but only when the language changes. That code (to do the
intersection) is running faster, in any case, due to the restructuring.
Because this is a big change it would be good to do a code review of
some kind - any thoughts on how to do this?
-Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
Michael Baessler wrote:
> Adam Lally wrote:
>> On Jan 7, 2008 6:56 AM, Michael Baessler <mb...@michael-baessler.de>
>> wrote:
>>
>>> I tried to figure out how the ResultSpecification handling in uima-core
>>> works with all side effects to check how it can be done
>>> to detect when a ResultSpec has changed. Unfortunately I was not able
>>> to, there are to much open questions where I don't know
>>> exactly if it is right in any case ... :-(
>>>
>>> Adam can you please look at this issue?
>>>
>>>
>>
>> I can try to take a look, but I don't have a lot of time. Do you have
>> a test case for this, where you expect I would see a significant
>> performance improvement if I fix this?
>>
> Sorry I have to performance test case. I checked my assumption using
> the debugger.
>
> I used the following main() with a loop over the process call to check
> if the result spec is recomputed each time.
> The descriptor is the same as used in the capabilityLanguageFlow test
> case of the uimaj-core project.
> Maybe a sysout helps to detect if the unnecessary calls are done or not.
>
> Maybe when iterating more than 10 times will give you performance
> numbers before and after. Maybe adding additional capabilities
> that must be analyzed will increase the time used to compute the
> result spec. I will look at this tomorrow.
>
> public static void main(String[] args) {
>
> AnalysisEngine ae = null;
> try {
>
> String desc = "SequencerCapabilityLanguageAggregateES.xml";
>
> XMLInputSource in = new
> XMLInputSource(JUnitExtension.getFile(desc));
> ResourceSpecifier specifier = UIMAFramework.getXMLParser()
> .parseResourceSpecifier(in);
> ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);
> CAS cas = ae.newCAS();
> String text = "Hello world!";
> cas.setDocumentText(text);
> cas.setDocumentLanguage("en");
> for (int i = 0; i < 10; i++) {
> ae.process(cas);
> }
> } catch (Exception ex) {
> ex.printStackTrace();
> }
> }
>
> -- Michael
When setting the loop counter to 1000 I have 6000ms without recomputing
the result spec and
27000ms when recomputing the result spec. I think this should be
sufficient for testing.
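For reference, the difference works out to (27000 - 6000) / 1000 = 21 ms of recomputation per process call. A small helper along these lines could take such numbers without a debugger; it is only a sketch, and the name LoopTimer is made up.

```java
// Hypothetical timing helper; plain wall-clock timing with no warm-up.
final class LoopTimer {
  // Runs body the given number of times and returns the elapsed milliseconds.
  static long timeMillis(int iterations, Runnable body) {
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      body.run();
    }
    return (System.nanoTime() - start) / 1_000_000L;
  }
}
```

Since ae.process(cas) throws a checked exception, the call would have to be wrapped in a try/catch inside the Runnable before passing it to timeMillis(1000, ...).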
-- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
Adam Lally wrote:
> On Jan 7, 2008 6:56 AM, Michael Baessler <mb...@michael-baessler.de> wrote:
>
>> I tried to figure out how the ResultSpecification handling in uima-core
>> works with all side effects to check how it can be done
>> to detect when a ResultSpec has changed. Unfortunately I was not able
>> to, there are to much open questions where I don't know
>> exactly if it is right in any case ... :-(
>>
>> Adam can you please look at this issue?
>>
>>
>
> I can try to take a look, but I don't have a lot of time. Do you have
> a test case for this, where you expect I would see a significant
> performance improvement if I fix this?
>
Sorry, I have no performance test case. I checked my assumption using the
debugger.
I used the following main() with a loop over the process call to check
if the result spec is recomputed each time.
The descriptor is the same as used in the capabilityLanguageFlow test
case of the uimaj-core project.
Maybe a sysout helps to detect if the unnecessary calls are done or not.
Maybe when iterating more than 10 times will give you performance
numbers before and after. Maybe adding additional capabilities
that must be analyzed will increase the time used to compute the result
spec. I will look at this tomorrow.
public static void main(String[] args) {
  AnalysisEngine ae = null;
  try {
    String desc = "SequencerCapabilityLanguageAggregateES.xml";
    XMLInputSource in = new XMLInputSource(JUnitExtension.getFile(desc));
    ResourceSpecifier specifier = UIMAFramework.getXMLParser()
        .parseResourceSpecifier(in);
    ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);
    CAS cas = ae.newCAS();
    String text = "Hello world!";
    cas.setDocumentText(text);
    cas.setDocumentLanguage("en");
    for (int i = 0; i < 10; i++) {
      ae.process(cas);
    }
  } catch (Exception ex) {
    ex.printStackTrace();
  }
}
-- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Adam Lally <al...@alum.rpi.edu>.
On Jan 7, 2008 6:56 AM, Michael Baessler <mb...@michael-baessler.de> wrote:
> I tried to figure out how the ResultSpecification handling in uima-core
> works with all side effects to check how it can be done
> to detect when a ResultSpec has changed. Unfortunately I was not able
> to, there are to much open questions where I don't know
> exactly if it is right in any case ... :-(
>
> Adam can you please look at this issue?
>
I can try to take a look, but I don't have a lot of time. Do you have
a test case for this, where you expect I would see a significant
performance improvement if I fix this?
-Adam
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
Adam Lally wrote:
> On Dec 18, 2007 8:55 AM, Michael Baessler <mb...@michael-baessler.de> wrote:
>
>> Hi,
>> I got the request on my table that the computation of the result spec
>> for the capabilityLanguageFlow takes to much time.
>> I looked at the code and found something interesting... maybe I'm wrong,
>> I'm not sure.
>>
>> When looking at the ASB_impl.java at processUntilNextOutputCas() I found
>> the following:
>>
>> //check if we have to set result spec, to support
>> capability language flow
>> if (nextStep instanceof SimpleStepWithResultSpec) {
>> ResultSpecification rs =
>> ((SimpleStepWithResultSpec)nextStep).getResultSpecification();
>> if (rs != null) {
>> nextAe.setResultSpecification(rs);
>> }
>> }
>> // invoke next AE in flow
>> CasIterator casIter = null;
>> CAS outputCas = null; //used if the AE we call outputs a
>> new CAS
>> try {
>> casIter = nextAe.processAndOutputNewCASes(cas);
>>
>> When a capabilityLanguageFlow is used, the ResultSpec for the flow
>> engines are precomputed if possible. The code above takes this
>> precomputed ResultSpec from the flow node and set it for the current AE.
>>
>> When I go deeper to
>>
>> casIter = nextAe.processAndOutputNewCASes(cas);
>>
>> I found in the PrimitiveAnalysisEngine_impl.java class in the
>> callAnalysisComponentProcess() method the following:
>>
>> if (mResultSpecChanged || mLastTypeSystem != view.getTypeSystem()) {
>> mLastTypeSystem = view.getTypeSystem();
>> mCurrentResultSpecification.compile(mLastTypeSystem);
>> // the actual ResultSpec we send to the component is formed by
>> // looking at this primitive AE's declared output types and
>> eliminiating
>> // any that are not in mCurrentResultSpecification.
>> ResultSpecification analysisComponentResultSpec =
>> computeAnalysisComponentResultSpec(
>> mCurrentResultSpecification,
>> getAnalysisEngineMetaData().getCapabilities());
>> // compile result spec - necessary to get type subsumption to
>> work properly
>> analysisComponentResultSpec.compile(mLastTypeSystem);
>>
>> mAnalysisComponent.setResultSpecification(analysisComponentResultSpec);
>> mResultSpecChanged = false;
>> }
>>
>> any time when the ResultSpec changed, the ResultSpec is recomputed. But
>> the ResultSpec is changed any time when setResultSpecification() is called.
>> So what does this mean. The first code fragment in the email shows how
>> to get the ResultSpec from the flow controller and set it on the AE.
>> - So the result spec changed - The second code fragment shows what is
>> executed if the ResultSpec has been changed and how it is recomputed.
>> This means that the ResultSpec is recomputed each time process is
>> called. I don't think this is necessary.
>>
>>
>
> That seems like a good analysis of the situation. I think what we
> need is to detect when the ResultSpecification has actually changed
> and when it hasn't. That might be tricky to do right. If we just
> check if the new ResultSpecification is == to the existing
> ResultSpecification, that wouldn't work if the ResultSpecification had
> been modified (it would be == but the contents wouldn't be the same).
> Perhaps we could add a dirty flag to the ResultSpecification to catch
> this.
I tried to figure out how the ResultSpecification handling in uima-core
works, with all its side effects, to see how we could
detect when a ResultSpec has changed. Unfortunately I was not able
to; there are too many open questions where I don't know
exactly whether it is right in every case ... :-(
Adam can you please look at this issue?
Thanks Michael
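For reference, the dirty-flag idea quoted above could be sketched roughly as follows. This is a toy model, not the uima-core code: Spec, Engine, version() and recomputations are hypothetical names, and a modification counter is used in place of a single boolean flag so that a consumer can tell "same object, unchanged" from "same object, modified".

```java
import java.util.HashSet;
import java.util.Set;

// Toy model only; none of these classes are UIMA API.
final class DirtyFlagSketch {
    // A result spec that bumps a counter whenever its content really changes.
    static final class Spec {
        private final Set<String> types = new HashSet<>();
        private long version;

        void addType(String type) {
            if (types.add(type)) {
                version++; // only count real modifications
            }
        }

        long version() {
            return version;
        }
    }

    // An engine that recomputes only for a new or modified spec.
    static final class Engine {
        private Spec lastSpec;
        private long lastVersion = -1;
        int recomputations;

        void setResultSpecification(Spec spec) {
            if (spec != lastSpec || spec.version() != lastVersion) {
                lastSpec = spec;
                lastVersion = spec.version();
                recomputations++; // stands in for the expensive recompute
            }
        }
    }

    public static void main(String[] args) {
        Spec spec = new Spec();
        spec.addType("Person");
        Engine engine = new Engine();
        for (int i = 0; i < 1000; i++) {
            engine.setResultSpecification(spec);
        }
        System.out.println("after 1000 identical sets: " + engine.recomputations);
        spec.addType("Location");
        engine.setResultSpecification(spec);
        System.out.println("after modifying the spec: " + engine.recomputations);
    }
}
```

The == check alone would miss the second case, which is the problem Adam describes; the counter catches it.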
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Adam Lally <al...@alum.rpi.edu>.
On Dec 18, 2007 8:55 AM, Michael Baessler <mb...@michael-baessler.de> wrote:
> Hi,
> I got the request on my table that the computation of the result spec
> for the capabilityLanguageFlow takes to much time.
> I looked at the code and found something interesting... maybe I'm wrong,
> I'm not sure.
>
> When looking at the ASB_impl.java at processUntilNextOutputCas() I found
> the following:
>
> //check if we have to set result spec, to support
> capability language flow
> if (nextStep instanceof SimpleStepWithResultSpec) {
> ResultSpecification rs =
> ((SimpleStepWithResultSpec)nextStep).getResultSpecification();
> if (rs != null) {
> nextAe.setResultSpecification(rs);
> }
> }
> // invoke next AE in flow
> CasIterator casIter = null;
> CAS outputCas = null; //used if the AE we call outputs a
> new CAS
> try {
> casIter = nextAe.processAndOutputNewCASes(cas);
>
> When a capabilityLanguageFlow is used, the ResultSpec for the flow
> engines are precomputed if possible. The code above takes this
> precomputed ResultSpec from the flow node and set it for the current AE.
>
> When I go deeper to
>
> casIter = nextAe.processAndOutputNewCASes(cas);
>
> I found in the PrimitiveAnalysisEngine_impl.java class in the
> callAnalysisComponentProcess() method the following:
>
> if (mResultSpecChanged || mLastTypeSystem != view.getTypeSystem()) {
> mLastTypeSystem = view.getTypeSystem();
> mCurrentResultSpecification.compile(mLastTypeSystem);
> // the actual ResultSpec we send to the component is formed by
> // looking at this primitive AE's declared output types and
> eliminiating
> // any that are not in mCurrentResultSpecification.
> ResultSpecification analysisComponentResultSpec =
> computeAnalysisComponentResultSpec(
> mCurrentResultSpecification,
> getAnalysisEngineMetaData().getCapabilities());
> // compile result spec - necessary to get type subsumption to
> work properly
> analysisComponentResultSpec.compile(mLastTypeSystem);
>
> mAnalysisComponent.setResultSpecification(analysisComponentResultSpec);
> mResultSpecChanged = false;
> }
>
> any time when the ResultSpec changed, the ResultSpec is recomputed. But
> the ResultSpec is changed any time when setResultSpecification() is called.
> So what does this mean. The first code fragment in the email shows how
> to get the ResultSpec from the flow controller and set it on the AE.
> - So the result spec changed - The second code fragment shows what is
> executed if the ResultSpec has been changed and how it is recomputed.
> This means that the ResultSpec is recomputed each time process is
> called. I don't think this is necessary.
>
That seems like a good analysis of the situation. I think what we
need is to detect when the ResultSpecification has actually changed
and when it hasn't. That might be tricky to do right. If we just
check if the new ResultSpecification is == to the existing
ResultSpecification, that wouldn't work if the ResultSpecification had
been modified (it would be == but the contents wouldn't be the same).
Perhaps we could add a dirty flag to the ResultSpecification to catch
this.
> Beyond that it seems to me that the ResultsSpec
> mCurrentResultSpecification
> and the computed ResultSpec
> analysisComponentResultSpec
> have the same content.
>
Not in all cases. The computeAnalysisComponentResultSpec() method
does an intersection of the ResultSpec with the component's output
capabilities. I suppose with CapabilityLanguageFlow, it would never
output any type that's not in the component's output capabilities.
However think of the case of a nested aggregate where
CapabilityLanguageFlow is used in the outermost aggregate. This would
cause setResultSpecification to be called on the sub-aggregate. That
in turn causes the ResultSpecification for each annotator to be
computed by the intersection of the sub-aggregate's
ResultSpecification with that annotator's output capabilities.
-Adam
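The intersection step described above can be illustrated with plain sets. This is a simplified sketch that ignores languages, features and subtypes; ResultSpecIntersection and intersect are hypothetical names, not the actual computeAnalysisComponentResultSpec() code.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model only: types are plain strings, languages and features are ignored.
final class ResultSpecIntersection {
    // Keeps only the requested types that the component declares as outputs.
    static Set<String> intersect(Set<String> requested, Set<String> declaredOutputs) {
        Set<String> result = new TreeSet<>(requested);
        result.retainAll(declaredOutputs);
        return result;
    }

    public static void main(String[] args) {
        // Result spec handed to a sub-aggregate by the outer aggregate.
        Set<String> subAggregateSpec = Set.of("Person", "Date", "Location");
        // Output capabilities of one annotator inside the sub-aggregate.
        Set<String> annotatorOutputs = Set.of("Person", "Token");
        System.out.println(intersect(subAggregateSpec, annotatorOutputs));
    }
}
```

In the nested case the annotator ends up with a smaller spec than the sub-aggregate received, so the two result specs are not the same.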
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
LeHouillier, Frank D. wrote:
> While making this change wouldn't affect us in any way as I can see now,
> it would still be possible to use the Features in the Result Spec in a
> similar way.
>
> Suppose you have an information extraction component that extracts
> entities with attributes and you want to control which attributes are
> actually being added to the CAS with the Result Spec. You might have
> type Person, with a range of features such as Address, Phone number,
> Age, etc. some of which you want to output in a given configuration and
> others not. Suppose the information extraction component also extracts
> attributes which are so useless that you don't include them as features
> in the type system at all such as an internal id number. Currently,
> with a compiled Result Spec you could have the annotator look up the
> feature on the basis of the name of the feature and then you could
> reliably instantiate the feature without further ado. After your
> change, the feature would have to be checked to see if it actually
> exists.
>
We added code in the actual change that now checks to see if the feature
actually exists (for a "compiled" Result Spec). I thought it was better
to preserve the status quo here, rather than remove this check (for
performance reasons). It didn't seem like it would have any measurable
performance impact - it's one hash table lookup, basically.
Cheers. -Marshall
> Again, this doesn't seem like it is that big a deal to me but I thought
> I might just point out that it might have a use case. In practice, it
> seems to me that most annotators figure out the features available
> either during compilation by using the JCas or during the initialization
> of the Annotator.
>
> -----Original Message-----
> From: Marshall Schor [mailto:msa@schor.com]
> Sent: Friday, January 25, 2008 3:57 PM
> To: uima-dev@incubator.apache.org
> Subject: Re: capabilityLangugaeFlow - computeResultSpec
>
> LeHouillier, Frank D. wrote:
>
>> We have an annotator that wraps a black box information extraction
>> component that can return objects of a variety of types. We check the
>> result specification to see if the object is something we want to output
>> based the actual string of the name of the type. If you take away the
>> compiled version of the ResultSpecification then we will have to also
>> check whether the type that we get back from the type system is null or
>> not.
>>
> Hi Frank -
>
> This change would *not* take away the compiled version of the Result
> Spec. It would only change 1 behavior - that of returning "true" if a
> *feature* (not a type, as in your example above) was associated with a
> type where the capability was marked "allAnnotatorFeatures", even if the
>
> Feature didn't exist.
>
> Suppose you had a type T1, and a type T2 whose super-type was T1, and
> features T1:f1 T2:f2, with an output capability = T1 with
> allAnnotatorFeatures = true, and finally T3 (not inheriting from T1 and
> feature T3:f3, and the output capability including T3 with
> allAnnotatorFeatures = false
>
>
> Here's the current behavior:
>
> Before compile: The following would all return true except as marked:
> containsType(T1)
> containsType(T2) << returns false, T2 not in output capability, and
> before compile, T2 isn't recognized as a subtype of T1
> containsType(T2:f2) << returns false, not in output, etc.
> containsFeature(T1:f1)
> containsFeature(T1:asdfasdfasdfasdf) <<< yes... that's what it does -
>
> it ignores the actual feature name because allAnnotatorFeatures is true
>
> After compile the following return true except as marked:
> containsType(T1)
> containsType(T2) << T2 not in output capability, but is recognized
> as a subtype of T1
> containsType(T2:f2) << T1's *allAnnotatorFeatures* is "inherited"
> containsFeature(T1:f1)
> containsFeature(T1:asdfasdfasdfasdf) << false: the actual features
> are looked up
>
> After the change I'm proposing, everything would be same except that
> containsFeature(T1:asdfasdfasdfasdf) would return true.
>
> I don't think this would affect the way you are using result specs, but
> please let me know if I've misunderstood something. We don't want to
> impact users with this change.
>
> Thanks for your comments :-)
>
> -Marshall
>
>> -----Original Message-----
>> From: Marshall Schor [mailto:msa@schor.com]
>> Sent: Friday, January 25, 2008 5:06 AM
>> To: uima-dev@incubator.apache.org
>> Subject: Re: capabilityLangugaeFlow - computeResultSpec
>>
>> The implementation for checking if a feature is in the result spec does
>> the following:
>>
>> If the result-spec is not "compiled", it says the feature is present if
>> it specifically put in, or if its type has the allAnnotatorFeatures flag
>> set.
>>
>> If the result-spec is "compiled", it says the feature is present if it
>> is specifically put in, or if its type has the allAnnotatorFeatures flag
>> set and the feature exists in the type system.
>>
>> For performance / space reasons, I'd like to drop the 2nd case; this
>> would have the consequence of changing the result spec to return true
>> for features not in the type system where the type had the
>> allAnnotatorFeatures flag set. This case shouldn't come up in practice
>> because I can't think of good reason an annotator would ask if a feature
>> not in its type system was present.
>>
>> Any objections?
>>
>> -Marshall
>>
>>
>>
>>
>
>
>
>
RE: capabilityLangugaeFlow - computeResultSpec
Posted by "LeHouillier, Frank D." <Fr...@gd-ais.com>.
While making this change wouldn't affect us in any way as I can see now,
it would still be possible to use the Features in the Result Spec in a
similar way.
Suppose you have an information extraction component that extracts
entities with attributes and you want to control which attributes are
actually being added to the CAS with the Result Spec. You might have
type Person, with a range of features such as Address, Phone number,
Age, etc. some of which you want to output in a given configuration and
others not. Suppose the information extraction component also extracts
attributes which are so useless that you don't include them as features
in the type system at all such as an internal id number. Currently,
with a compiled Result Spec you could have the annotator look up the
feature on the basis of the name of the feature and then you could
reliably instantiate the feature without further ado. After your
change, the feature would have to be checked to see if it actually
exists.
Again, this doesn't seem like it is that big a deal to me but I thought
I might just point out that it might have a use case. In practice, it
seems to me that most annotators figure out the features available
either during compilation by using the JCas or during the initialization
of the Annotator.
-----Original Message-----
From: Marshall Schor [mailto:msa@schor.com]
Sent: Friday, January 25, 2008 3:57 PM
To: uima-dev@incubator.apache.org
Subject: Re: capabilityLangugaeFlow - computeResultSpec
LeHouillier, Frank D. wrote:
> We have an annotator that wraps a black box information extraction
> component that can return objects of a variety of types. We check the
> result specification to see if the object is something we want to output
> based the actual string of the name of the type. If you take away the
> compiled version of the ResultSpecification then we will have to also
> check whether the type that we get back from the type system is null or
> not.
Hi Frank -
This change would *not* take away the compiled version of the Result
Spec. It would only change 1 behavior - that of returning "true" if a
*feature* (not a type, as in your example above) was associated with a
type where the capability was marked "allAnnotatorFeatures", even if the
Feature didn't exist.
Suppose you had a type T1, and a type T2 whose super-type was T1, and
features T1:f1 and T2:f2, with an output capability = T1 with
allAnnotatorFeatures = true, and finally T3 (not inheriting from T1) with
feature T3:f3, and the output capability including T3 with
allAnnotatorFeatures = false.
Here's the current behavior:
Before compile, the following all return true except as marked:
  containsType(T1)
  containsType(T2) << returns false: T2 is not in the output capability, and
      before compile T2 isn't recognized as a subtype of T1
  containsFeature(T2:f2) << returns false: not in the output capability, etc.
  containsFeature(T1:f1)
  containsFeature(T1:asdfasdfasdfasdf) << yes, returns true: that's what it does;
      it ignores the actual feature name because allAnnotatorFeatures is true
After compile, the following all return true except as marked:
  containsType(T1)
  containsType(T2) << T2 is not in the output capability, but is recognized
      as a subtype of T1
  containsFeature(T2:f2) << T1's *allAnnotatorFeatures* is "inherited"
  containsFeature(T1:f1)
  containsFeature(T1:asdfasdfasdfasdf) << returns false: the actual features
      are looked up
After the change I'm proposing, everything would be the same except that
containsFeature(T1:asdfasdfasdfasdf) would return true.
I don't think this would affect the way you are using result specs, but
please let me know if I've misunderstood something. We don't want to
impact users with this change.
Thanks for your comments :-)
-Marshall
>
> -----Original Message-----
> From: Marshall Schor [mailto:msa@schor.com]
> Sent: Friday, January 25, 2008 5:06 AM
> To: uima-dev@incubator.apache.org
> Subject: Re: capabilityLangugaeFlow - computeResultSpec
>
> The implementation for checking if a feature is in the result spec does
> the following:
>
> If the result-spec is not "compiled", it says the feature is present if
> it specifically put in, or if its type has the allAnnotatorFeatures flag
> set.
>
> If the result-spec is "compiled", it says the feature is present if it
> is specifically put in, or if its type has the allAnnotatorFeatures flag
> set and the feature exists in the type system.
>
> For performance / space reasons, I'd like to drop the 2nd case; this
> would have the consequence of changing the result spec to return true
> for features not in the type system where the type had the
> allAnnotatorFeatures flag set. This case shouldn't come up in practice
> because I can't think of good reason an annotator would ask if a feature
> not in its type system was present.
>
> Any objections?
>
> -Marshall
>
>
>
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
LeHouillier, Frank D. wrote:
> We have an annotator that wraps a black box information extraction
> component that can return objects of a variety of types. We check the
> result specification to see if the object is something we want to output
> based the actual string of the name of the type. If you take away the
> compiled version of the ResultSpecification then we will have to also
> check whether the type that we get back from the type system is null or
> not.
Hi Frank -
This change would *not* take away the compiled version of the Result
Spec. It would only change 1 behavior - that of returning "true" if a
*feature* (not a type, as in your example above) was associated with a
type where the capability was marked "allAnnotatorFeatures", even if the
Feature didn't exist.
Suppose you had a type T1, and a type T2 whose super-type was T1, and
features T1:f1 and T2:f2, with an output capability = T1 with
allAnnotatorFeatures = true, and finally T3 (not inheriting from T1) with
feature T3:f3, and the output capability including T3 with
allAnnotatorFeatures = false.
Here's the current behavior:
Before compile, the following all return true except as marked:
  containsType(T1)
  containsType(T2) << returns false: T2 is not in the output capability, and
      before compile T2 isn't recognized as a subtype of T1
  containsFeature(T2:f2) << returns false: not in the output capability, etc.
  containsFeature(T1:f1)
  containsFeature(T1:asdfasdfasdfasdf) << yes, returns true: that's what it does;
      it ignores the actual feature name because allAnnotatorFeatures is true
After compile, the following all return true except as marked:
  containsType(T1)
  containsType(T2) << T2 is not in the output capability, but is recognized
      as a subtype of T1
  containsFeature(T2:f2) << T1's *allAnnotatorFeatures* is "inherited"
  containsFeature(T1:f1)
  containsFeature(T1:asdfasdfasdfasdf) << returns false: the actual features
      are looked up
After the change I'm proposing, everything would be the same except that
containsFeature(T1:asdfasdfasdfasdf) would return true.
I don't think this would affect the way you are using result specs, but
please let me know if I've misunderstood something. We don't want to
impact users with this change.
Thanks for your comments :-)
-Marshall
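The current before/after-compile behavior described above can be reproduced with a small toy model. This is only a sketch under simplifying assumptions: ToyResultSpec and all of its methods are hypothetical and are not the UIMA ResultSpecification implementation; languages are ignored, and features can only be included through allAnnotatorFeatures (there is no explicit feature list).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only; mirrors the rules in the message above, not the UIMA code.
final class ToyResultSpec {
    private final Map<String, Boolean> outputTypes = new HashMap<>(); // type -> allAnnotatorFeatures
    private final Map<String, String> parentOf = new HashMap<>();     // toy type system: subtype -> supertype
    private final Set<String> features = new HashSet<>();             // toy type system: "Type:feature"
    private boolean compiled;

    void defineType(String type, String parent, String... featureNames) {
        if (parent != null) {
            parentOf.put(type, parent);
        }
        for (String f : featureNames) {
            features.add(type + ":" + f);
        }
    }

    void addOutputType(String type, boolean allAnnotatorFeatures) {
        outputTypes.put(type, allAnnotatorFeatures);
    }

    void compile() {
        compiled = true;
    }

    // Output type that covers 'type': itself, or (only after compile) an ancestor.
    private String coveringType(String type) {
        if (!compiled) {
            return outputTypes.containsKey(type) ? type : null;
        }
        for (String t = type; t != null; t = parentOf.get(t)) {
            if (outputTypes.containsKey(t)) {
                return t;
            }
        }
        return null;
    }

    // True if the feature is defined on the type or inherited from an ancestor.
    private boolean featureExists(String type, String feature) {
        for (String t = type; t != null; t = parentOf.get(t)) {
            if (features.contains(t + ":" + feature)) {
                return true;
            }
        }
        return false;
    }

    boolean containsType(String type) {
        return coveringType(type) != null;
    }

    boolean containsFeature(String type, String feature) {
        String covering = coveringType(type);
        if (covering == null || !outputTypes.get(covering)) {
            return false;
        }
        // Before compile the feature name is ignored; after compile it is looked up.
        return !compiled || featureExists(type, feature);
    }

    // Builds the T1/T2/T3 example from the message above.
    static ToyResultSpec example() {
        ToyResultSpec spec = new ToyResultSpec();
        spec.defineType("T1", null, "f1");
        spec.defineType("T2", "T1", "f2");
        spec.defineType("T3", null, "f3");
        spec.addOutputType("T1", true);
        spec.addOutputType("T3", false);
        return spec;
    }

    public static void main(String[] args) {
        ToyResultSpec spec = example();
        System.out.println("before compile, T2: " + spec.containsType("T2"));
        System.out.println("before compile, T1:bogus: " + spec.containsFeature("T1", "bogus"));
        spec.compile();
        System.out.println("after compile, T2: " + spec.containsType("T2"));
        System.out.println("after compile, T2:f2: " + spec.containsFeature("T2", "f2"));
        System.out.println("after compile, T1:bogus: " + spec.containsFeature("T1", "bogus"));
    }
}
```

In this model the proposed change amounts to dropping the featureExists() call, so that the last query would also return true after compile.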
>
> -----Original Message-----
> From: Marshall Schor [mailto:msa@schor.com]
> Sent: Friday, January 25, 2008 5:06 AM
> To: uima-dev@incubator.apache.org
> Subject: Re: capabilityLangugaeFlow - computeResultSpec
>
> The implementation for checking if a feature is in the result spec does
> the following:
>
> If the result-spec is not "compiled", it says the feature is present if
> it specifically put in, or if its type has the allAnnotatorFeatures flag
> set.
>
> If the result-spec is "compiled", it says the feature is present if it
> is specifically put in, or if its type has the allAnnotatorFeatures flag
> set and the feature exists in the type system.
>
> For performance / space reasons, I'd like to drop the 2nd case; this
> would have the consequence of changing the result spec to return true
> for features not in the type system where the type had the
> allAnnotatorFeatures flag set. This case shouldn't come up in practice
> because I can't think of good reason an annotator would ask if a feature
> not in its type system was present.
>
> Any objections?
>
> -Marshall
>
>
>
RE: capabilityLangugaeFlow - computeResultSpec
Posted by "LeHouillier, Frank D." <Fr...@gd-ais.com>.
We have an annotator that wraps a black box information extraction
component that can return objects of a variety of types. We check the
result specification to see if the object is something we want to output
based on the actual string of the name of the type. If you take away the
compiled version of the ResultSpecification then we will have to also
check whether the type that we get back from the type system is null or
not. It isn't terribly onerous to have to check for null but it does
actually take some code modification and this situation might be present
in other people's analysis engines too.
-----Original Message-----
From: Marshall Schor [mailto:msa@schor.com]
Sent: Friday, January 25, 2008 5:06 AM
To: uima-dev@incubator.apache.org
Subject: Re: capabilityLangugaeFlow - computeResultSpec
The implementation for checking if a feature is in the result spec does
the following:
If the result-spec is not "compiled", it says the feature is present if
it is specifically put in, or if its type has the allAnnotatorFeatures flag
set.
If the result-spec is "compiled", it says the feature is present if it
is specifically put in, or if its type has the allAnnotatorFeatures flag
set and the feature exists in the type system.
For performance / space reasons, I'd like to drop the 2nd case; this
would have the consequence of changing the result spec to return true
for features not in the type system where the type had the
allAnnotatorFeatures flag set. This case shouldn't come up in practice
because I can't think of a good reason an annotator would ask if a feature
not in its type system was present.
Any objections?
-Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
The implementation for checking if a feature is in the result spec does
the following:
If the result-spec is not "compiled", it says the feature is present if
it is specifically put in, or if its type has the allAnnotatorFeatures flag
set.
If the result-spec is "compiled", it says the feature is present if it
is specifically put in, or if its type has the allAnnotatorFeatures flag
set and the feature exists in the type system.
For performance / space reasons, I'd like to drop the 2nd case; this
would have the consequence of changing the result spec to return true
for features not in the type system where the type had the
allAnnotatorFeatures flag set. This case shouldn't come up in practice
because I can't think of a good reason an annotator would ask if a feature
not in its type system was present.
Any objections?
-Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
Some corner cases.
Case 1: If using the method to alter an existing result spec by adding
a single type with an associated set of languages, the passed in
"allAnnotatorFeatures" boolean will now be "unioned" with any existing
setting of this. Javadocs updated to reflect this.
Case 2: If you have a capability for language 1 which says output type A
(not all features), and have another capability for language 2 which
says output type A (allAnnotatorFeatures), this will be represented in
the result spec by having language 1 also be for all features.
Case 3: when setting the result spec, passing null in as the value of
the languages (for those set/add things that take language arrays) will
be equivalent to passing in the one language x-unspecified. So, in
particular, if a spec says produce type A for lang 1 and 2, and then you
use the addResultType(for type A, null-passed-in-for-language-spec) this
will add the language x-unspecified for type A.
I will attempt to document these in the Javadocs. Please post a
response if these corner cases need to be handled differently.
-Marshall
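The three corner cases above can be illustrated with a toy model. This is a sketch, not the real implementation: ToySpecEntries and Entry are hypothetical, the addResultType signature here only imitates the shape of the call described above, and keeping a single allAnnotatorFeatures flag per type is an assumption taken from Case 2.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only; not the UIMA ResultSpecification implementation.
final class ToySpecEntries {
    static final String UNSPECIFIED = "x-unspecified";

    // One entry per type: its languages and a single all-features flag.
    static final class Entry {
        final Set<String> languages = new TreeSet<>();
        boolean allAnnotatorFeatures;
    }

    private final Map<String, Entry> entries = new HashMap<>();

    void addResultType(String type, boolean allAnnotatorFeatures, String[] languages) {
        Entry entry = entries.computeIfAbsent(type, k -> new Entry());
        // Cases 1 and 2: the flag is "unioned" with any existing setting,
        // so once any language asks for all features, all languages get them.
        entry.allAnnotatorFeatures |= allAnnotatorFeatures;
        if (languages == null) {
            // Case 3: null is treated as the single language x-unspecified.
            entry.languages.add(UNSPECIFIED);
        } else {
            for (String language : languages) {
                entry.languages.add(language);
            }
        }
    }

    Entry get(String type) {
        return entries.get(type);
    }

    public static void main(String[] args) {
        ToySpecEntries spec = new ToySpecEntries();
        spec.addResultType("A", false, new String[] {"en"});
        spec.addResultType("A", true, new String[] {"de"});
        spec.addResultType("A", false, null);
        Entry a = spec.get("A");
        System.out.println(a.languages + " allAnnotatorFeatures=" + a.allAnnotatorFeatures);
    }
}
```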
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Eddie Epstein <ea...@gmail.com>.
Very possible that results specification doesn't work correctly
through the JNI. Nobody has ever used them in C++ since I've been
working with it.
Eddie
On Wed, Jan 23, 2008 at 4:02 PM, Marshall Schor <ms...@schor.com> wrote:
> Eddie - this is for you to check I think:
>
> There is code in UimacppEngine in method serializeResultSpecification
> which adds result spec types and features to 2 IntVector arrays (one for
> Types, one for Features). As currently designed, these "miss" getting
> the subtypes of types, and all the features for types marked with the
> all-features flag in the capabilities.
>
> Are these required here?
>
> Also, I notice that the result spec supports "languages" - but the
> serialization for this doesn't support languages. Is that intended?
>
> -Marshall
>
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
Eddie - this is for you to check I think:
There is code in UimacppEngine in method serializeResultSpecification
which adds result spec types and features to 2 IntVector arrays (one for
Types, one for Features). As currently designed, these "miss" getting
the subtypes of types, and all the features for types marked with the
all-features flag in the capabilities.
Are these required here?
Also, I notice that the result spec supports "languages" - but the
serialization for this doesn't support languages. Is that intended?
-Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
Marshall Schor wrote:
> I'm thinking of simplifying the CapabilityContainer class. Right now
> it has code to process input and well as output capabilities, but the
> input ones appear never to be used. Can anyone confirm that? If
> confirmed, I would propose to remove the part related to input
> capabilities.
Currently I think that is true. The idea behind this CapabilityContainer
was that maybe someone could create a sophisticated flow that computes the
best sequence for the engines based on their input and output
capabilities... But if that is needed we can also add the input capabilities
again. :-)
>
> There is a HashMap, outputToFCapability, whose keys are Strings
> corresponding to an output type-or-feature name, for any language, for
> any capability-set. The values do not seem to be used. I'd like to
> replace this with a hashSet. Any objections?
Yes, that seems to be correct.
-- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
I'm thinking of simplifying the CapabilityContainer class. Right now it
has code to process input as well as output capabilities, but the input
ones appear never to be used. Can anyone confirm that? If confirmed, I
would propose to remove the part related to input capabilities.
There is a HashMap, outputToFCapability, whose keys are Strings
corresponding to an output type-or-feature name, for any language, for
any capability-set. The values do not seem to be used. I'd like to
replace this with a HashSet. Any objections?
-Marshall
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
The code which checks if a type or feature is in a result spec, for a
particular language, always includes generalizing the language specifier
by dropping the part beyond the first "-". For example, "en-us" and
"en-uk" are simplified to en. Because of this, I'm thinking of
shrinking the result specification (for performance / space reasons) by
"normalizing" any language specs it uses by dropping the country
extensions, if present.
Any objections?
-Marshall
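The normalization proposed above might look like the following minimal sketch. LanguageNormalizer is a hypothetical helper, not a UIMA class, and leaving "x-unspecified" untouched is an assumption made here so that the special language name survives the truncation.

```java
// Hypothetical helper, not part of UIMA.
final class LanguageNormalizer {
    static final String UNSPECIFIED = "x-unspecified";

    // Drops everything from the first '-' onward, so "en-us" becomes "en".
    // Assumption: the special name "x-unspecified" is kept as it is.
    static String normalize(String language) {
        if (UNSPECIFIED.equals(language)) {
            return language;
        }
        int dash = language.indexOf('-');
        return dash < 0 ? language : language.substring(0, dash);
    }

    public static void main(String[] args) {
        System.out.println(normalize("en-us"));
        System.out.println(normalize("en-uk"));
        System.out.println(normalize("de"));
        System.out.println(normalize("x-unspecified"));
    }
}
```

Normalizing once when the spec is built means each later lookup can compare plain strings instead of truncating on every call.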
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
Adam Lally wrote:
> On Jan 24, 2008 9:51 AM, Michael Baessler <mb...@michael-baessler.de> wrote:
>
>>> Without looking at the code, I didn't understand why this is a
>>> consequence of the behavior you described above. I thought you said
>>> "and if the type has subtypes, it adds those too"? Anyway, I
>>> definitely think that this should work. By the definition of subtype,
>>> A-subtype *IS A* A. So if an aggregate wants type A produced, then
>>> A-subtype should be produced.
>>>
>> Why should an ae or a flow produce A-subtype when only A is required?
>>
>>
>
> Because an instance of A-subtype is also by definition an instance of
> A. Say a downstream annotator wants input type Person. I have
> upstream annotators that can produce instances of GovernmentOfficial,
> Actor, and Author, all of which are subtypes of Person. Shouldn't the
> upstream annotator produce these types?
From my point of view, when using the capabilityLanguageFlow the
application must specify all three or four
person subtypes if they should occur in the result. I think this is
flow specific; another flow can do it differently.
I absolutely agree that the result spec that is responsible for "what
can be produced" should contain all subtypes automatically if
the Person type is added.
-- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Adam Lally <al...@alum.rpi.edu>.
On Jan 24, 2008 9:51 AM, Michael Baessler <mb...@michael-baessler.de> wrote:
> > Without looking at the code, I didn't understand why this is a
> > consequence of the behavior you described above. I thought you said
> > "and if the type has subtypes, it adds those too"? Anyway, I
> > definitely think that this should work. By the definition of subtype,
> > A-subtype *IS A* A. So if an aggregate wants type A produced, then
> > A-subtype should be produced.
> Why should an ae or a flow produce A-subtype when only A is required?
>
Because an instance of A-subtype is also by definition an instance of
A. Say a downstream annotator wants input type Person. I have
upstream annotators that can produce instances of GovernmentOfficial,
Actor, and Author, all of which are subtypes of Person. Shouldn't the
upstream annotator produce these types?
-Adam
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
Adam Lally wrote:
> On Jan 24, 2008 7:54 AM, Marshall Schor <ms...@schor.com> wrote:
>
>> If you recall, the compile method for results specifications augments
>> the set of types/features by doing 2 things: if the type has
>> allAnnotatorFeatures=true, it adds all the features of the type; and if
>> the type has subtypes, it adds those too, propagating the
>> allAnnotatorFeatures processing down.
>>
>> A consequence would be that the mFlowTable would miss these cases:
>>
>> An aggregate wants type A output, and has a delegate with output
>> capability A-subtype.
>>
>>
>
> Without looking at the code, I didn't understand why this is a
> consequence of the behavior you described above. I thought you said
> "and if the type has subtypes, it adds those too"? Anyway, I
> definitely think that this should work. By the definition of subtype,
> A-subtype *IS A* A. So if an aggregate wants type A produced, then
> A-subtype should be produced.
Why should an ae or a flow produce A-subtype when only A is required?
-- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
The thing that adds allAnnotatorFeatures and subtypes is "compiling" the
result spec. The builder of the mFlowTable doesn't compile the
resultspec before using it - so it doesn't have these consequences.
-Marshall
Adam Lally wrote:
> On Jan 24, 2008 7:54 AM, Marshall Schor <ms...@schor.com> wrote:
>
>> If you recall, the compile method for results specifications augments
>> the set of types/features by doing 2 things: if the type has
>> allAnnotatorFeatures=true, it adds all the features of the type; and if
>> the type has subtypes, it adds those too, propagating the
>> allAnnotatorFeatures processing down.
>>
>> A consequence would be that the mFlowTable would miss these cases:
>>
>> An aggregate wants type A output, and has a delegate with output
>> capability A-subtype.
>>
>>
>
> Without looking at the code, I didn't understand why this is a
> consequence of the behavior you described above. I thought you said
> "and if the type has subtypes, it adds those too"? Anyway, I
> definitely think that this should work. By the definition of subtype,
> A-subtype *IS A* A. So if an aggregate wants type A produced, then
> A-subtype should be produced.
>
>
>> An aggregate wants Feature F output, and has a delegate with output
>> capability type-A with allAnnotatorFeatures marked, having that feature.
>>
>>
>
> We should be supporting this as well. Again I didn't follow why the
> behavior you described above doesn't do this.
>
> -Adam
>
>
>
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Adam Lally <al...@alum.rpi.edu>.
On Jan 24, 2008 7:54 AM, Marshall Schor <ms...@schor.com> wrote:
> If you recall, the compile method for results specifications augments
> the set of types/features by doing 2 things: if the type has
> allAnnotatorFeatures=true, it adds all the features of the type; and if
> the type has subtypes, it adds those too, propagating the
> allAnnotatorFeatures processing down.
>
> A consequence would be that the mFlowTable would miss these cases:
>
> An aggregate wants type A output, and has a delegate with output
> capability A-subtype.
>
Without looking at the code, I didn't understand why this is a
consequence of the behavior you described above. I thought you said
"and if the type has subtypes, it adds those too"? Anyway, I
definitely think that this should work. By the definition of subtype,
A-subtype *IS A* A. So if an aggregate wants type A produced, then
A-subtype should be produced.
> An aggregate wants Feature F output, and has a delegate with output
> capability type-A with allAnnotatorFeatures marked, having that feature.
>
We should be supporting this as well. Again I didn't follow why the
behavior you described above doesn't do this.
-Adam
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
From this point of view..
+1 to deprecate allAnnotatorFeatures
-- Michael
Marshall Schor wrote:
> What about allAnnotatorFeatures? Suppose the aggregate says it needs
> a particular Feature of a particular type. Suppose a delegate is
> marked as producing that type, and has allAnnotatorFeatures marked.
> This wouldn't work.
> You could say in this case that the output capability of the delegate
> *must not* rely on allAnnotatorFeatures, but instead *must* explicitly
> list those features it produces. In one sense, this could be a good
> idea, because no delegate could *accurately* mark that it outputs
> allAnnotatorFeatures, anyway, due to the possibility that some other
> component could add features to the type in question, completely
> unknown to this delegate - and of course, this delegate would not be
> setting those other features.
>
> This would lead to another question - should we deprecate
> allAnnotatorFeatures because of this?
>
> -Marshall
>
> Michael Baessler wrote:
>> Marshall Schor wrote:
>>> Without actually testing this (so this may be a wrong conclusion) -
>>> it seems to me that the code in CapabilityLanguageFlowController
>>> that sets up the result specs for components, by language, in the
>>> mFlowTable, ignores the typesOrFeatures that the result spec adds
>>> when compile() is called.
>>>
>>> If you recall, the compile method for results specifications
>>> augments the set of types/features by doing 2 things: if the type
>>> has allAnnotatorFeatures=true, it adds all the features of the type;
>>> and if the type has subtypes, it adds those too, propagating the
>>> allAnnotatorFeatures processing down.
>>>
>>> A consequence would be that the mFlowTable would miss these cases:
>>>
>>> An aggregate wants type A output, and has a delegate with output
>>> capability A-subtype.
>>>
>>> An aggregate wants Feature F output, and has a delegate with
>>> output capability type-A with allAnnotatorFeatures marked, having
>>> that feature.
>>>
>>> Can anyone confirm this? (perhaps adding a test case :-) )?
>>>
>>> Michael - do you know what the design intent was for this - if
>>> things are as I've conjectured above, is this something that needs
>>> to be fixed, or is it working as intended?
>> Yes that is correct. The mFlowTable only contains those output types
>> that are specified in the aggregate ae as output type. The guideline
>> for the capabilityLanguageFlow was to
>> specify all output results (with all interim results) in the
>> aggregate that must be produced.
>>
>> If we now change the mFlowTable content to match the resultSpec, we
>> also change the capabilityLanguageFlow. So if we do that, how can I
>> prevent a subtype from being produced if only a supertype must be
>> produced? So I prefer to stay with the current design - specify all
>> you need.
>>
>> What do you think?
>>
>> -- Michael
>>
>>
>>
>
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
What about allAnnotatorFeatures? Suppose the aggregate says it needs a
particular Feature of a particular type. Suppose a delegate is marked
as producing that type, and has allAnnotatorFeatures marked. This
wouldn't work.
You could say in this case that the output capability of the delegate
*must not* rely on allAnnotatorFeatures, but instead *must* explicitly
list those features it produces. In one sense, this could be a good
idea, because no delegate could *accurately* mark that it outputs
allAnnotatorFeatures, anyway, due to the possibility that some other
component could add features to the type in question, completely unknown
to this delegate - and of course, this delegate would not be setting
those other features.
This would lead to another question - should we deprecate
allAnnotatorFeatures because of this?
-Marshall
Michael Baessler wrote:
> Marshall Schor wrote:
>> Without actually testing this (so this may be a wrong conclusion) -
>> it seems to me that the code in CapabilityLanguageFlowController that
>> sets up the result specs for components, by language, in the
>> mFlowTable, ignores the typesOrFeatures that the result spec adds
>> when compile() is called.
>>
>> If you recall, the compile method for results specifications augments
>> the set of types/features by doing 2 things: if the type has
>> allAnnotatorFeatures=true, it adds all the features of the type; and
>> if the type has subtypes, it adds those too, propagating the
>> allAnnotatorFeatures processing down.
>>
>> A consequence would be that the mFlowTable would miss these cases:
>>
>> An aggregate wants type A output, and has a delegate with output
>> capability A-subtype.
>>
>> An aggregate wants Feature F output, and has a delegate with output
>> capability type-A with allAnnotatorFeatures marked, having that feature.
>>
>> Can anyone confirm this? (perhaps adding a test case :-) )?
>>
>> Michael - do you know what the design intent was for this - if things
>> are as I've conjectured above, is this something that needs to be
>> fixed, or is it working as intended?
> Yes that is correct. The mFlowTable only contains those output types
> that are specified in the aggregate ae as output type. The guideline
> for the capabilityLanguageFlow was to
> specify all output results (with all interim results) in the aggregate
> that must be produced.
>
> If we now change the mFlowTable content to match the resultSpec, we also
> change the capabilityLanguageFlow. So if we do that, how can I
> prevent a subtype from being produced if only a supertype must be
> produced? So I prefer to stay with the current design - specify all
> you need.
>
> What do you think?
>
> -- Michael
>
>
>
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
Marshall Schor wrote:
> Without actually testing this (so this may be a wrong conclusion) - it
> seems to me that the code in CapabilityLanguageFlowController that
> sets up the result specs for components, by language, in the
> mFlowTable, ignores the typesOrFeatures that the result spec adds when
> compile() is called.
>
> If you recall, the compile method for results specifications augments
> the set of types/features by doing 2 things: if the type has
> allAnnotatorFeatures=true, it adds all the features of the type; and
> if the type has subtypes, it adds those too, propagating the
> allAnnotatorFeatures processing down.
>
> A consequence would be that the mFlowTable would miss these cases:
>
> An aggregate wants type A output, and has a delegate with output
> capability A-subtype.
>
> An aggregate wants Feature F output, and has a delegate with output
> capability type-A with allAnnotatorFeatures marked, having that feature.
>
> Can anyone confirm this? (perhaps adding a test case :-) )?
>
> Michael - do you know what the design intent was for this - if things
> are as I've conjectured above, is this something that needs to be
> fixed, or is it working as intended?
Yes that is correct. The mFlowTable only contains those output types
that are specified in the aggregate ae as output type. The guideline for
the capabilityLanguageFlow was to
specify all output results (with all interim results) in the aggregate
that must be produced.
If we now change the mFlowTable content to match the resultSpec, we also
change the capabilityLanguageFlow. So if we do that, how can I prevent
a subtype from being produced if only a supertype must be produced? So I
prefer to stay with the current design - specify all you need.
What do you think?
-- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
Without actually testing this (so this may be a wrong conclusion) - it
seems to me that the code in CapabilityLanguageFlowController that sets
up the result specs for components, by language, in the mFlowTable,
ignores the typesOrFeatures that the result spec adds when compile() is
called.
If you recall, the compile method for results specifications augments
the set of types/features by doing 2 things: if the type has
allAnnotatorFeatures=true, it adds all the features of the type; and if
the type has subtypes, it adds those too, propagating the
allAnnotatorFeatures processing down.
A consequence would be that the mFlowTable would miss these cases:
An aggregate wants type A output, and has a delegate with output
capability A-subtype.
An aggregate wants Feature F output, and has a delegate with output
capability type-A with allAnnotatorFeatures marked, having that feature.
Can anyone confirm this? (perhaps adding a test case :-) )?
Michael - do you know what the design intent was for this - if things
are as I've conjectured above, is this something that needs to be fixed,
or is it working as intended?
-Marshall
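The two conjectured misses can be sketched with a toy model. This is not the CapabilityLanguageFlowController code: the class ToyFlowTableMatch, its Output type, and the two subtype/feature maps are hypothetical stand-ins, and feature names are written as "Type:feature". The sketch only contrasts matching on literally declared names with matching after a compile-like expansion.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, not UIMA code.
final class ToyFlowTableMatch {

    // Stand-in for one declared output capability of a delegate.
    static final class Output {
        final String type;
        final boolean allAnnotatorFeatures;

        Output(String type, boolean allAnnotatorFeatures) {
            this.type = type;
            this.allAnnotatorFeatures = allAnnotatorFeatures;
        }
    }

    // Uncompiled matching: only the literally declared type name is compared
    // with what the aggregate asked for.
    static boolean matchesLiterally(Set<String> wanted, Output declared) {
        return wanted.contains(declared.type);
    }

    // Matching after a compile-like expansion: the wanted set also covers
    // direct subtypes (one level only in this toy), and a declaration with
    // allAnnotatorFeatures also covers every feature of its type.
    static boolean matchesExpanded(Set<String> wanted, Output declared,
                                   Map<String, List<String>> directSubtypes,
                                   Map<String, List<String>> featuresOf) {
        Set<String> expandedWanted = new HashSet<>(wanted);
        for (String name : wanted) {
            expandedWanted.addAll(directSubtypes.getOrDefault(name, Collections.emptyList()));
        }
        if (expandedWanted.contains(declared.type)) {
            return true;
        }
        if (declared.allAnnotatorFeatures) {
            for (String feature : featuresOf.getOrDefault(declared.type, Collections.emptyList())) {
                if (expandedWanted.contains(declared.type + ":" + feature)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

With "A" wanted and a delegate declaring "ASubtype", or with "A:F" wanted and a delegate declaring "A" with allAnnotatorFeatures, matchesLiterally returns false while matchesExpanded returns true, which is the gap described above.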
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Adam Lally <al...@alum.rpi.edu>.
On Jan 23, 2008 10:07 AM, Marshall Schor <ms...@schor.com> wrote:
> Given that (as far as I can tell - let's see, that would be AFAICT), the
> resultSpec is *always* used in compiled mode (because the wrapper always
> compiles it), the current implementation would have the effect that
>
> 1) the allFeatures flag would work
> 2) subtypes of a type specified in the resultSpec would also be
> implicitly in the resultSpec
>
> Therefore, to keep the implementation behavior constant (a good thing to
> try for, always :-) ) we should ensure any changes continue to exhibit
> this behavior, and update the Javadocs and documentation to reflect this.
>
+1. It was certainly my intention to always compile the ResultSpec.
I believe that subtypes should be included. To not do that I think is
contrary to the expected semantics of the supertype/subtype
relationship. Plus I think it's been that way in UIMA for a long time
now.
-Adam
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
Fine with me. Seems to be the way it worked in the past, so we should not
change it!
-- Michael
Marshall Schor wrote:
> Given that (as far as I can tell - let's see, that would be AFAICT),
> the resultSpec is *always* used in compiled mode (because the wrapper
> always compiles it), the current implementation would have the effect
> that
>
> 1) the allFeatures flag would work
> 2) subtypes of a type specified in the resultSpec would also be
> implicitly in the resultSpec
>
> Therefore, to keep the implementation behavior constant (a good thing
> to try for, always :-) ) we should ensure any changes continue to
> exhibit this behavior, and update the Javadocs and documentation to
> reflect this.
>
> Other opinions?
>
> -Marshall
>
> Michael Baessler wrote:
>>
>> Marshall Schor wrote:
>>> In looking thru the code for ResultSpecification_Impl, there
>>> seems to be an inconsistency - unless I (quite possible :-) )
>>> missed something.
>>>
>>> The calls to the containsType(...) method operate in one of 2 ways,
>>> depending on whether or not the result specification has been
>>> "compiled" by calling the compile method.
>>>
>>> If the result spec has not been compiled, then containsType(...)
>>> returns true iff the type specified is "equal(...)" to a type in the
>>> Result Specification.
>>>
>>> If it has been compiled, then the containsType returns true iff the
>>> type specified is equal to a type *or any of its subtypes* in the
>>> Result Specification. This is because compiling a
>>> resultSpecification adds the subtypes.
>>>
>>> Can others confirm this? In actual use within annotators, it may be
>>> that the result spec is always compiled before use (I haven't yet
>>> traced that down).
>> Yes, you are right, when the result spec is compiled all subtypes of
>> a type are additionally added to the map. The same for features, if
>> the allAnnotatorFeatures is set to true.
>>>
>>> Should the code and Javadocs be updated to have containsType return
>>> true for subtypes of types in the result spec, always?
>> I think both ways should return the same result. But which way is
>> correct? If I specify a type in the result spec is it correct that
>> all subtypes are also in?
>> If I just want to have the sub types in the result spec it is easy to
>> do, but what if I only want to have the super types in the result
>> spec without the subtypes?
>>
>> -- Michael
>>
>>
>>
>
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
Given that (as far as I can tell - let's see, that would be AFAICT), the
resultSpec is *always* used in compiled mode (because the wrapper always
compiles it), the current implementation would have the effect that
1) the allFeatures flag would work
2) subtypes of a type specified in the resultSpec would also be
implicitly in the resultSpec
Therefore, to keep the implementation behavior constant (a good thing to
try for, always :-) ) we should ensure any changes continue to exhibit
this behavior, and update the Javadocs and documentation to reflect this.
Other opinions?
-Marshall
Michael Baessler wrote:
>
> Marshall Schor wrote:
>> In looking thru the code for ResultSpecification_Impl, there
>> seems to be an inconsistency - unless I (quite possible :-) ) missed
>> something.
>>
>> The calls to the containsType(...) method operate in one of 2 ways,
>> depending on whether or not the result specification has been
>> "compiled" by calling the compile method.
>>
>> If the result spec has not been compiled, then containsType(...)
>> returns true iff the type specified is "equal(...)" to a type in the
>> Result Specification.
>>
>> If it has been compiled, then the containsType returns true iff the
>> type specified is equal to a type *or any of its subtypes* in the
>> Result Specification. This is because compiling a
>> resultSpecification adds the subtypes.
>>
>> Can others confirm this? In actual use within annotators, it may be
>> that the result spec is always compiled before use (I haven't yet
>> traced that down).
> Yes, you are right, when the result spec is compiled all subtypes of a
> type are additionally added to the map. The same for features, if the
> allAnnotatorFeatures is set to true.
>>
>> Should the code and Javadocs be updated to have containsType return
>> true for subtypes of types in the result spec, always?
> I think both ways should return the same result. But which way is
> correct? If I specify a type in the result spec is it correct that all
> subtypes are also in?
> If I just want to have the sub types in the result spec it is easy to
> do, but what if I only want to have the super types in the result spec
> without the subtypes?
>
> -- Michael
>
>
>
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Michael Baessler <mb...@michael-baessler.de>.
Marshall Schor wrote:
> In looking thru the code for ResultSpecification_Impl, there
> seems to be an inconsistency - unless I (quite possible :-) ) missed
> something.
>
> The calls to the containsType(...) method operate in one of 2 ways,
> depending on whether or not the result specification has been
> "compiled" by calling the compile method.
>
> If the result spec has not been compiled, then containsType(...)
> returns true iff the type specified is "equal(...)" to a type in the
> Result Specification.
>
> If it has been compiled, then the containsType returns true iff the
> type specified is equal to a type *or any of its subtypes* in the
> Result Specification. This is because compiling a resultSpecification
> adds the subtypes.
>
> Can others confirm this? In actual use within annotators, it may be
> that the result spec is always compiled before use (I haven't yet
> traced that down).
Yes, you are right, when the result spec is compiled all subtypes of a
type are additionally added to the map. The same for features, if the
allAnnotatorFeatures is set to true.
>
> Should the code and Javadocs be updated to have containsType return
> true for subtypes of types in the result spec, always?
I think both ways should return the same result. But which way is
correct? If I specify a type in the result spec is it correct that all
subtypes are also in?
If I just want to have the sub types in the result spec it is easy to
do, but what if I only want to have the super types in the result spec
without the subtypes?
-- Michael
Re: capabilityLangugaeFlow - computeResultSpec
Posted by Marshall Schor <ms...@schor.com>.
In looking thru the code for ResultSpecification_Impl, there
seems to be an inconsistency - unless I (quite possible :-) ) missed
something.
The calls to the containsType(...) method operate in one of 2 ways,
depending on whether or not the result specification has been "compiled"
by calling the compile method.
If the result spec has not been compiled, then containsType(...) returns
true iff the type specified is "equal(...)" to a type in the Result
Specification.
If it has been compiled, then the containsType returns true iff the type
specified is equal to a type *or any of its subtypes* in the Result
Specification. This is because compiling a resultSpecification adds the
subtypes.
Can others confirm this? In actual use within annotators, it may be
that the result spec is always compiled before use (I haven't yet traced
that down).
Should the code and Javadocs be updated to have containsType return true
for subtypes of types in the result spec, always?
-Marshall
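The two behaviors of containsType(...) described above can be illustrated with a toy model. ToyResultSpec is a hypothetical class, not ResultSpecification_Impl; it assumes types are plain names and that the type hierarchy is given as a map from a type to its direct subtypes. It is only a sketch of the "compile adds the subtypes" effect.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model, not UIMA code.
final class ToyResultSpec {
    private final Set<String> types = new HashSet<>();

    void addType(String typeName) {
        types.add(typeName);
    }

    // Toy "compile": adds every transitive subtype of each listed type.
    void compile(Map<String, List<String>> directSubtypes) {
        Deque<String> pending = new ArrayDeque<>(types);
        while (!pending.isEmpty()) {
            String current = pending.pop();
            for (String sub : directSubtypes.getOrDefault(current, Collections.emptyList())) {
                if (types.add(sub)) {
                    pending.push(sub); // newly added, so visit its subtypes too
                }
            }
        }
    }

    // Plain membership test; the result depends on whether compile() ran.
    boolean containsType(String typeName) {
        return types.contains(typeName);
    }

    public static void main(String[] args) {
        Map<String, List<String>> subtypes = new HashMap<>();
        subtypes.put("Person", Arrays.asList("GovernmentOfficial", "Actor", "Author"));

        ToyResultSpec spec = new ToyResultSpec();
        spec.addType("Person");
        System.out.println(spec.containsType("Author")); // prints false (not compiled)
        spec.compile(subtypes);
        System.out.println(spec.containsType("Author")); // prints true (compiled)
    }
}
```

Making containsType behave the same in both states would amount to either always expanding subtypes before answering, or never doing so; the sketch leaves that choice open.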