Posted to dev@uima.apache.org by Michael Baessler <mb...@michael-baessler.de> on 2007/12/18 14:55:43 UTC

capabilityLangugaeFlow - computeResultSpec

Hi,
I received a report that computing the result spec for the 
capabilityLanguageFlow takes too much time.
I looked at the code and found something interesting, though I may be 
wrong about it.

In ASB_impl.java, in processUntilNextOutputCas(), I found the 
following:
        
        // check if we have to set result spec, to support capability language flow
        if (nextStep instanceof SimpleStepWithResultSpec) {
          ResultSpecification rs =
              ((SimpleStepWithResultSpec) nextStep).getResultSpecification();
          if (rs != null) {
            nextAe.setResultSpecification(rs);
          }
        }
        // invoke next AE in flow
        CasIterator casIter = null;
        CAS outputCas = null; // used if the AE we call outputs a new CAS
        try {
          casIter = nextAe.processAndOutputNewCASes(cas);

When a capabilityLanguageFlow is used, the ResultSpecs for the engines 
in the flow are precomputed if possible. The code above takes this 
precomputed ResultSpec from the flow node and sets it on the current AE.

Going deeper into

     casIter = nextAe.processAndOutputNewCASes(cas);

I found the following in PrimitiveAnalysisEngine_impl.java, in the 
callAnalysisComponentProcess() method:

        if (mResultSpecChanged || mLastTypeSystem != view.getTypeSystem()) {
          mLastTypeSystem = view.getTypeSystem();
          mCurrentResultSpecification.compile(mLastTypeSystem);
          // the actual ResultSpec we send to the component is formed by
          // looking at this primitive AE's declared output types and eliminating
          // any that are not in mCurrentResultSpecification.
          ResultSpecification analysisComponentResultSpec =
              computeAnalysisComponentResultSpec(
                  mCurrentResultSpecification,
                  getAnalysisEngineMetaData().getCapabilities());
          // compile result spec - necessary to get type subsumption to work properly
          analysisComponentResultSpec.compile(mLastTypeSystem);
          mAnalysisComponent.setResultSpecification(analysisComponentResultSpec);
          mResultSpecChanged = false;
        }

Whenever the ResultSpec has changed, it is recomputed. But the 
ResultSpec is marked as changed every time setResultSpecification() is 
called.
So what does this mean? The first code fragment shows how the ResultSpec 
is taken from the flow controller and set on the AE, so the result spec 
counts as changed. The second code fragment shows what is executed when 
the ResultSpec has changed and how it is recomputed.
This means that the ResultSpec is recomputed each time process is 
called. I don't think this is necessary.
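
To make the effect concrete, here is a minimal, self-contained sketch of 
the pattern described above. It does not use the real UIMA classes: 
ToySpec, ToyEngine and the recompileCount counter are hypothetical 
stand-ins, and the identity check in setResultSpecificationIfDifferent 
is only one possible guard, assumed purely for illustration. Note that 
an identity check would not notice a spec object that is modified in 
place.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a result specification: just a set of type names. */
final class ToySpec {
  final Set<String> types = new HashSet<>();
}

/** Hypothetical stand-in for a primitive engine that recompiles its spec when flagged. */
final class ToyEngine {
  private ToySpec current;
  private boolean specChanged;
  int recompileCount = 0;

  /** Mirrors the behaviour described in the mail: every call marks the spec as changed. */
  void setResultSpecification(ToySpec spec) {
    current = spec;
    specChanged = true;
  }

  /** Assumed guard: only mark as changed when a different object is handed in. */
  void setResultSpecificationIfDifferent(ToySpec spec) {
    if (spec != current) {
      setResultSpecification(spec);
    }
  }

  /** Stand-in for process(): the expensive recompile runs only when the flag is set. */
  void process() {
    if (specChanged) {
      recompileCount++; // stands in for compile() + computeAnalysisComponentResultSpec()
      specChanged = false;
    }
  }
}

final class ResultSpecRecomputeDemo {
  /** Runs n documents through an engine, setting the same precomputed spec before each. */
  static int run(int n, boolean guarded) {
    ToyEngine engine = new ToyEngine();
    ToySpec precomputed = new ToySpec();
    precomputed.types.add("Token");
    for (int i = 0; i < n; i++) {
      if (guarded) {
        engine.setResultSpecificationIfDifferent(precomputed);
      } else {
        engine.setResultSpecification(precomputed);
      }
      engine.process();
    }
    return engine.recompileCount;
  }

  public static void main(String[] args) {
    System.out.println("unguarded recompiles: " + run(5, false));
    System.out.println("guarded recompiles:   " + run(5, true));
  }
}
```

In this toy model the unguarded path recompiles once per document, while 
the guarded path recompiles only when the spec object is first set.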

Beyond that, it seems to me that the ResultSpec
       mCurrentResultSpecification
and the computed ResultSpec
       analysisComponentResultSpec
have the same content.

Opinions? Did I miss something?

-- Michael


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
Marshall Schor wrote:
> The Capability Language Flow for an aggregate is computed in 
> CapabilityLanguageFlowController.computeFlowTable.
>
> This starts with the aggregate's output capabilities and figures out a 
> flow for each language that produces all the outputs.
>
> Should this computation also include, in the set of needed outputs, 
> the inputs that downstream annotators need from upstream ones?  That 
> part seems to be missing from this computation.
>
> Here's an example:
>
> An aggregate G has delegates A & B.  If B needs A to produce some 
> type T for some language, and T is not among G's outputs but 
> something that B produces is, the flow controller would need to tell 
> A to produce T so that B could produce the desired output at the 
> aggregate level.
>
> -Marshall
Adding the input capabilities automatically is fine with me.

-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
The Capability Language Flow for an aggregate is computed in 
CapabilityLanguageFlowController.computeFlowTable.

This starts with the aggregate's output capabilities and figures out a 
flow for each language that produces all the outputs.

Should this computation also include, in the set of needed outputs, the 
inputs that downstream annotators need from upstream ones?  That part 
seems to be missing from this computation.

Here's an example:

An aggregate G has delegates A & B.  If B needs A to produce some type 
T for some language, and T is not among G's outputs but something that 
B produces is, the flow controller would need to tell A to produce T so 
that B could produce the desired output at the aggregate level.
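
The example can be sketched as a small set computation. This is only an 
illustration under stated assumptions: the Delegate class, the 
neededOutputs helper and the type names T and U are hypothetical and do 
not correspond to the actual computeFlowTable code, and languages are 
left out for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical delegate description: a name plus declared input and output types. */
final class Delegate {
  final String name;
  final Set<String> inputs;
  final Set<String> outputs;

  Delegate(String name, Set<String> inputs, Set<String> outputs) {
    this.name = name;
    this.inputs = inputs;
    this.outputs = outputs;
  }
}

final class NeededOutputsSketch {
  /**
   * Walks the flow backwards. A delegate is asked for every declared output that is
   * currently needed; once it is asked for anything, its inputs become needed too.
   */
  static Map<String, Set<String>> neededOutputs(List<Delegate> flow,
                                                Set<String> aggregateOutputs) {
    Set<String> needed = new LinkedHashSet<>(aggregateOutputs);
    Map<String, Set<String>> requested = new LinkedHashMap<>();
    for (int i = flow.size() - 1; i >= 0; i--) {
      Delegate d = flow.get(i);
      Set<String> ask = new LinkedHashSet<>(d.outputs);
      ask.retainAll(needed);
      requested.put(d.name, ask);
      if (!ask.isEmpty()) {
        needed.addAll(d.inputs); // upstream delegates must now produce these
      }
    }
    return requested;
  }

  public static void main(String[] args) {
    List<Delegate> flow = new ArrayList<>();
    flow.add(new Delegate("A", Set.of(), Set.of("T")));
    flow.add(new Delegate("B", Set.of("T"), Set.of("U")));
    // G only declares U as an output, yet A is still asked for T because B needs it.
    System.out.println(neededOutputs(flow, Set.of("U")));
  }
}
```

Without the needed.addAll(d.inputs) step, A would be asked for nothing 
in this example, which is the gap the question points at.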

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
OK, will do.

-- Michael

Marshall Schor wrote:
> Easy to see- just trace the test case...  -Marshall


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
So in the older version of the capabilityLanguageFlow the inputs were 
not recognized. But I think it is not bad that these are added 
automatically, since the flow can't work if those are missing!

-- Michael

Marshall Schor wrote:
> I did this trace.  Here's how it works now, without calling this.
>
> The process(cas, result-spec) call goes to 
> AggregateAnalysisEngine_Impl which calls setResultSpecification on the 
> AEEngine_impl object, which
> 1) clones the result-spec object
> 2) adds capabilities to it from the *inputs* of all components of this 
> aggregate
> 3) uses this one cloned object as the result spec passed down to each 
> component.
>
> Before going further - Michael - a question: isn't this 
> union-with-all-inputs-behavior something you didn't want for 
> capability language flow?
>
> Maybe it doesn't matter in that the use of capability language flow is 
> not done in the real application use cases by passing the result spec 
> in the top level call to the process method of the analysis engine?
>
> -Marshall


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
I did this trace.  Here's how it works now, without calling this.

The process(cas, result-spec) call goes to AggregateAnalysisEngine_Impl 
which calls setResultSpecification on the AEEngine_impl object, which
1) clones the result-spec object
2) adds capabilities to it from the *inputs* of all components of this 
aggregate
3) uses this one cloned object as the result spec passed down to each 
component.
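
The three steps can be sketched as follows. This is an illustration 
under stated assumptions rather than the UIMA implementation: Component, 
propagate and the use of plain string sets in place of 
ResultSpecification are all hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class UnionWithInputsSketch {
  /** Hypothetical component: a name, its declared inputs, and the spec it was handed. */
  static final class Component {
    final String name;
    final Set<String> inputs;
    Set<String> receivedSpec;

    Component(String name, Set<String> inputs) {
      this.name = name;
      this.inputs = inputs;
    }
  }

  /** Applies the three steps from the trace to a caller-supplied result spec. */
  static Set<String> propagate(Set<String> callerSpec, List<Component> components) {
    Set<String> cloned = new HashSet<>(callerSpec); // 1) clone; caller's object is untouched
    for (Component c : components) {
      cloned.addAll(c.inputs);                      // 2) add every component's inputs
    }
    for (Component c : components) {
      c.receivedSpec = cloned;                      // 3) the same object goes to each one
    }
    return cloned;
  }

  public static void main(String[] args) {
    Component a = new Component("A", Set.of());
    Component b = new Component("B", Set.of("T"));
    Set<String> callerSpec = new HashSet<>(Set.of("U"));
    Set<String> shared = propagate(callerSpec, List.of(a, b));
    System.out.println("caller spec: " + callerSpec);
    System.out.println("shared spec: " + shared);
    System.out.println("same object for A and B: " + (a.receivedSpec == b.receivedSpec));
  }
}
```

Because every component receives the union, each one sees the inputs of 
all the others, not only what it needs to produce itself.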

Before going further - Michael - a question: isn't this 
union-with-all-inputs-behavior something you didn't want for capability 
language flow?

Maybe it doesn't matter in that the use of capability language flow is 
not done in the real application use cases by passing the result spec in 
the top level call to the process method of the analysis engine?

-Marshall

Marshall Schor wrote:
> Easy to see- just trace the test case...  -Marshall


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
Here's the trace of how this works, when run from a top level 
process(cas) call:

1) the call goes to the AnalysisEngine_Impl process method, which calls 
processAndOutputNewCASes in the same object.  This calls the ASB_impl 
process method, which creates a new AggregateCasIterator(aCAS).  This 
constructor calls computeFlow on the ...asb.impl.FlowControllerContainer 
object.  This calls the particular flow controller's computeFlow 
method.  In this case, the flowController is the 
CapabilityLanguageFlowController.  Since this is a new CAS coming in to 
the aggregate, the computeFlow method makes a new 
CapabilityLanguageFlowObject, passing in the pre-computed Flow Table.

So that's how it uses this constructor, in the case where no specific 
result spec is passed.

-Marshall

Marshall Schor wrote:
> Easy to see- just trace the test case...  -Marshall


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
Easy to see- just trace the test case...  -Marshall

Michael Baessler wrote:
> But it would still be interesting to know why this is never needed 
> and how it works now.
>
> -- Michael


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
But it would still be interesting to know why this is never needed and 
how it works now.

-- Michael

Marshall Schor wrote:
> OK.  This would confirm that the other constructor is no longer 
> needed, since the test that passes a result-spec arg in the process 
> method no longer calls that.
>
> Thanks.  -Marshall


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
OK.  This would confirm that the other constructor is no longer needed, 
since the test that passes a result-spec arg in the process method no 
longer calls that.

Thanks.  -Marshall

Michael Baessler wrote:
> When looking at the tests for the capability language flow I see both 
> tests: one with the result spec argument in the process() method and 
> one without.
> In older UIMA versions, when using the debugger I see that both 
> constructors are used there.
>
> -- Michael


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
When looking at the tests for the capability language flow I see both 
tests: one with the result spec argument in the process() method and one 
without.
In older UIMA versions, when using the debugger I see that both 
constructors are used there.

-- Michael

Marshall Schor wrote:
> Thanks.  I'll see about comparing the older method with the current 
> method, to verify this.  -Marshall


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
Thanks.  I'll see about comparing the older method with the current 
method, to verify this.  -Marshall

Michael Baessler wrote:
> In older UIMA versions the CapabilityLanguageFlowObject(List 
> aNodeList, ResultSpecification resultSpec)  constructor was used when 
> the result was set by an application using the process method with the 
> resultSpec argument. In the current version it seems that only the 
> version with the precomputed FlowTable is used. But I can't say if 
> that is correct or not since I don't know the details about the 
> ResultSpec restructuring (maybe only Adam knows). But you are right, 
> if this constructor isn't necessary, both the code and the 
> constructor can be removed.
>
> Seems that the architecture has changed here. :-)
>
> -- Michael


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Adam Lally <al...@alum.rpi.edu>.
On Jan 23, 2008 8:06 AM, Michael Baessler <mb...@michael-baessler.de> wrote:
> In older UIMA versions the CapabilityLanguageFlowObject(List aNodeList,
> ResultSpecification resultSpec)  constructor was used when the result
> was set by an application using the process method with the resultSpec
> argument. In the current version it seems that only the version with the
> precomputed FlowTable is used. But I can't say if that is correct or not
> since I don't know the details about the ResultSpec restructuring (maybe
> only Adam knows).

I think no one knows exactly.  This area of the code grew somewhat
organically to address requirements over time. I don't think I ever
fully understood how CapabilityLanguageFlow was implemented.  When
I was adding the custom flow controller in v2.0, I did my best to port
whatever behavior was there and make sure all the test cases passed.
It turned out we were missing some important test cases though, and
that's how we came around to adding the SimpleStepWithResultSpec class
in order to replicate the old behavior.  I think the key thing is to
make sure we have the right test cases in place to be sure we're
preserving backward compatibility, and then I'm all for having
Marshall clean up the code so it makes more sense.

  -Adam

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
In older UIMA versions the CapabilityLanguageFlowObject(List aNodeList, 
ResultSpecification resultSpec)  constructor was used when the result 
was set by an application using the process method with the resultSpec 
argument. In the current version it seems that only the version with the 
precomputed FlowTable is used. But I can't say if that is correct or not 
since I don't know the details about the ResultSpec restructuring (maybe 
only Adam knows). But you are right, if this constructor isn't 
necessary, both the code and the constructor can be removed.

Seems that the architecture has changed here. :-)

-- Michael

Marshall Schor wrote:
> If this is removed or if it is never called, then there is a section 
> of the logic in CapabilityLanguageFlowObject which is never used, 
> because mNodeList == null:
>
>
> if (mNodeList != null) {
>  //  80 or so lines of code elided
> }
>
> Can this logic be removed?
>
> -Marshall


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
If this is removed or if it is never called, then there is a section of 
the logic in CapabilityLanguageFlowObject which is never used, because 
mNodeList == null:


if (mNodeList != null) {
  //  80 or so lines of code elided
}

Can this logic be removed?
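
The situation can be reproduced in miniature. The class below is a 
hypothetical reduction, not the real CapabilityLanguageFlowObject: the 
field types and constructor shapes are assumed from the discussion, and 
the branch bodies are placeholders.

```java
import java.util.List;
import java.util.Map;

final class FlowObjectSketch {
  private final List<String> mNodeList;               // only set by the legacy constructor
  private final Map<String, List<String>> mFlowTable; // only set by the flow-table constructor

  /** Legacy-style constructor; per the thread, its real counterpart has no callers left. */
  FlowObjectSketch(List<String> nodeList) {
    this.mNodeList = nodeList;
    this.mFlowTable = null;
  }

  /** Constructor taking a precomputed flow table, keyed here by language. */
  FlowObjectSketch(Map<String, List<String>> flowTable) {
    this.mNodeList = null;
    this.mFlowTable = flowTable;
  }

  /** Reports which branch would handle the next step. */
  String branchTaken() {
    if (mNodeList != null) {
      return "node-list"; // unreachable once the legacy constructor has no callers
    }
    return "flow-table";
  }

  public static void main(String[] args) {
    Map<String, List<String>> table = Map.of("en", List.of("A", "B"));
    FlowObjectSketch current = new FlowObjectSketch(table);
    System.out.println(current.branchTaken());
  }
}
```

If only the flow-table constructor is ever called, the guarded branch 
can never run, so removing the constructor and the branch together 
should not change behaviour, provided the tests cover both ways of 
calling process.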

-Marshall

Marshall Schor wrote:
> The class CapabilityLanguageFlowObject has 2 defined constructors, but 
> one is never used/referenced:
> CapabilityLanguageFlowObject(List aNodeList, ResultSpecification 
> resultSpec)
>
> Can this be removed?
>
> -Marshall
>
>


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
The class CapabilityLanguageFlowObject has 2 defined constructors, but 
one is never used/referenced:
CapabilityLanguageFlowObject(List aNodeList, ResultSpecification resultSpec)

Can this be removed?

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec - Question on ResultSpecification

Posted by Marshall Schor <ms...@schor.com>.
I'll fix the Javadocs to correspond to what the code does.  This will 
have the result that
   addResultFeature(1-feature, languages) will *add* to the existing 
languages, while
   addResultFeature(1-feature) will *replace* all existing languages 
with x-unspecified.
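
The add/replace distinction above can be sketched with a toy model. This is 
only an illustration under stated assumptions: FeatureLanguageSketch and 
languagesOf are hypothetical names, and the class is not UIMA's 
ResultSpecification; only the two addResultFeature variants mirror the 
behavior described in this note.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: a hypothetical class, not UIMA's ResultSpecification.
class FeatureLanguageSketch {
  static final String UNSPECIFIED_LANGUAGE = "x-unspecified";

  // feature name -> languages for which the feature is in the result spec
  private final Map<String, Set<String>> languagesByFeature = new HashMap<>();

  // With languages: *add* to whatever is already recorded for the feature.
  void addResultFeature(String featureName, String[] languages) {
    languagesByFeature
        .computeIfAbsent(featureName, key -> new TreeSet<>())
        .addAll(Arrays.asList(languages));
  }

  // Without languages: *replace* all existing languages with x-unspecified.
  void addResultFeature(String featureName) {
    Set<String> onlyUnspecified = new TreeSet<>();
    onlyUnspecified.add(UNSPECIFIED_LANGUAGE);
    languagesByFeature.put(featureName, onlyUnspecified);
  }

  // Hypothetical accessor, used here only to inspect the outcome.
  Set<String> languagesOf(String featureName) {
    return languagesByFeature.getOrDefault(featureName, new TreeSet<>());
  }
}
```

Calling the two-argument form with "en" and then "de" leaves both languages 
recorded; a following one-argument call leaves only x-unspecified.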

-Marshall


Marshall Schor wrote:
> I'm doing a redesign for the result spec area to improve performance.
>
> The basic idea is to put a hasBeenChanged flag into the result spec 
> object, and use it being "false" to enable users to avoid recomputing 
> things.
> Why not use "equal" ? because a single result spec object is shared 
> among multiple users, and when updated, the object is updated in place 
> (so there is no other object to compare it to).
> Looking at the ResultSpec object - it has a hashMap that stores the 
> Types and Features (TypeOrFeature objects) as the keys; the values are 
> hashSets holding languages for which these types and features are in 
> the result spec.  (There is a special hash set having just the entry 
> of the default language = UNSPECIFIED_LANGUAGE = "x-unspecified").
> I'm going to try and make the default language hash set a constant, 
> and create just one instance of it - this should improve performance, 
> especially when languages are not being used.
>
> There are 2 kinds of methods to add types/features to a result spec:  
> ones with language(s) and ones without.
>    The ones without reset any language spec associated with the type or
>    feature(s) to the UNSPECIFIED_LANGUAGE.
>
>    The ones with a language, sometimes "replace"  the language
>    associated with the type/feature, and other times, they "add" the
>    language (assuming the type/feature is already an entry in the
>    hashMap of types and features).
>
>    methods which are replacing any existing languages:
>
>        setResultTypesAndFeatures(array of TypeOrFeature)
>            << repl with x-unspecified language
>        setResultTypesAndFeatures(array of TypeOrFeature, languages)
>            << repl with languages
>        addResultTypeOrFeature(1-TypeOrFeature)
>            << repl with x-unspecified language
>        addResultTypeOrFeature(1-TypeOrFeature, languages)
>            << repl with languages
>        addResultType(String, boolean)
>            << repl with x-unspecified language
>        addResultFeature(1-feature, languages)
>            << repl with languages
>
>    methods which are adding to existing languages:
>
>        addResultType(1-type, boolean, languages)
>            << adds languages
>        addResultFeature(1-feature)
>            << adds x-unspecified
>
> The "set..." method essentially clears the result spec and sets it 
> with completely new information, so it is reasonable that it replaces 
> any existing language information.
>
> The addResult methods, when used to add a type or feature which 
> already present, are inconsistent - with one method adding, and the 
> others, replacing. This behavior is documented in the JavaDocs for the 
> class.
>
> The JavaDocs have the behavior for adding a Feature by name reversed 
> with the behavior for adding a Type by name.  In one case, including 
> the language is treated as a replace, in the other as an add.  This 
> seems likely a bug in the Javadocs. The code for the addResultFeature 
> is reversed from the Javadocs: the code will "add" languages if 
> specified, but "replaces" (with the x-unspecified) if languages are 
> not specified in the method call.
>
> Does anyone know what the "correct" behavior of these methods is 
> supposed to be?
>
> -Marshall
>
>
>
>
>
>


Re: capabilityLangugaeFlow - computeResultSpec - Question on ResultSpecification

Posted by Marshall Schor <ms...@schor.com>.
I'm doing a redesign for the result spec area to improve performance.

The basic idea is to put a hasBeenChanged flag into the result spec 
object, and use it being "false" to enable users to avoid recomputing 
things.
Why not use "equal"? Because a single result spec object is shared 
among multiple users, and when updated, the object is updated in place 
(so there is no other object to compare it to).
Looking at the ResultSpec object - it has a hashMap that stores the 
Types and Features (TypeOrFeature objects) as the keys; the values are 
hashSets holding languages for which these types and features are in the 
result spec.  (There is a special hash set having just the entry of the 
default language = UNSPECIFIED_LANGUAGE = "x-unspecified"). 

I'm going to try and make the default language hash set a constant, and 
create just one instance of it - this should improve performance, 
especially when languages are not being used.
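
As a rough sketch of the two ideas above (a change flag on an object that is 
updated in place, and one shared instance of the default language set), 
assuming a hypothetical class ChangeTrackingSpecSketch with made-up method 
names; this is not the actual ResultSpecification_impl:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a result spec that is shared and updated in place.
class ChangeTrackingSpecSketch {
  static final String UNSPECIFIED_LANGUAGE = "x-unspecified";

  // One immutable instance shared by every entry that uses the default language.
  static final Set<String> DEFAULT_LANGUAGES =
      Collections.singleton(UNSPECIFIED_LANGUAGE);

  // type or feature name -> languages for which it is in the result spec
  private final Map<String, Set<String>> languagesByName = new HashMap<>();
  private boolean changed = false;

  void putDefault(String name) {
    store(name, DEFAULT_LANGUAGES);
  }

  void putWithLanguages(String name, String... languages) {
    store(name, new HashSet<>(Arrays.asList(languages)));
  }

  private void store(String name, Set<String> languages) {
    Set<String> previous = languagesByName.put(name, languages);
    // Only flag a change when the stored languages really differ.
    if (!languages.equals(previous)) {
      changed = true;
    }
  }

  // Users check this flag instead of comparing against another object.
  boolean hasBeenChanged() {
    return changed;
  }

  // Called by a user after it has recomputed whatever depends on the spec.
  void resetChanged() {
    changed = false;
  }

  Set<String> languagesOf(String name) {
    return languagesByName.get(name);
  }
}
```

A single flag like this only works for one consumer at a time; with several 
users sharing the object, each would need its own way of noting what it last 
saw.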

There are 2 kinds of methods to add types/features to a result spec:  
ones with language(s) and ones without. 

    The ones without reset any language spec associated with the type or
    feature(s) to the UNSPECIFIED_LANGUAGE.

    The ones with a language, sometimes "replace"  the language
    associated with the type/feature, and other times, they "add" the
    language (assuming the type/feature is already an entry in the
    hashMap of types and features).

    methods which are replacing any existing languages:

        setResultTypesAndFeatures(array of TypeOrFeature)
            << repl with x-unspecified language
        setResultTypesAndFeatures(array of TypeOrFeature, languages)
            << repl with languages
        addResultTypeOrFeature(1-TypeOrFeature)
            << repl with x-unspecified language
        addResultTypeOrFeature(1-TypeOrFeature, languages)
            << repl with languages
        addResultType(String, boolean)
            << repl with x-unspecified language
        addResultFeature(1-feature, languages)
            << repl with languages

    methods which are adding to existing languages:

        addResultType(1-type, boolean, languages)
            << adds languages
        addResultFeature(1-feature)
            << adds x-unspecified

The "set..." method essentially clears the result spec and sets it with 
completely new information, so it is reasonable that it replaces any 
existing language information.

The addResult methods, when used to add a type or feature which is 
already present, are inconsistent: one method adds, and the others 
replace. This behavior is documented in the JavaDocs for the class.

The JavaDocs have the behavior for adding a Feature by name reversed 
with the behavior for adding a Type by name.  In one case, including the 
language is treated as a replace, in the other as an add.  This seems 
likely a bug in the Javadocs. The code for the addResultFeature is 
reversed from the Javadocs: the code will "add" languages if specified, 
but "replaces" (with the x-unspecified) if languages are not specified 
in the method call.

Does anyone know what the "correct" behavior of these methods is 
supposed to be?

-Marshall



 

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
Ok, I didn't follow that... so fine with me to do the change!

-- Michael

Marshall Schor wrote:
> I may have missed something - I don't see what would need to be added 
> to the ResultSpecification class.  The method 
> hasOutputTypeOrFeature(...) is always called with doFuzzySearch== 
> true, which is how the containsType or containsFeature methods operate 
> (always) in the Result Specification class.
>
> Is there some other difference I'm missing?
>
> -Marshall
>
> Michael Baessler wrote:
>> Marshall Schor wrote:
>>> Can I replace the class CapabilityContainer with the much more 
>>> efficient (now) ResultSpecification class?
>>>
>>> It seems to me they do the almost same thing, and the 
>>> ResultSpecification may be handling the corner cases better.
>>>
>>> Is there some subtle difference I'm missing?  It would be nice to 
>>> eliminate a class -
>>> smaller code base => less maintenance effort in the future :-)
>>>
>>> -Marshall
>> Yes, if it is possible to add the missing functionality to the 
>> ResultSpecification class, fine with me.
>> For example the important method - 
>> hasOutputTypeOrFeature(ouputCapabilitie, documentLanguage, 
>> doFuzzySearch) is currently
>> not available at the ResultSpecification class.
>>
>> -- Michael
>>
>>
>


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
I may have missed something - I don't see what would need to be added to 
the ResultSpecification class.  The method hasOutputTypeOrFeature(...) 
is always called with doFuzzySearch == true, which is how the 
containsType or containsFeature methods operate (always) in the 
ResultSpecification class.

Is there some other difference I'm missing?

-Marshall

Michael Baessler wrote:
> Marshall Schor wrote:
>> Can I replace the class CapabilityContainer with the much more 
>> efficient (now) ResultSpecification class?
>>
>> It seems to me they do the almost same thing, and the 
>> ResultSpecification may be handling the corner cases better.
>>
>> Is there some subtle difference I'm missing?  It would be nice to 
>> eliminate a class -
>> smaller code base => less maintenance effort in the future :-)
>>
>> -Marshall
> Yes, if it is possible to add the missing functionality to the 
> ResultSpecification class, fine with me.
> For example the important method - 
> hasOutputTypeOrFeature(ouputCapabilitie, documentLanguage, 
> doFuzzySearch) is currently
> not available at the ResultSpecification class.
>
> -- Michael
>
>


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
Marshall Schor wrote:
> Can I replace the class CapabilityContainer with the much more 
> efficient (now) ResultSpecification class?
>
> It seems to me they do the almost same thing, and the 
> ResultSpecification may be handling the corner cases better.
>
> Is there some subtle difference I'm missing?  It would be nice to 
> eliminate a class -
> smaller code base => less maintenance effort in the future :-)
>
> -Marshall
Yes, if it is possible to add the missing functionality to the 
ResultSpecification class, fine with me.
For example, the important method 
hasOutputTypeOrFeature(outputCapability, documentLanguage, 
doFuzzySearch) is currently
not available in the ResultSpecification class.

-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
Yes, that is correct!

- Michael

Marshall Schor wrote:
> While experimenting with this approach, I found some tests wouldn't 
> run.  (By the way, the test cases are great - they have been a great 
> help :-) ).
>
> Here's a case I want to be sure I understand:
>
> Let's suppose that the aggregate says it produces type Foo with 
> language x-unspecified.
>
> Let's suppose there are 2 annotators in the flow:  the first one 
> produces Foo with language "en", the 2nd one produces Foo with 
> language "x-unspecified".  A flow given language "x-unspecified" 
> should run the 2nd annotator, skipping the first one.  (This is how it 
> works now).
>
> =======
>
> Here's another similar case, using the other language subsumption 
> between "en-us" and "en".
>
> Let's suppose that the aggregate says it produces type Foo with 
> language "en".
>
> Let's suppose there are 2 annotators in the flow:  the first one 
> produces Foo with language "en-us", the 2nd one produces Foo with 
> language "en".  A flow given language "en" should run the 2nd 
> annotator, skipping the first one. (This is how it works now, I think).
>
> With this explanation, I see there is a modification to the result 
> spec's containsType/Feature method with a language argument needed for 
> this use. Currently, the ResultSpecification matching works like this:
>  Language arg     RsltSpc     Matches
>   "en"            "en-us"       no
>   "en-us"         "en"          yes
>   "x-unspecified" *any*         yes    <<< behavior needs to be different
>   "en"            "x-unsp.."    yes
>
> Is this correct?
>
> -Marshall
>
> Marshall Schor wrote:
>> Can I replace the class CapabilityContainer with the much more 
>> efficient (now) ResultSpecification class?
>>
>> It seems to me they do the almost same thing, and the 
>> ResultSpecification may be handling the corner cases better.
>>
>> Is there some subtle difference I'm missing?  It would be nice to 
>> eliminate a class -
>> smaller code base => less maintenance effort in the future :-)
>>
>> -Marshall
>>
>>
>


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
I went back and checked the Javadocs for the ResultSpecification, prior 
to my reworking of it.  I think I treated the x-unspecified slightly 
wrong, and if I had done it right, then the anomaly noted in the 
previous note (below) would not be there.

The previous Javadocs all say that the setters for a typeOrFeature 
without a language argument, are equivalent to passing in the 
x-unspecified language.   The method containsType/Feature(foo, 
"x-unspecified") should be made to return true only if the Result 
specification for this contained x-unspecified.  It might not, if, for 
instance, the setting for Foo was only for languages "en" and "de". 

A consequence of making it work this way is the following:

    containsType(foo, "x-unspecified") will return "false" if foo is in
    the result spec only for particular languages.

    and the containsType(foo)  <<< no language argument
    would also return "false", if foo is in the result spec
    only for particular languages.

I plan to correct the treatment of x-unspecified along these lines, to 
work as described above.
Please post any concerns/objections :-)
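
A minimal sketch of the proposed matching, under stated assumptions: 
ContainsTypeSketch is a hypothetical class (not the ResultSpecification 
code), the set passed in stands for the languages recorded for one type, and 
cutting a language at its first "-" to get the base language ("en-us" to 
"en") is a simplification made for this example.

```java
import java.util.Set;

// Hypothetical illustration of the proposed x-unspecified treatment.
class ContainsTypeSketch {
  static final String UNSPECIFIED_LANGUAGE = "x-unspecified";

  // languagesForType: languages for which the type is in the result spec.
  static boolean containsType(Set<String> languagesForType, String language) {
    // A spec entry for x-unspecified matches every requested language.
    if (languagesForType.contains(UNSPECIFIED_LANGUAGE)) {
      return true;
    }
    // Proposed change: asking for x-unspecified no longer matches entries
    // that exist only for particular languages.
    if (UNSPECIFIED_LANGUAGE.equals(language)) {
      return false;
    }
    if (languagesForType.contains(language)) {
      return true;
    }
    // Assumed subsumption rule: "en-us" is also satisfied by an "en" entry.
    int dash = language.indexOf('-');
    return dash > 0 && languagesForType.contains(language.substring(0, dash));
  }

  // No language argument is treated like passing x-unspecified.
  static boolean containsType(Set<String> languagesForType) {
    return containsType(languagesForType, UNSPECIFIED_LANGUAGE);
  }
}
```

With Foo recorded only for "en" and "de", both containsType forms return 
false for x-unspecified, which is the consequence described above.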

-Marshall

Marshall Schor wrote:
> While experimenting with this approach, I found some tests wouldn't 
> run.  (By the way, the test cases are great - they have been a great 
> help :-) ).
>
> Here's a case I want to be sure I understand:
>
> Let's suppose that the aggregate says it produces type Foo with 
> language x-unspecified.
>
> Let's suppose there are 2 annotators in the flow:  the first one 
> produces Foo with language "en", the 2nd one produces Foo with 
> language "x-unspecified".  A flow given language "x-unspecified" 
> should run the 2nd annotator, skipping the first one.  (This is how it 
> works now).
>
> =======
>
> Here's another similar case, using the other language subsumption 
> between "en-us" and "en".
>
> Let's suppose that the aggregate says it produces type Foo with 
> language "en".
>
> Let's suppose there are 2 annotators in the flow:  the first one 
> produces Foo with language "en-us", the 2nd one produces Foo with 
> language "en".  A flow given language "en" should run the 2nd 
> annotator, skipping the first one. (This is how it works now, I think).
>
> With this explanation, I see there is a modification to the result 
> spec's containsType/Feature method with a language argument needed for 
> this use. Currently, the ResultSpecification matching works like this:
>  Language arg     RsltSpc     Matches
>   "en"            "en-us"       no
>   "en-us"         "en"          yes
>   "x-unspecified" *any*         yes    <<< behavior needs to be different
>   "en"            "x-unsp.."    yes
>
> Is this correct?
>
> -Marshall
>
> Marshall Schor wrote:
>> Can I replace the class CapabilityContainer with the much more 
>> efficient (now) ResultSpecification class?
>>
>> It seems to me they do the almost same thing, and the 
>> ResultSpecification may be handling the corner cases better.
>>
>> Is there some subtle difference I'm missing?  It would be nice to 
>> eliminate a class -
>> smaller code base => less maintenance effort in the future :-)
>>
>> -Marshall
>>
>>
>
>
>


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
While experimenting with this approach, I found some tests wouldn't 
run.  (By the way, the test cases are great - they have been a great 
help :-) ).

Here's a case I want to be sure I understand:

Let's suppose that the aggregate says it produces type Foo with language 
x-unspecified.

Let's suppose there are 2 annotators in the flow:  the first one 
produces Foo with language "en", the 2nd one produces Foo with language 
"x-unspecified".  A flow given language "x-unspecified" should run the 
2nd annotator, skipping the first one.  (This is how it works now).

=======

Here's another similar case, using the other language subsumption 
between "en-us" and "en".

Let's suppose that the aggregate says it produces type Foo with language 
"en".

Let's suppose there are 2 annotators in the flow:  the first one 
produces Foo with language "en-us", the 2nd one produces Foo with 
language "en".  A flow given language "en" should run the 2nd annotator, 
skipping the first one. (This is how it works now, I think).
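
The two cases above can be written down as a small sketch. It is a 
simplification with hypothetical names (FlowSelectionSketch, covers, 
firstAnnotatorToRun): each annotator is reduced to the single language it 
declares for Foo, and the real flow logic does considerably more than this.

```java
import java.util.List;

// Hypothetical reduction of the two cases; not the CapabilityLanguageFlow code.
class FlowSelectionSketch {
  static final String UNSPECIFIED_LANGUAGE = "x-unspecified";

  // Does an annotator declaring 'declared' handle a document in 'documentLanguage'?
  static boolean covers(String declared, String documentLanguage) {
    if (declared.equals(UNSPECIFIED_LANGUAGE) || declared.equals(documentLanguage)) {
      return true;
    }
    // Assumed rule: a document in "en-us" is also handled by an "en" annotator.
    int dash = documentLanguage.indexOf('-');
    return dash > 0 && declared.equals(documentLanguage.substring(0, dash));
  }

  // Index of the first annotator in the flow that should produce Foo, or -1.
  static int firstAnnotatorToRun(List<String> declaredLanguages, String documentLanguage) {
    for (int i = 0; i < declaredLanguages.size(); i++) {
      if (covers(declaredLanguages.get(i), documentLanguage)) {
        return i;
      }
    }
    return -1;
  }
}
```

For the flow ("en", "x-unspecified") with document language "x-unspecified", 
and for the flow ("en-us", "en") with document language "en", the sketch 
selects the 2nd annotator (index 1) and skips the first.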

With this explanation, I see there is a modification to the result 
spec's containsType/Feature method with a language argument needed for 
this use. 
Currently, the ResultSpecification matching works like this:
  Language arg     RsltSpc     Matches
   "en"            "en-us"       no
   "en-us"         "en"          yes
   "x-unspecified" *any*         yes    <<< behavior needs to be different
   "en"            "x-unsp.."    yes

Is this correct?

-Marshall

Marshall Schor wrote:
> Can I replace the class CapabilityContainer with the much more 
> efficient (now) ResultSpecification class?
>
> It seems to me they do the almost same thing, and the 
> ResultSpecification may be handling the corner cases better.
>
> Is there some subtle difference I'm missing?  It would be nice to 
> eliminate a class -
> smaller code base => less maintenance effort in the future :-)
>
> -Marshall
>
>


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
Can I replace the class CapabilityContainer with the much more efficient 
(now) ResultSpecification class?

It seems to me they do almost the same thing, and the 
ResultSpecification may be handling the corner cases better.

Is there some subtle difference I'm missing?  It would be nice to 
eliminate a class -
smaller code base => less maintenance effort in the future :-)

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
Yes, I think so. This test dumps the result spec for each AE to a file 
to check if it was computed correctly.
The computation of the result spec is done during the initialization of 
the aggregate AE when the capability language flow is created.
The precomputed result spec can later be used in the document 
processing, but this is currently not used. It is recomputed each time.

For my simple performance test I removed the second computation that 
is done during runtime processing
( PrimitiveAnalysisEngine_impl.java: protected ResultSpecification 
computeAnalysisComponentResultSpec() ). So the original computed result 
spec is used.
But we cannot remove this code completely since it can happen that a 
result spec is provided by the application and it must be recomputed 
dynamically.
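
Keeping the dynamic path while avoiding the repeated work could look roughly 
like the guard below. This is a sketch with hypothetical names 
(RecomputeGuardSketch and its members); the counter merely stands in for the 
expensive call to computeAnalysisComponentResultSpec().

```java
// Hypothetical sketch: recompute only when the spec or the type system changed.
class RecomputeGuardSketch {
  private boolean resultSpecChanged = true; // nothing has been computed yet
  private Object lastTypeSystem = null;

  // Counts how often the expensive computation would have run.
  int recomputations = 0;

  // Called when the application provides a (new) result spec.
  void setResultSpecification() {
    resultSpecChanged = true;
  }

  void process(Object typeSystem) {
    if (resultSpecChanged || lastTypeSystem != typeSystem) {
      lastTypeSystem = typeSystem;
      resultSpecChanged = false;
      recomputations++; // stand-in for the real recomputation
    }
    // ... the component's process call would follow here ...
  }
}
```

Three process calls with the same type system then trigger a single 
computation, and a result spec set by the application triggers one more.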

-- Michael

Marshall Schor wrote:
> Michael -
>
> I'm confused about how this test is setup.  The test descriptor this 
> code uses loads an aggregate, and then runs a process method which 
> ends up calling some dummy process method called 
> SequencerTestAnnotator.  This process method dumps (to a file) the 
> result spec.  Is that the case you're running?
>
> How do you turn on and off the (re)computation of the result spec?
>
> -Marshall
>
> Michael Baessler wrote:
>> Michael Baessler wrote:
>>> Adam Lally wrote:
>>>> On Jan 7, 2008 6:56 AM, Michael Baessler <mb...@michael-baessler.de> 
>>>> wrote:
>>>>  
>>>>> I tried to figure out how the ResultSpecification handling in 
>>>>> uima-core
>>>>> works with all side effects to check how it can be done
>>>>> to detect when a ResultSpec has changed. Unfortunately I was not able
>>>>> to, there are to much open questions where I don't know
>>>>> exactly if it is right in any case ... :-(
>>>>>
>>>>> Adam can you please look at this issue?
>>>>>
>>>>>     
>>>>
>>>> I can try to take a look, but I don't have a lot of time.  Do you have
>>>> a test case for this, where you expect I would see a significant
>>>> performance improvement if I fix this?
>>>>   
>>> Sorry I have to performance test case. I checked my assumption using 
>>> the debugger.
>>>
>>> I used the following main() with a loop over the process call to 
>>> check if the result spec is recomputed each time.
>>> The descriptor is the same as used in the capabilityLanguageFlow 
>>> test case of the uimaj-core project.
>>> Maybe a sysout helps to detect if the unnecessary calls are done or 
>>> not.
>>>
>>> Maybe when iterating more than 10 times will give you performance 
>>> numbers before and after. Maybe adding additional capabilities
>>> that must be analyzed will increase the time used to compute the 
>>> result spec. I will look at this tomorrow.
>>>
>>>  public static void main(String[] args) {
>>>
>>>      AnalysisEngine ae = null;
>>>      try {
>>>
>>>         String desc = "SequencerCapabilityLanguageAggregateES.xml";
>>>
>>>         XMLInputSource in = new 
>>> XMLInputSource(JUnitExtension.getFile(desc));
>>>         ResourceSpecifier specifier = UIMAFramework.getXMLParser()
>>>               .parseResourceSpecifier(in);
>>>         ae = UIMAFramework.produceAnalysisEngine(specifier, null, 
>>> null);
>>>         CAS cas = ae.newCAS();
>>>         String text = "Hello world!";
>>>         cas.setDocumentText(text);
>>>         cas.setDocumentLanguage("en");
>>>         for (int i = 0; i < 10; i++) {
>>>            ae.process(cas);
>>>         }
>>>      } catch (Exception ex) {
>>>         ex.printStackTrace();
>>>      }
>>>   }
>>>
>>> -- Michael
>> When setting the loop counter to 1000 I have 6000ms without 
>> recomputing the result spec and
>> 27000ms when recomputing the result spec. I think this should be 
>> sufficient for testing.
>>
>> -- Michael
>>
>>
>


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
Michael -

I'm confused about how this test is set up.  The test descriptor this 
code uses loads an aggregate, and then runs a process method which ends 
up calling some dummy process method called SequencerTestAnnotator.  
This process method dumps (to a file) the result spec.  Is that the case 
you're running?

How do you turn on and off the (re)computation of the result spec?

-Marshall

Michael Baessler wrote:
> Michael Baessler wrote:
>> Adam Lally wrote:
>>> On Jan 7, 2008 6:56 AM, Michael Baessler <mb...@michael-baessler.de> 
>>> wrote:
>>>  
>>>> I tried to figure out how the ResultSpecification handling in 
>>>> uima-core
>>>> works with all side effects to check how it can be done
>>>> to detect when a ResultSpec has changed. Unfortunately I was not able
>>>> to, there are to much open questions where I don't know
>>>> exactly if it is right in any case ... :-(
>>>>
>>>> Adam can you please look at this issue?
>>>>
>>>>     
>>>
>>> I can try to take a look, but I don't have a lot of time.  Do you have
>>> a test case for this, where you expect I would see a significant
>>> performance improvement if I fix this?
>>>   
>> Sorry I have to performance test case. I checked my assumption using 
>> the debugger.
>>
>> I used the following main() with a loop over the process call to 
>> check if the result spec is recomputed each time.
>> The descriptor is the same as used in the capabilityLanguageFlow test 
>> case of the uimaj-core project.
>> Maybe a sysout helps to detect if the unnecessary calls are done or not.
>>
>> Maybe when iterating more than 10 times will give you performance 
>> numbers before and after. Maybe adding additional capabilities
>> that must be analyzed will increase the time used to compute the 
>> result spec. I will look at this tomorrow.
>>
>>  public static void main(String[] args) {
>>
>>      AnalysisEngine ae = null;
>>      try {
>>
>>         String desc = "SequencerCapabilityLanguageAggregateES.xml";
>>
>>         XMLInputSource in = new 
>> XMLInputSource(JUnitExtension.getFile(desc));
>>         ResourceSpecifier specifier = UIMAFramework.getXMLParser()
>>               .parseResourceSpecifier(in);
>>         ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);
>>         CAS cas = ae.newCAS();
>>         String text = "Hello world!";
>>         cas.setDocumentText(text);
>>         cas.setDocumentLanguage("en");
>>         for (int i = 0; i < 10; i++) {
>>            ae.process(cas);
>>         }
>>      } catch (Exception ex) {
>>         ex.printStackTrace();
>>      }
>>   }
>>
>> -- Michael
> When setting the loop counter to 1000 I have 6000ms without 
> recomputing the result spec and
> 27000ms when recomputing the result spec. I think this should be 
> sufficient for testing.
>
> -- Michael
>
>


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
Marshall Schor wrote:
> I think my change is ready for code review.  I kept all the 
> idiosyncratic behavior of the old code, so users should not notice any 
> difference.  All the tests run, and test case above runs at the 6000ms 
> range.
> There are 3 areas changed:
> 1) ResultSpecification_impl is restructured for speed and smaller 
> memory footprint
> 2) The "compiling" of this is deferred till the latest possible point; 
> operations that can be done with the uncompiled form are done that way.
> 3) The code in the CapabilityLanguageFlow where it returns a next step 
> now caches the result spec by component key, and only sends it down if 
> it is different from what this controller sent the last time in 
> invoked this component in the flow.
> This test depends on the precomputed result specs kept in the mTable 
> variable being constant - which I believe they are (once they are 
> computed) - but Michael -can you confirm this?
Yes, the mTable variable contains the precomputed result specs for 
sequence engines. These result specs are constant and do not change 
during the processing. The computation is done based on the output types 
of the aggregate that defines the capabilityLanguageFlow. If the result 
spec is passed in by the process method, the precomputed mTable cannot 
be used, since the results that should be produced may then be different 
from the aggregate output types.
> With this change, the code in the framework to "intersect" the result 
> spec with a component's output capabilities, by language, is not 
> redone on every call, but only when the language changes.  That code 
> (to do the intersection) is running faster, in any case, due to the 
> restructuring.
>
> Because this is a big change it would be good to do a code review of 
> some kind - any thoughts on how to do this?
I hoped that Adam could look at this, since he knows the code best from 
my point of view. All the capabilityLanguageFlow related items have 
already been discussed on the list in detail, and I think we now also 
have some good tests for this.
If the code is checked in, I can run my performance tests again to check 
the performance improvements.

Opinions?

-- Michael


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
Michael Baessler wrote:
> Michael Baessler wrote:
>> Adam Lally wrote:
>>> On Jan 7, 2008 6:56 AM, Michael Baessler <mb...@michael-baessler.de> 
>>> wrote:
>>>  
>>>> I tried to figure out how the ResultSpecification handling in 
>>>> uima-core
>>>> works with all side effects to check how it can be done
>>>> to detect when a ResultSpec has changed. Unfortunately I was not able
>>>> to, there are to much open questions where I don't know
>>>> exactly if it is right in any case ... :-(
>>>>
>>>> Adam can you please look at this issue?
>>>>
>>>>     
>>>
>>> I can try to take a look, but I don't have a lot of time.  Do you have
>>> a test case for this, where you expect I would see a significant
>>> performance improvement if I fix this?
>>>   
>> Sorry I have to performance test case. I checked my assumption using 
>> the debugger.
>>
>> I used the following main() with a loop over the process call to 
>> check if the result spec is recomputed each time.
>> The descriptor is the same as used in the capabilityLanguageFlow test 
>> case of the uimaj-core project.
>> Maybe a sysout helps to detect if the unnecessary calls are done or not.
>>
>> Maybe when iterating more than 10 times will give you performance 
>> numbers before and after. Maybe adding additional capabilities
>> that must be analyzed will increase the time used to compute the 
>> result spec. I will look at this tomorrow.
>>
>>  public static void main(String[] args) {
>>
>>      AnalysisEngine ae = null;
>>      try {
>>
>>         String desc = "SequencerCapabilityLanguageAggregateES.xml";
>>
>>         XMLInputSource in = new 
>> XMLInputSource(JUnitExtension.getFile(desc));
>>         ResourceSpecifier specifier = UIMAFramework.getXMLParser()
>>               .parseResourceSpecifier(in);
>>         ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);
>>         CAS cas = ae.newCAS();
>>         String text = "Hello world!";
>>         cas.setDocumentText(text);
>>         cas.setDocumentLanguage("en");
>>         for (int i = 0; i < 10; i++) {
>>            ae.process(cas);
>>         }
>>      } catch (Exception ex) {
>>         ex.printStackTrace();
>>      }
>>   }
>>
>> -- Michael
> When setting the loop counter to 1000 I have 6000ms without 
> recomputing the result spec and
> 27000ms when recomputing the result spec. I think this should be 
> sufficient for testing.
I think my change is ready for code review.  I kept all the 
idiosyncratic behavior of the old code, so users should not notice any 
difference.  All the tests run, and the test case above runs in the 
6000ms range. 

There are 3 areas changed:
1) ResultSpecification_impl is restructured for speed and smaller memory 
footprint
2) The "compiling" of this is deferred till the latest possible point; 
operations that can be done with the uncompiled form are done that way.
3) The code in the CapabilityLanguageFlow where it returns a next step 
now caches the result spec by component key, and only sends it down if 
it is different from what this controller sent the last time it invoked 
this component in the flow. 
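
Item 3 can be sketched as follows. The class and method names 
(LastSentCacheSketch, specToSend) are made up for this illustration and the 
result spec is left as a type parameter; the identity comparison relies on 
the precomputed result specs being constant objects once computed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "send the result spec only if it differs from last time".
class LastSentCacheSketch<R> {
  // component key -> result spec object last sent to that component
  private final Map<String, R> lastSentByComponentKey = new HashMap<>();

  // Returns the spec to attach to the next step, or null if the component
  // already received this very object on its previous invocation.
  R specToSend(String componentKey, R precomputedSpec) {
    R previous = lastSentByComponentKey.put(componentKey, precomputedSpec);
    return previous == precomputedSpec ? null : precomputedSpec;
  }
}
```

A null result fits the existing check on the receiving side, which only 
calls setResultSpecification when the step carries a non-null result spec.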

This test depends on the precomputed result specs kept in the mTable 
variable being constant - which I believe they are (once they are 
computed) - but Michael, can you confirm this? 

With this change, the code in the framework to "intersect" the result 
spec with a component's output capabilities, by language, is not redone 
on every call, but only when the language changes.  That code (to do the 
intersection) is running faster, in any case, due to the restructuring.

Because this is a big change it would be good to do a code review of 
some kind - any thoughts on how to do this?

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
Michael Baessler wrote:
> Adam Lally wrote:
>> On Jan 7, 2008 6:56 AM, Michael Baessler <mb...@michael-baessler.de> 
>> wrote:
>>  
>>> I tried to figure out how the ResultSpecification handling in uima-core
>>> works with all side effects to check how it can be done
>>> to detect when a ResultSpec has changed. Unfortunately I was not able
>>> to, there are to much open questions where I don't know
>>> exactly if it is right in any case ... :-(
>>>
>>> Adam can you please look at this issue?
>>>
>>>     
>>
>> I can try to take a look, but I don't have a lot of time.  Do you have
>> a test case for this, where you expect I would see a significant
>> performance improvement if I fix this?
>>   
> Sorry, I have no performance test case. I checked my assumption using 
> the debugger.
>
> I used the following main() with a loop over the process call to check 
> if the result spec is recomputed each time.
> The descriptor is the same as used in the capabilityLanguageFlow test 
> case of the uimaj-core project.
> Maybe a sysout helps to detect if the unnecessary calls are done or not.
>
> Maybe iterating more than 10 times will give you performance 
> numbers before and after. Maybe adding additional capabilities
> that must be analyzed will increase the time used to compute the 
> result spec. I will look at this tomorrow.
>
>  public static void main(String[] args) {
>
>      AnalysisEngine ae = null;
>      try {
>
>         String desc = "SequencerCapabilityLanguageAggregateES.xml";
>
>         XMLInputSource in = new XMLInputSource(JUnitExtension.getFile(desc));
>         ResourceSpecifier specifier = UIMAFramework.getXMLParser()
>               .parseResourceSpecifier(in);
>         ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);
>         CAS cas = ae.newCAS();
>         String text = "Hello world!";
>         cas.setDocumentText(text);
>         cas.setDocumentLanguage("en");
>         for (int i = 0; i < 10; i++) {
>            ae.process(cas);
>         }
>      } catch (Exception ex) {
>         ex.printStackTrace();
>      }
>   }
>
> -- Michael
With the loop counter set to 1000, I measure about 6000 ms without 
recomputing the result spec and about 27000 ms when recomputing it 
(roughly 6 ms versus 27 ms per process() call). I think this should be 
sufficient for testing.

-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
Adam Lally wrote:
> On Jan 7, 2008 6:56 AM, Michael Baessler <mb...@michael-baessler.de> wrote:
>   
>> I tried to figure out how the ResultSpecification handling in uima-core
>> works with all side effects to check how it can be done
>> to detect when a ResultSpec has changed. Unfortunately I was not able
>> to, there are too many open questions where I don't know
>> exactly if it is right in any case ... :-(
>>
>> Adam can you please look at this issue?
>>
>>     
>
> I can try to take a look, but I don't have a lot of time.  Do you have
> a test case for this, where you expect I would see a significant
> performance improvement if I fix this?
>   
Sorry, I have no performance test case. I checked my assumption using the 
debugger.

I used the following main() with a loop over the process call to check 
if the result spec is recomputed each time.
The descriptor is the same as used in the capabilityLanguageFlow test 
case of the uimaj-core project.
Maybe a sysout helps to detect if the unnecessary calls are done or not.

Maybe iterating more than 10 times will give you performance 
numbers before and after. Maybe adding additional capabilities
that must be analyzed will increase the time used to compute the result 
spec. I will look at this tomorrow.

  public static void main(String[] args) {

      AnalysisEngine ae = null;
      try {

         String desc = "SequencerCapabilityLanguageAggregateES.xml";

         XMLInputSource in = new XMLInputSource(JUnitExtension.getFile(desc));
         ResourceSpecifier specifier = UIMAFramework.getXMLParser()
               .parseResourceSpecifier(in);
         ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);
         CAS cas = ae.newCAS();
         String text = "Hello world!";
         cas.setDocumentText(text);
         cas.setDocumentLanguage("en");
         for (int i = 0; i < 10; i++) {
            ae.process(cas);
         }
      } catch (Exception ex) {
         ex.printStackTrace();
      }
   }

-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Adam Lally <al...@alum.rpi.edu>.
On Jan 7, 2008 6:56 AM, Michael Baessler <mb...@michael-baessler.de> wrote:
> I tried to figure out how the ResultSpecification handling in uima-core
> works with all side effects to check how it can be done
> to detect when a ResultSpec has changed. Unfortunately I was not able
> to, there are too many open questions where I don't know
> exactly if it is right in any case ... :-(
>
> Adam can you please look at this issue?
>

I can try to take a look, but I don't have a lot of time.  Do you have
a test case for this, where you expect I would see a significant
performance improvement if I fix this?

-Adam

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
Adam Lally wrote:
> On Dec 18, 2007 8:55 AM, Michael Baessler <mb...@michael-baessler.de> wrote:
>   
>> Hi,
>> I got the request on my table that the computation of the result spec
>> for the capabilityLanguageFlow takes to much time.
>> I looked at the code and found something interesting... maybe I'm wrong,
>> I'm not sure.
>>
>> When looking at the ASB_impl.java at processUntilNextOutputCas() I found
>> the following:
>>
>>                //check if we have to set result spec, to support
>> capability language flow
>>                 if (nextStep instanceof SimpleStepWithResultSpec) {
>>                   ResultSpecification rs =
>> ((SimpleStepWithResultSpec)nextStep).getResultSpecification();
>>                   if (rs != null) {
>>                     nextAe.setResultSpecification(rs);
>>                   }
>>                 }
>>                 // invoke next AE in flow
>>                 CasIterator casIter = null;
>>                 CAS outputCas = null; //used if the AE we call outputs a
>> new CAS
>>                 try {
>>                   casIter = nextAe.processAndOutputNewCASes(cas);
>>
>> When a capabilityLanguageFlow is used, the ResultSpec for the flow
>> engines are precomputed if possible. The code above takes this
>> precomputed ResultSpec from the flow node and set it for the current AE.
>>
>> When I go deeper to
>>
>>      casIter = nextAe.processAndOutputNewCASes(cas);
>>
>> I found in the PrimitiveAnalysisEngine_impl.java class in the
>> callAnalysisComponentProcess() method the following:
>>
>>         if (mResultSpecChanged || mLastTypeSystem != view.getTypeSystem()) {
>>           mLastTypeSystem = view.getTypeSystem();
>>           mCurrentResultSpecification.compile(mLastTypeSystem);
>>           // the actual ResultSpec we send to the component is formed by
>>           // looking at this primitive AE's declared output types and
>> eliminiating
>>           // any that are not in mCurrentResultSpecification.
>>           ResultSpecification analysisComponentResultSpec =
>> computeAnalysisComponentResultSpec(
>>                   mCurrentResultSpecification,
>> getAnalysisEngineMetaData().getCapabilities());
>>           // compile result spec - necessary to get type subsumption to
>> work properly
>>           analysisComponentResultSpec.compile(mLastTypeSystem);
>>
>> mAnalysisComponent.setResultSpecification(analysisComponentResultSpec);
>>           mResultSpecChanged = false;
>>         }
>>
>> any time when the ResultSpec changed, the ResultSpec is recomputed. But
>> the ResultSpec is changed any time when setResultSpecification() is called.
>> So what does this mean. The first code fragment in the email shows how
>> to get the ResultSpec from the flow controller and set it on the AE.
>> - So the result spec changed - The second code fragment shows what is
>> executed if the ResultSpec has been changed and how it is recomputed.
>> This means that the ResultSpec is recomputed each time process is
>> called. I don't think this is necessary.
>>
>>     
>
> That seems like a good analysis of the situation.  I think what we
> need is to detect when the ResultSpecification has actually changed
> and when it hasn't.  That might be tricky to do right.  If we just
> check if the new ResultSpecification is == to the existing
> ResultSpecification, that wouldn't work if the ResultSpecification had
> been modified (it would be == but the contents wouldn't be the same).
> Perhaps we could add a dirty flag to the ResultSpecification to catch
> this.
I tried to figure out how the ResultSpecification handling in uima-core 
works, with all its side effects, to see how we could detect when a 
ResultSpec has changed. Unfortunately I was not able to; there are too 
many open questions where I don't know exactly whether it is right in 
every case ... :-(

Adam, can you please look at this issue?

Thanks Michael

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Adam Lally <al...@alum.rpi.edu>.
On Dec 18, 2007 8:55 AM, Michael Baessler <mb...@michael-baessler.de> wrote:
> Hi,
> I got the request on my table that the computation of the result spec
> for the capabilityLanguageFlow takes to much time.
> I looked at the code and found something interesting... maybe I'm wrong,
> I'm not sure.
>
> When looking at the ASB_impl.java at processUntilNextOutputCas() I found
> the following:
>
>                //check if we have to set result spec, to support
> capability language flow
>                 if (nextStep instanceof SimpleStepWithResultSpec) {
>                   ResultSpecification rs =
> ((SimpleStepWithResultSpec)nextStep).getResultSpecification();
>                   if (rs != null) {
>                     nextAe.setResultSpecification(rs);
>                   }
>                 }
>                 // invoke next AE in flow
>                 CasIterator casIter = null;
>                 CAS outputCas = null; //used if the AE we call outputs a
> new CAS
>                 try {
>                   casIter = nextAe.processAndOutputNewCASes(cas);
>
> When a capabilityLanguageFlow is used, the ResultSpec for the flow
> engines are precomputed if possible. The code above takes this
> precomputed ResultSpec from the flow node and set it for the current AE.
>
> When I go deeper to
>
>      casIter = nextAe.processAndOutputNewCASes(cas);
>
> I found in the PrimitiveAnalysisEngine_impl.java class in the
> callAnalysisComponentProcess() method the following:
>
>         if (mResultSpecChanged || mLastTypeSystem != view.getTypeSystem()) {
>           mLastTypeSystem = view.getTypeSystem();
>           mCurrentResultSpecification.compile(mLastTypeSystem);
>           // the actual ResultSpec we send to the component is formed by
>           // looking at this primitive AE's declared output types and
> eliminiating
>           // any that are not in mCurrentResultSpecification.
>           ResultSpecification analysisComponentResultSpec =
> computeAnalysisComponentResultSpec(
>                   mCurrentResultSpecification,
> getAnalysisEngineMetaData().getCapabilities());
>           // compile result spec - necessary to get type subsumption to
> work properly
>           analysisComponentResultSpec.compile(mLastTypeSystem);
>
> mAnalysisComponent.setResultSpecification(analysisComponentResultSpec);
>           mResultSpecChanged = false;
>         }
>
> any time when the ResultSpec changed, the ResultSpec is recomputed. But
> the ResultSpec is changed any time when setResultSpecification() is called.
> So what does this mean. The first code fragment in the email shows how
> to get the ResultSpec from the flow controller and set it on the AE.
> - So the result spec changed - The second code fragment shows what is
> executed if the ResultSpec has been changed and how it is recomputed.
> This means that the ResultSpec is recomputed each time process is
> called. I don't think this is necessary.
>

That seems like a good analysis of the situation.  I think what we
need is to detect when the ResultSpecification has actually changed
and when it hasn't.  That might be tricky to do right.  If we just
check if the new ResultSpecification is == to the existing
ResultSpecification, that wouldn't work if the ResultSpecification had
been modified (it would be == but the contents wouldn't be the same).
Perhaps we could add a dirty flag to the ResultSpecification to catch
this.
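
One way to picture the dirty-flag idea is sketched below. It is only an 
illustration, not the UIMA implementation: TrackedSpec and SpecConsumer 
are hypothetical classes, and the sketch uses a modification counter 
instead of a single boolean flag, so that several consumers could each 
remember the last version they saw.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for a result spec; not the UIMA ResultSpecification API. */
final class TrackedSpec {
  private final Set<String> types = new HashSet<>();
  private long version = 0; // bumped on every effective modification

  void addType(String typeName) {
    if (types.add(typeName)) {
      version++; // only count changes that really alter the content
    }
  }

  long version() {
    return version;
  }
}

/** Hypothetical consumer that recomputes only when the spec object or its content changed. */
final class SpecConsumer {
  private TrackedSpec lastSpec;
  private long lastVersion = -1;
  int recomputations = 0; // stands in for the expensive compile/intersect step

  void setSpec(TrackedSpec spec) {
    if (spec == null) {
      throw new IllegalArgumentException("spec must not be null");
    }
    if (spec == lastSpec && spec.version() == lastVersion) {
      return; // same object, same content: nothing to do
    }
    lastSpec = spec;
    lastVersion = spec.version();
    recomputations++;
  }
}
```

The == test alone would miss a spec that was modified in place; pairing it 
with the version number covers that case.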

> Beyond that it seems to me that the ResultsSpec
>        mCurrentResultSpecification
> and the computed ResultSpec
>        analysisComponentResultSpec
> have the same content.
>

Not in all cases.  The computeAnalysisComponentResultSpec() method
does an intersection of the ResultSpec with the component's output
capabilities.  I suppose with CapabilityLanguageFlow, it would never
output any type that's not in the component's output capabilities.
However think of the case of a nested aggregate where
CapabilityLanguageFlow is used in the outermost aggregate.  This would
cause setResultSpecification to be called on the sub-aggregate.  That
in turn causes the ResultSpecification for each annotator to be
computed by the intersection of the sub-aggregate's
ResultSpecification with that annotator's output capabilities.
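
As a rough illustration of that intersection step (a simplified sketch, 
not the framework code): the hypothetical SpecIntersection class below 
treats a result spec and an annotator's output capabilities as plain sets 
of names and ignores languages and type subsumption, both of which the 
real computeAnalysisComponentResultSpec() has to deal with.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper; models result specs and capabilities as sets of names. */
final class SpecIntersection {
  /** Keeps only those requested names that the annotator declares as outputs. */
  static Set<String> intersect(Set<String> requested, Set<String> declaredOutputs) {
    Set<String> result = new TreeSet<>(requested);
    result.retainAll(declaredOutputs);
    return result;
  }
}
```

In the nested-aggregate case, the sub-aggregate would call something like 
this once per contained annotator, passing its own result spec as the 
requested set.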

-Adam

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
LeHouillier, Frank D. wrote:
> While making this change wouldn't affect us in any way as I can see now,
> it would still be possible to use the Features in the Result Spec in a
> similar way.  
>
> Suppose you have an information extraction component that extracts
> entities with attributes and you want to control which attributes are
> actually being added to the CAS with the Result Spec.  You might have
> type Person, with a range of features such as Address, Phone number,
> Age, etc. some of which you want to output in a given configuration and
> others not.  Suppose the information extraction component also extracts
> attributes which are so useless that you don't include them as features
> in the type system at all such as an internal id number.  Currently,
> with a compiled Result Spec you could have the annotator look up the
> feature on the basis of the name of the feature and then you could
> reliably instantiate the feature without further ado.  After your
> change, the feature would have to be checked to see if it actually
> exists.  
>   
We added code in the actual change that now checks to see if the feature 
actually exists (for a "compiled" Result Spec).  I thought it was better 
to preserve the status quo here, rather than remove this check (for 
performance reasons).  It didn't seem like it would have any measurable 
performance impact - it's one hash table lookup, basically.

Cheers. -Marshall
> Again, this doesn't seem like it is that big a deal to me but I thought
> I might just point out that it might have a use case.  In practice, it
> seems to me that most annotators figure out the features available
> either during compilation by using the JCas or during the initialization
> of the Annotator.  
>
> -----Original Message-----
> From: Marshall Schor [mailto:msa@schor.com] 
> Sent: Friday, January 25, 2008 3:57 PM
> To: uima-dev@incubator.apache.org
> Subject: Re: capabilityLangugaeFlow - computeResultSpec
>
> LeHouillier, Frank D. wrote:
>   
>> We have an annotator that wraps a black box information extraction
>> component that can return objects of a variety of types.  We check the
>> result specification to see if the object is something we want to output
>> based on the actual string of the name of the type.  If you take away the
>> compiled version of the ResultSpecification then we will have to also
>> check whether the type that we get back from the type system is null or
>> not.  
> Hi Frank -
>
> This change would *not* take away the compiled version of the Result 
> Spec.  It would only change 1 behavior - that of returning "true" if a 
> *feature* (not a type, as in your example above) was associated with a 
> type where the capability was marked "allAnnotatorFeatures", even if the 
> Feature didn't exist.
>
> Suppose you had a type T1, and a type T2 whose super-type was T1, and 
> features T1:f1 T2:f2, with an output capability = T1 with 
> allAnnotatorFeatures = true, and finally T3 (not inheriting from T1) and 
> feature T3:f3,  and the output capability including T3 with 
> allAnnotatorFeatures = false
>
>
> Here's the current behavior:
>
> Before compile:  The following would all return true except as marked:
>    containsType(T1)
>    containsType(T2)  << returns false, T2 not in output capability, and 
> before compile, T2 isn't recognized as a subtype of T1
>    containsFeature(T2:f2)  << returns false, not in output, etc.
>    containsFeature(T1:f1)
>    containsFeature(T1:asdfasdfasdfasdf) <<< yes... that's what it does - 
> it ignores the actual feature name because allAnnotatorFeatures is true
>
> After compile the following return true except as marked:
>    containsType(T1)
>    containsType(T2)  << T2 not in output capability, but is recognized 
> as a subtype of T1
>    containsFeature(T2:f2)  << T1's *allAnnotatorFeatures* is "inherited"
>    containsFeature(T1:f1)
>    containsFeature(T1:asdfasdfasdfasdf) << false: the actual features 
> are looked up
>   
> After the change I'm proposing, everything would be the same except that
>    containsFeature(T1:asdfasdfasdfasdf) would return true.
>
> I don't think this would affect the way you are using result specs, but 
> please let me know if I've misunderstood something.  We don't want to 
> impact users with this change.
>
> Thanks for your comments :-)
>
> -Marshall
>   
>> -----Original Message-----
>> From: Marshall Schor [mailto:msa@schor.com] 
>> Sent: Friday, January 25, 2008 5:06 AM
>> To: uima-dev@incubator.apache.org
>> Subject: Re: capabilityLangugaeFlow - computeResultSpec
>>
>> The implementation for checking if a feature is in the result spec does 
>> the following:
>>
>> If the result-spec is not "compiled", it says the feature is present if 
>> it is specifically put in, or if its type has the allAnnotatorFeatures 
>> flag set.
>>
>> If the result-spec is "compiled", it says the feature is present if it 
>> is specifically put in, or if its type has the allAnnotatorFeatures 
>> flag set and the feature exists in the type system.
>>
>> For performance / space reasons, I'd like to drop the 2nd case; this 
>> would have the consequence of changing the result spec to return true 
>> for features not in the type system where the type had the 
>> allAnnotatorFeatures flag set.  This case shouldn't come up in practice 
>> because I can't think of a good reason an annotator would ask if a 
>> feature not in its type system was present. 
>>
>> Any objections?
>>
>> -Marshall


RE: capabilityLangugaeFlow - computeResultSpec

Posted by "LeHouillier, Frank D." <Fr...@gd-ais.com>.
While making this change wouldn't affect us in any way as far as I can
see now, it would still be possible to use the Features in the Result
Spec in a similar way.  

Suppose you have an information extraction component that extracts
entities with attributes and you want to control which attributes are
actually being added to the CAS with the Result Spec.  You might have
type Person, with a range of features such as Address, Phone number,
Age, etc. some of which you want to output in a given configuration and
others not.  Suppose the information extraction component also extracts
attributes which are so useless that you don't include them as features
in the type system at all such as an internal id number.  Currently,
with a compiled Result Spec you could have the annotator look up the
feature on the basis of the name of the feature and then you could
reliably instantiate the feature without further ado.  After your
change, the feature would have to be checked to see if it actually
exists.  

Again, this doesn't seem like it is that big a deal to me but I thought
I might just point out that it might have a use case.  In practice, it
seems to me that most annotators figure out the features available
either during compilation by using the JCas or during the initialization
of the Annotator.  

-----Original Message-----
From: Marshall Schor [mailto:msa@schor.com] 
Sent: Friday, January 25, 2008 3:57 PM
To: uima-dev@incubator.apache.org
Subject: Re: capabilityLangugaeFlow - computeResultSpec

LeHouillier, Frank D. wrote:
> We have an annotator that wraps a black box information extraction
> component that can return objects of a variety of types.  We check the
> result specification to see if the object is something we want to output
> based on the actual string of the name of the type.  If you take away the
> compiled version of the ResultSpecification then we will have to also
> check whether the type that we get back from the type system is null or
> not.  
Hi Frank -

This change would *not* take away the compiled version of the Result 
Spec.  It would only change 1 behavior - that of returning "true" if a 
*feature* (not a type, as in your example above) was associated with a 
type where the capability was marked "allAnnotatorFeatures", even if the 
Feature didn't exist.

Suppose you had a type T1, and a type T2 whose super-type was T1, and 
features T1:f1 T2:f2, with an output capability = T1 with 
allAnnotatorFeatures = true, and finally T3 (not inheriting from T1) and 
feature T3:f3,  and the output capability including T3 with 
allAnnotatorFeatures = false


Here's the current behavior:

Before compile:  The following would all return true except as marked:
   containsType(T1)
   containsType(T2)  << returns false, T2 not in output capability, and 
before compile, T2 isn't recognized as a subtype of T1
   containsFeature(T2:f2)  << returns false, not in output, etc.
   containsFeature(T1:f1)
   containsFeature(T1:asdfasdfasdfasdf) <<< yes... that's what it does - 
it ignores the actual feature name because allAnnotatorFeatures is true

After compile the following return true except as marked:
   containsType(T1)
   containsType(T2)  << T2 not in output capability, but is recognized 
as a subtype of T1
   containsFeature(T2:f2)  << T1's *allAnnotatorFeatures* is "inherited"
   containsFeature(T1:f1)
   containsFeature(T1:asdfasdfasdfasdf) << false: the actual features 
are looked up
  
After the change I'm proposing, everything would be the same except that
   containsFeature(T1:asdfasdfasdfasdf) would return true.

I don't think this would affect the way you are using result specs, but 
please let me know if I've misunderstood something.  We don't want to 
impact users with this change.

Thanks for your comments :-)

-Marshall
>
> -----Original Message-----
> From: Marshall Schor [mailto:msa@schor.com] 
> Sent: Friday, January 25, 2008 5:06 AM
> To: uima-dev@incubator.apache.org
> Subject: Re: capabilityLangugaeFlow - computeResultSpec
>
> The implementation for checking if a feature is in the result spec does 
> the following:
>
> If the result-spec is not "compiled", it says the feature is present if 
> it is specifically put in, or if its type has the allAnnotatorFeatures 
> flag set.
>
> If the result-spec is "compiled", it says the feature is present if it 
> is specifically put in, or if its type has the allAnnotatorFeatures 
> flag set and the feature exists in the type system.
>
> For performance / space reasons, I'd like to drop the 2nd case; this 
> would have the consequence of changing the result spec to return true 
> for features not in the type system where the type had the 
> allAnnotatorFeatures flag set.  This case shouldn't come up in practice 
> because I can't think of a good reason an annotator would ask if a 
> feature not in its type system was present. 
>
> Any objections?
>
> -Marshall
>
>
>   


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
LeHouillier, Frank D. wrote:
> We have an annotator that wraps a black box information extraction
> component that can return objects of a variety of types.  We check the
> result specification to see if the object is something we want to output
> based the actual string of the name of the type.  If you take away the
> compiled version of the ResultSpecification then we will have to also
> check whether the type that we get back from the type system is null or
> not.  
Hi Frank -

This change would *not* take away the compiled version of the Result 
Spec.  It would only change 1 behavior - that of returning "true" if a 
*feature* (not a type, as in your example above) was associated with a 
type where the capability was marked "allAnnotatorFeatures", even if the 
Feature didn't exist.

Suppose you had a type T1, and a type T2 whose super-type was T1, and 
features T1:f1 T2:f2, with an output capability = T1 with 
allAnnotatorFeatures = true, and finally T3 (not inheriting from T1) and 
feature T3:f3,  and the output capability including T3 with 
allAnnotatorFeatures = false


Here's the current behavior:

Before compile:  The following would all return true except as marked:
   containsType(T1)
   containsType(T2)  << returns false, T2 not in output capability, and 
before compile, T2 isn't recognized as a subtype of T1
   containsFeature(T2:f2)  << returns false, not in output, etc.
   containsFeature(T1:f1)
   containsFeature(T1:asdfasdfasdfasdf) <<< yes... that's what it does - 
it ignores the actual feature name because allAnnotatorFeatures is true

After compile the following return true except as marked:
   containsType(T1)
   containsType(T2)  << T2 not in output capability, but is recognized 
as a subtype of T1
   containsFeature(T2:f2)  << T1's *allAnnotatorFeatures* is "inherited"
   containsFeature(T1:f1)
   containsFeature(T1:asdfasdfasdfasdf) << false: the actual features 
are looked up
  
After the change I'm proposing, everything would be the same except that
   containsFeature(T1:asdfasdfasdfasdf) would return true.

I don't think this would affect the way you are using result specs, but 
please let me know if I've misunderstood something.  We don't want to 
impact users with this change.

Thanks for your comments :-)

-Marshall
>
> -----Original Message-----
> From: Marshall Schor [mailto:msa@schor.com] 
> Sent: Friday, January 25, 2008 5:06 AM
> To: uima-dev@incubator.apache.org
> Subject: Re: capabilityLangugaeFlow - computeResultSpec
>
> The implementation for checking if a feature is in the result spec does 
> the following:
>
> If the result-spec is not "compiled", it says the feature is present if 
> it is specifically put in, or if its type has the allAnnotatorFeatures 
> flag set.
>
> If the result-spec is "compiled", it says the feature is present if it 
> is specifically put in, or if its type has the allAnnotatorFeatures 
> flag set and the feature exists in the type system.
>
> For performance / space reasons, I'd like to drop the 2nd case; this 
> would have the consequence of changing the result spec to return true 
> for features not in the type system where the type had the 
> allAnnotatorFeatures flag set.  This case shouldn't come up in practice 
> because I can't think of a good reason an annotator would ask if a 
> feature not in its type system was present. 
>
> Any objections?
>
> -Marshall
>
>
>   


RE: capabilityLangugaeFlow - computeResultSpec

Posted by "LeHouillier, Frank D." <Fr...@gd-ais.com>.
We have an annotator that wraps a black box information extraction
component that can return objects of a variety of types.  We check the
result specification to see if the object is something we want to output
based on the actual string of the name of the type.  If you take away the
compiled version of the ResultSpecification then we will have to also
check whether the type that we get back from the type system is null or
not.  It isn't terribly onerous to have to check for null but it does
actually take some code modification and this situation might be present
in other people's analysis engines too.

-----Original Message-----
From: Marshall Schor [mailto:msa@schor.com] 
Sent: Friday, January 25, 2008 5:06 AM
To: uima-dev@incubator.apache.org
Subject: Re: capabilityLangugaeFlow - computeResultSpec

The implementation for checking if a feature is in the result spec does 
the following:

If the result-spec is not "compiled", it says the feature is present if 
it is specifically put in, or if its type has the allAnnotatorFeatures 
flag set.

If the result-spec is "compiled", it says the feature is present if it 
is specifically put in, or if its type has the allAnnotatorFeatures 
flag set and the feature exists in the type system.

For performance / space reasons, I'd like to drop the 2nd case; this 
would have the consequence of changing the result spec to return true 
for features not in the type system where the type had the 
allAnnotatorFeatures flag set.  This case shouldn't come up in practice 
because I can't think of a good reason an annotator would ask if a 
feature not in its type system was present. 

Any objections?

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
The implementation for checking if a feature is in the result spec does 
the following:

If the result-spec is not "compiled", it says the feature is present if 
it is specifically put in, or if its type has the allAnnotatorFeatures 
flag set.

If the result-spec is "compiled", it says the feature is present if it 
is specifically put in, or if its type has the allAnnotatorFeatures flag 
set and the feature exists in the type system.

For performance / space reasons, I'd like to drop the 2nd case; this 
would have the consequence of changing the result spec to return true 
for features not in the type system where the type had the 
allAnnotatorFeatures flag set.  This case shouldn't come up in practice 
because I can't think of a good reason an annotator would ask if a feature 
not in its type system was present. 
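
The two cases can be modelled with a small sketch. This is an illustration 
only: FeatureCheckSketch and its fields are hypothetical and are not the 
UIMA ResultSpecification class. Passing verifyExists = true corresponds to 
the "compiled" behavior described above, and verifyExists = false to the 
uncompiled behavior, which is also what the proposed change would give for 
compiled specs.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of the feature check; not the UIMA ResultSpecification class. */
final class FeatureCheckSketch {
  final Set<String> explicitFeatures = new HashSet<>();   // features added by name, e.g. "T3:f3"
  final Set<String> allFeaturesTypes = new HashSet<>();   // types flagged allAnnotatorFeatures
  final Set<String> typeSystemFeatures = new HashSet<>(); // features that exist in the type system

  boolean containsFeature(String fullFeatureName, boolean verifyExists) {
    if (explicitFeatures.contains(fullFeatureName)) {
      return true; // specifically put in
    }
    int sep = fullFeatureName.indexOf(':');
    if (sep < 0) {
      return false; // not a "Type:feature" name
    }
    String typeName = fullFeatureName.substring(0, sep);
    if (!allFeaturesTypes.contains(typeName)) {
      return false;
    }
    // With verifyExists the feature must really be in the type system;
    // without it, any feature name on an all-features type is accepted.
    return !verifyExists || typeSystemFeatures.contains(fullFeatureName);
  }
}
```

The extra cost of the second case is the one additional set lookup on the 
last line.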

Any objections?

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
Some corner cases.

Case 1:  If using the method to alter an existing result spec by adding 
a single type with an associated set of languages,  the passed in 
"allAnnotatorFeatures" boolean will now be "unioned" with any existing 
setting of this.  Javadocs updated to reflect this.

Case 2: If you have a capability for language 1 which says output type A 
(not all features), and have another capability for language 2 which 
says output type A (allAnnotatorFeatures), this will be represented in 
the result spec by having language 1 also be for all features.

Case 3: when setting the result spec, passing null in as the value of 
the languages (for those set/add things that take language arrays) will 
be equivalent to passing in the one language x-unspecified.  So, in 
particular, if a spec says produce type A for lang 1 and 2, and then you 
use the addResultType(for type A, null-passed-in-for-language-spec) this 
will add the language x-unspecified for type A. 

I will attempt to document these in the Javadocs.  Please post a 
response if these corner cases need to be handled differently.
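
The three corner cases above can be sketched as follows. This is a 
simplified model under stated assumptions, not the framework code: 
CornerCaseSpec and its Entry class are hypothetical, and the sketch keeps 
one allAnnotatorFeatures flag per type (shared by all of its languages), 
which is how Case 2 is described.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the corner cases; not the UIMA ResultSpecification class. */
final class CornerCaseSpec {
  static final String UNSPECIFIED = "x-unspecified";

  static final class Entry {
    boolean allAnnotatorFeatures;
    final Set<String> languages = new TreeSet<>();
  }

  private final Map<String, Entry> byType = new HashMap<>();

  void addResultType(String typeName, boolean allAnnotatorFeatures, String[] languages) {
    Entry entry = byType.computeIfAbsent(typeName, k -> new Entry());
    // Cases 1 and 2: the flag is "unioned" with any existing setting.
    entry.allAnnotatorFeatures |= allAnnotatorFeatures;
    if (languages == null) {
      // Case 3: null is treated like the single language x-unspecified.
      entry.languages.add(UNSPECIFIED);
    } else {
      for (String language : languages) {
        entry.languages.add(language);
      }
    }
  }

  Entry get(String typeName) {
    return byType.get(typeName);
  }
}
```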

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Eddie Epstein <ea...@gmail.com>.
It is very possible that result specifications don't work correctly
through the JNI. Nobody has used them in C++ since I've been
working with it.

Eddie

On Wed, Jan 23, 2008 at 4:02 PM, Marshall Schor <ms...@schor.com> wrote:
> Eddie - this is for you to check I think:
>
>  There is code in UimacppEngine in method serializeResultSpecification
>  which adds result spec types and features to 2 IntVector arrays (one for
>  Types, one for Features).  As currently designed, these "miss" getting
>  the subtypes of types, and all the features for types marked with the
>  all-features flag in the capabilities.
>
>  Are these required here?
>
>  Also, I notice that the result spec supports "languages" - but the
>  serialization for this doesn't support languages.  Is that intended?
>
>  -Marshall
>

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
Eddie - this is for you to check I think:

There is code in UimacppEngine in method serializeResultSpecification 
which adds result spec types and features to 2 IntVector arrays (one for 
Types, one for Features).  As currently designed, these "miss" getting 
the subtypes of types, and all the features for types marked with the 
all-features flag in the capabilities. 

Are these required here? 

Also, I notice that the result spec supports "languages" - but the 
serialization for this doesn't support languages.  Is that intended?

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
Marshall Schor wrote:
> I'm thinking of simplifying the CapabilityContainer class.  Right now 
> it has code to process input as well as output capabilities, but the 
> input ones appear never to be used.  Can anyone confirm that?  If 
> confirmed, I would propose to remove the part related to input 
> capabilities.
Currently I think that is true. The idea behind this CapabilityContainer 
was that someone might create a sophisticated flow that computes the 
best sequence for the engines based on their input and output 
capabilities... But if that is needed, we can add the input capabilities 
again. :-)
>
> There is a HashMap, outputToFCapability, whose keys are Strings 
> corresponding to an output type-or-feature name, for any language, for 
> any capability-set.  The values do not seem to be used.  I'd like to 
> replace this with a HashSet.  Any objections?
Yes, that seems to be correct.

-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
I'm thinking of simplifying the CapabilityContainer class.  Right now it 
has code to process input as well as output capabilities, but the input 
ones appear never to be used.  Can anyone confirm that?  If confirmed, I 
would propose to remove the part related to input capabilities.

There is a HashMap, outputToFCapability, whose keys are Strings 
corresponding to an output type-or-feature name, for any language, for 
any capability-set.  The values do not seem to be used.  I'd like to 
replace this with a HashSet.  Any objections?
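
A minimal sketch of the proposed replacement, assuming the map is only ever used for key membership tests. The class and method names here are hypothetical and do not come from CapabilityContainer.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical stand-in for the output-name lookup described above. */
class OutputNameIndex {
    // Previously a HashMap whose values were never read; a set of the
    // keys carries the same information.
    private final Set<String> outputNames = new HashSet<>();

    /** Records an output type-or-feature name. */
    void addOutput(String typeOrFeatureName) {
        outputNames.add(typeOrFeatureName);
    }

    /** True if the name was declared as an output. */
    boolean hasOutput(String typeOrFeatureName) {
        return outputNames.contains(typeOrFeatureName);
    }
}
```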

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
The code which checks if a type or feature is in a result spec, for a 
particular language, always includes generalizing the language specifier 
by dropping the part beyond the first "-".  For example, "en-us" and 
"en-uk" are simplified to "en".  Because of this, I'm thinking of 
shrinking the result specification (for performance / space reasons) by 
"normalizing" any language specs it uses by dropping the country 
extensions, if present.
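
To make the proposal concrete, here is a minimal sketch of such a normalization. The class and method names are hypothetical, and leaving "x-unspecified" untouched is my assumption (a plain cut at the first "-" would otherwise reduce it to "x").

```java
/** Hypothetical helper illustrating the proposed normalization. */
class LanguageNormalizer {
    static final String UNSPECIFIED = "x-unspecified";

    /** Drops everything from the first '-' on, e.g. "en-us" -> "en". */
    static String baseLanguage(String language) {
        // Assumption: the special "x-unspecified" value is kept whole.
        if (UNSPECIFIED.equals(language)) {
            return language;
        }
        int dash = language.indexOf('-');
        return (dash < 0) ? language : language.substring(0, dash);
    }
}
```

Storing only the normalized form means "en-us" and "en-uk" collapse into a single "en" entry per type or feature, which is where the space saving would come from.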

Any objections?

-Marshall

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
Adam Lally wrote:
> On Jan 24, 2008 9:51 AM, Michael Baessler <mb...@michael-baessler.de> wrote:
>   
>>> Without looking at the code, I didn't understand why this is a
>>> consequence of the behavior you described above.  I thought you said
>>> "and if the type has subtypes, it adds those too"?  Anyway, I
>>> definitely think that this should work.  By the definition of subtype,
>>> A-subtype *IS A* A.  So if an aggregate wants type A produced, then
>>> A-subtype should be produced.
>>>       
>> Why should an ae or a flow produce A-subtype when only A is required?
>>
>>     
>
> Because an instance of A-subtype is also by definition an instance of
> A.  Say a downstream annotator wants input type Person.  I have
> upstream annotators that can produce instances of GovernmentOfficial,
> Actor, and Author, all of which are subtypes of Person.  Shouldn't the
> upstream annotator produce these types?
 From my point of view, when using the capabilityLanguageFlow, the 
application must specify all three or four person subtypes if they 
should occur in the result. I think this is flow specific; another 
flow can do it differently.

I absolutely agree that the result spec that is responsible for "what 
can be produced" should contain all types automatically if
the Person type is added.

-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Adam Lally <al...@alum.rpi.edu>.
On Jan 24, 2008 9:51 AM, Michael Baessler <mb...@michael-baessler.de> wrote:
> > Without looking at the code, I didn't understand why this is a
> > consequence of the behavior you described above.  I thought you said
> > "and if the type has subtypes, it adds those too"?  Anyway, I
> > definitely think that this should work.  By the definition of subtype,
> > A-subtype *IS A* A.  So if an aggregate wants type A produced, then
> > A-subtype should be produced.
> Why should an ae or a flow produce A-subtype when only A is required?
>

Because an instance of A-subtype is also by definition an instance of
A.  Say a downstream annotator wants input type Person.  I have
upstream annotators that can produce instances of GovernmentOfficial,
Actor, and Author, all of which are subtypes of Person.  Shouldn't the
upstream annotator produce these types?
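
The "is a" argument can be sketched with a toy type hierarchy. This is a hypothetical model, not the UIMA TypeSystem; the type names are the ones from the example above.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy single-inheritance type hierarchy (hypothetical). */
class ToyTypeSystem {
    // Maps each type name to its parent; a root type maps to null.
    private final Map<String, String> parentOf = new HashMap<>();

    void addType(String name, String parent) {
        parentOf.put(name, parent);
    }

    /** True if 'type' equals 'superType' or inherits from it. */
    boolean subsumes(String superType, String type) {
        for (String t = type; t != null; t = parentOf.get(t)) {
            if (t.equals(superType)) {
                return true;
            }
        }
        return false;
    }
}
```

Under this model, a request for Person is satisfied by an Actor instance, because walking up from Actor reaches Person, while a request for Actor is not satisfied by a plain Person.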

  -Adam

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
Adam Lally wrote:
> On Jan 24, 2008 7:54 AM, Marshall Schor <ms...@schor.com> wrote:
>   
>> If you recall, the compile method for results specifications augments
>> the set of types/features by doing 2 things:  if the type has
>> allAnnotatorFeatures=true, it adds all the features of the type; and if
>> the type has subtypes, it adds those too, propagating the
>> allAnnotatorFeatures processing down.
>>
>> A consequence would be that the mFlowTable would miss these cases:
>>
>>    An aggregate wants type A output, and has a delegate with output
>> capability A-subtype.
>>
>>     
>
> Without looking at the code, I didn't understand why this is a
> consequence of the behavior you described above.  I thought you said
> "and if the type has subtypes, it adds those too"?  Anyway, I
> definitely think that this should work.  By the definition of subtype,
> A-subtype *IS A* A.  So if an aggregate wants type A produced, then
> A-subtype should be produced.
Why should an ae or a flow produce A-subtype when only A is required?

-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
The thing that adds allAnnotatorFeatures and subtypes is "compiling" the 
result spec. The builder of the mFlowTable doesn't compile the 
resultspec before using it - so it doesn't have these consequences.

-Marshall

Adam Lally wrote:
> On Jan 24, 2008 7:54 AM, Marshall Schor <ms...@schor.com> wrote:
>   
>> If you recall, the compile method for results specifications augments
>> the set of types/features by doing 2 things:  if the type has
>> allAnnotatorFeatures=true, it adds all the features of the type; and if
>> the type has subtypes, it adds those too, propagating the
>> allAnnotatorFeatures processing down.
>>
>> A consequence would be that the mFlowTable would miss these cases:
>>
>>    An aggregate wants type A output, and has a delegate with output
>> capability A-subtype.
>>
>>     
>
> Without looking at the code, I didn't understand why this is a
> consequence of the behavior you described above.  I thought you said
> "and if the type has subtypes, it adds those too"?  Anyway, I
> definitely think that this should work.  By the definition of subtype,
> A-subtype *IS A* A.  So if an aggregate wants type A produced, then
> A-subtype should be produced.
>
>   
>>    An aggregate wants Feature F output, and has a delegate with output
>> capability type-A with allAnnotatorFeatures marked, having that feature.
>>
>>     
>
> We should be supporting this as well.  Again I didn't follow why the
> behavior you described above doesn't do this.
>
> -Adam
>
>
>   


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Adam Lally <al...@alum.rpi.edu>.
On Jan 24, 2008 7:54 AM, Marshall Schor <ms...@schor.com> wrote:
> If you recall, the compile method for results specifications augments
> the set of types/features by doing 2 things:  if the type has
> allAnnotatorFeatures=true, it adds all the features of the type; and if
> the type has subtypes, it adds those too, propagating the
> allAnnotatorFeatures processing down.
>
> A consequence would be that the mFlowTable would miss these cases:
>
>    An aggregate wants type A output, and has a delegate with output
> capability A-subtype.
>

Without looking at the code, I didn't understand why this is a
consequence of the behavior you described above.  I thought you said
"and if the type has subtypes, it adds those too"?  Anyway, I
definitely think that this should work.  By the definition of subtype,
A-subtype *IS A* A.  So if an aggregate wants type A produced, then
A-subtype should be produced.

>    An aggregate wants Feature F output, and has a delegate with output
> capability type-A with allAnnotatorFeatures marked, having that feature.
>

We should be supporting this as well.  Again I didn't follow why the
behavior you described above doesn't do this.

-Adam

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
 From this point of view..

+1 to deprecate allAnnotatorFeatures

-- Michael

Marshall Schor wrote:
> What about allAnnotatorFeatures?  Suppose the aggregate says it needs 
> a particular Feature of a particular type.  Suppose a delegate is 
> marked as producing that type, and has allAnnotatorFeatures marked.  
> This wouldn't work.
> You could say in this case that the output capability of the delegate 
> *must not* rely on allAnnotatorFeatures, but instead *must* explicitly 
> list those features it produces.  In one sense, this could be a good 
> idea, because no delegate could *accurately* mark that it outputs 
> allAnnotatorFeatures, anyway, due to the possibility that some other 
> component could add features to the type in question, completely 
> unknown to this delegate - and of course, this delegate would not be 
> setting those other features.
>
> This would lead to another question - should we deprecate 
> allAnnotatorFeatures because of this?
>
> -Marshall
>
> Michael Baessler wrote:
>> Marshall Schor wrote:
>>> Without actually testing this (so this may be a wrong conclusion) - 
>>> it seems to me that the code in CapabilityLanguageFlowController 
>>> that sets up the result specs for components, by language, in the 
>>> mFlowTable, ignores the typesOrFeatures that the result spec adds 
>>> when compile() is called.
>>>
>>> If you recall, the compile method for results specifications 
>>> augments the set of types/features by doing 2 things:  if the type 
>>> has allAnnotatorFeatures=true, it adds all the features of the type; 
>>> and if the type has subtypes, it adds those too, propagating the 
>>> allAnnotatorFeatures processing down.
>>>
>>> A consequence would be that the mFlowTable would miss these cases:
>>>
>>>   An aggregate wants type A output, and has a delegate with output 
>>> capability A-subtype.
>>>
>>>   An aggregate wants Feature F output, and has a delegate with 
>>> output capability type-A with allAnnotatorFeatures marked, having 
>>> that feature.
>>>
>>> Can anyone confirm this?  (perhaps adding a test case :-) )?
>>>
>>> Michael - do you know what the design intent was for this - if 
>>> things are as I've conjectured above, is this something that needs 
>>> to be fixed, or is it working as intended?
>> Yes that is correct. The mFlowTable only contains those output types 
>> that are specified in the aggregate AE as output types. The guideline 
>> for the capabilityLanguageFlow was to specify in the aggregate all 
>> output results (including all interim results) that must be produced.
>>
>> If we now change the mFlowTable content to match the resultSpec, we 
>> also change the capabilityLanguageFlow. So if we do that, how can I 
>> prevent a subtype from being produced if a supertype must be 
>> produced? So I prefer to stay with the current design - specify all 
>> you need.
>>
>> What do you think?
>>
>> -- Michael
>>
>>
>>
>


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
What about allAnnotatorFeatures?  Suppose the aggregate says it needs a 
particular Feature of a particular type.  Suppose a delegate is marked 
as producing that type, and has allAnnotatorFeatures marked.  This 
wouldn't work. 

You could say in this case that the output capability of the delegate 
*must not* rely on allAnnotatorFeatures, but instead *must* explicitly 
list those features it produces.  In one sense, this could be a good 
idea, because no delegate could *accurately* mark that it outputs 
allAnnotatorFeatures, anyway, due to the possibility that some other 
component could add features to the type in question, completely unknown 
to this delegate - and of course, this delegate would not be setting 
those other features.

This would lead to another question - should we deprecate 
allAnnotatorFeatures because of this?

-Marshall

Michael Baessler wrote:
> Marshall Schor wrote:
>> Without actually testing this (so this may be a wrong conclusion) - 
>> it seems to me that the code in CapabilityLanguageFlowController that 
>> sets up the result specs for components, by language, in the 
>> mFlowTable, ignores the typesOrFeatures that the result spec adds 
>> when compile() is called.
>>
>> If you recall, the compile method for results specifications augments 
>> the set of types/features by doing 2 things:  if the type has 
>> allAnnotatorFeatures=true, it adds all the features of the type; and 
>> if the type has subtypes, it adds those too, propagating the 
>> allAnnotatorFeatures processing down.
>>
>> A consequence would be that the mFlowTable would miss these cases:
>>
>>   An aggregate wants type A output, and has a delegate with output 
>> capability A-subtype.
>>
>>   An aggregate wants Feature F output, and has a delegate with output 
>> capability type-A with allAnnotatorFeatures marked, having that feature.
>>
>> Can anyone confirm this?  (perhaps adding a test case :-) )?
>>
>> Michael - do you know what the design intent was for this - if things 
>> are as I've conjectured above, is this something that needs to be 
>> fixed, or is it working as intended?
> Yes that is correct. The mFlowTable only contains those output types 
> that are specified in the aggregate AE as output types. The guideline 
> for the capabilityLanguageFlow was to specify in the aggregate all 
> output results (including all interim results) that must be produced.
>
> If we now change the mFlowTable content to match the resultSpec, we 
> also change the capabilityLanguageFlow. So if we do that, how can I 
> prevent a subtype from being produced if a supertype must be 
> produced? So I prefer to stay with the current design - specify all 
> you need.
>
> What do you think?
>
> -- Michael
>
>
>


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
Marshall Schor wrote:
> Without actually testing this (so this may be a wrong conclusion) - it 
> seems to me that the code in CapabilityLanguageFlowController that 
> sets up the result specs for components, by language, in the 
> mFlowTable, ignores the typesOrFeatures that the result spec adds when 
> compile() is called.
>
> If you recall, the compile method for results specifications augments 
> the set of types/features by doing 2 things:  if the type has 
> allAnnotatorFeatures=true, it adds all the features of the type; and 
> if the type has subtypes, it adds those too, propagating the 
> allAnnotatorFeatures processing down.
>
> A consequence would be that the mFlowTable would miss these cases:
>
>   An aggregate wants type A output, and has a delegate with output 
> capability A-subtype.
>
>   An aggregate wants Feature F output, and has a delegate with output 
> capability type-A with allAnnotatorFeatures marked, having that feature.
>
> Can anyone confirm this?  (perhaps adding a test case :-) )?
>
> Michael - do you know what the design intent was for this - if things 
> are as I've conjectured above, is this something that needs to be 
> fixed, or is it working as intended?
Yes that is correct. The mFlowTable only contains those output types 
that are specified in the aggregate AE as output types. The guideline for 
the capabilityLanguageFlow was to specify in the aggregate all output 
results (including all interim results) that must be produced.

If we now change the mFlowTable content to match the resultSpec, we also 
change the capabilityLanguageFlow. So if we do that, how can I prevent 
a subtype from being produced if a supertype must be produced? So I 
prefer to stay with the current design - specify all you need.

What do you think?

-- Michael


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
Without actually testing this (so this may be a wrong conclusion) - it 
seems to me that the code in CapabilityLanguageFlowController that sets 
up the result specs for components, by language, in the mFlowTable, 
ignores the typesOrFeatures that the result spec adds when compile() is 
called.

If you recall, the compile method for results specifications augments 
the set of types/features by doing 2 things:  if the type has 
allAnnotatorFeatures=true, it adds all the features of the type; and if 
the type has subtypes, it adds those too, propagating the 
allAnnotatorFeatures processing down.

A consequence would be that the mFlowTable would miss these cases:

   An aggregate wants type A output, and has a delegate with output 
capability A-subtype.

   An aggregate wants Feature F output, and has a delegate with output 
capability type-A with allAnnotatorFeatures marked, having that feature.
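
A minimal sketch of the two conjectured misses, assuming the flow table matches requested and declared output names literally. ToyFlowTable is a hypothetical model, not the CapabilityLanguageFlowController code, and the "Type:feature" naming is only used for illustration.

```java
import java.util.Collections;
import java.util.Set;

/** Toy model of an exact-name flow table lookup (hypothetical). */
class ToyFlowTable {
    /**
     * A delegate is scheduled only if one of its declared output names
     * literally equals a name the aggregate asked for. No subtype or
     * allAnnotatorFeatures expansion is applied.
     */
    static boolean isScheduled(Set<String> aggregateOutputs,
                               Set<String> delegateOutputs) {
        return !Collections.disjoint(aggregateOutputs, delegateOutputs);
    }
}
```

Under that assumption, an aggregate asking for "A" does not schedule a delegate that declares only "ASubtype", and an aggregate asking for feature "A:f" does not schedule a delegate that declares type "A" with allAnnotatorFeatures, since neither name appears literally in the other set.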

Can anyone confirm this?  (perhaps adding a test case :-) )?

Michael - do you know what the design intent was for this - if things 
are as I've conjectured above, is this something that needs to be fixed, 
or is it working as intended?

-Marshall

 

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Adam Lally <al...@alum.rpi.edu>.
On Jan 23, 2008 10:07 AM, Marshall Schor <ms...@schor.com> wrote:
> Given that (as far as I can tell - let's see, that would be AFAICT), the
> resultSpec is *always* used in compiled mode (because the wrapper always
> compiles it), the current implementation would have the effect that
>
>   1) the allFeatures flag would work
>   2) subtypes of a type specified in the resultSpec would also be
> implicitly in the resultSpec
>
> Therefore, to keep the implementation behavior constant (a good thing to
> try for, always :-) ) we should ensure any changes continue to exhibit
> this behavior, and update the Javadocs and documentation to reflect this.
>

+1.  It was certainly my intention to always compile the ResultSpec.
I believe that subtypes should be included.  To not do that I think is
contrary to the expected semantics of the supertype/subtype
relationship.  Plus I think it's been that way in UIMA for a long time
now.

 -Adam

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
Fine with me. Seems to be the way it has worked in the past, so we 
should not change it!

-- Michael

Marshall Schor wrote:
> Given that (as far as I can tell - let's see, that would be AFAICT), 
> the resultSpec is *always* used in compiled mode (because the wrapper 
> always compiles it), the current implementation would have the effect 
> that
>
>  1) the allFeatures flag would work
>  2) subtypes of a type specified in the resultSpec would also be 
> implicitly in the resultSpec
>
> Therefore, to keep the implementation behavior constant (a good thing 
> to try for, always :-) ) we should ensure any changes continue to 
> exhibit this behavior, and update the Javadocs and documentation to 
> reflect this.
>
> Other opinions?
>
> -Marshall
>
> Michael Baessler wrote:
>>
>> Marshall Schor wrote:
>>> In looking through the code for ResultSpecification_impl, there 
>>> seems to be an inconsistency - unless I (quite possible :-) ) 
>>> missed something.
>>>
>>> The calls to the containsType(...) method operate in one of 2 ways, 
>>> depending on whether or not the result specification has been 
>>> "compiled" by calling the compile method.
>>>
>>> If the result spec has not been compiled, then containsType(...) 
>>> returns true iff the type specified is "equal(...)" to a type in the 
>>> Result Specification.
>>>
>>> If it has been compiled, then the containsType returns true iff the 
>>> type specified is equal to a type *or any of its subtypes* in the 
>>> Result Specification.  This is because compiling a 
>>> resultSpecification adds the subtypes.
>>>
>>> Can others confirm this?  In actual use within annotators, it may be 
>>> that the result spec is always compiled before use (I haven't yet 
>>> traced that down).
>> Yes, you are right, when the result spec is compiled all subtypes of 
>> a type are additionally added to the map. The same for features, if 
>> the allAnnotatorFeatures is set to true.
>>>
>>> Should the code and Javadocs be updated to have containsType return 
>>> true for subtypes of types in the result spec, always?
>> I think both ways should return the same result. But which way is 
>> correct? If I specify a type in the result spec is it correct that 
>> all subtypes are also in?
>> If I just want to have the sub types in the result spec it is easy to 
>> do, but what if I only want to have the super types in the result 
>> spec without the subtypes?
>>
>> -- Michael
>>
>>
>>
>


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
Given that (as far as I can tell - let's see, that would be AFAICT), the 
resultSpec is *always* used in compiled mode (because the wrapper always 
compiles it), the current implementation would have the effect that

  1) the allFeatures flag would work
  2) subtypes of a type specified in the resultSpec would also be 
implicitly in the resultSpec
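
The two effects can be sketched roughly as follows. This is a toy expansion, not the actual compile() code; the helper name, the maps passed in, and the "Type:feature" naming of the results are assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy version of the expansion that compiling performs (hypothetical). */
class ToyCompile {
    /**
     * Expands one requested type: adds its subtypes (transitively) and,
     * if allAnnotatorFeatures is set, every feature of each type reached.
     */
    static Set<String> expand(String type, boolean allAnnotatorFeatures,
                              Map<String, List<String>> subtypesOf,
                              Map<String, List<String>> featuresOf) {
        Set<String> result = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(type);
        while (!pending.isEmpty()) {
            String t = pending.pop();
            if (!result.add(t)) {
                continue; // already expanded
            }
            if (allAnnotatorFeatures) {
                for (String f : featuresOf.getOrDefault(t, Collections.emptyList())) {
                    result.add(t + ":" + f);
                }
            }
            for (String sub : subtypesOf.getOrDefault(t, Collections.emptyList())) {
                pending.push(sub);
            }
        }
        return result;
    }
}
```

For a type A with feature f and a subtype B with features f and g, expanding A with the flag set yields A, A:f, B, B:f and B:g; without the flag it yields only A and B.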

Therefore, to keep the implementation behavior constant (a good thing to 
try for, always :-) ) we should ensure any changes continue to exhibit 
this behavior, and update the Javadocs and documentation to reflect this.

Other opinions?

-Marshall

Michael Baessler wrote:
>
> Marshall Schor wrote:
>> In looking through the code for ResultSpecification_impl, there 
>> seems to be an inconsistency - unless I (quite possible :-) ) missed 
>> something.
>>
>> The calls to the containsType(...) method operate in one of 2 ways, 
>> depending on whether or not the result specification has been 
>> "compiled" by calling the compile method.
>>
>> If the result spec has not been compiled, then containsType(...) 
>> returns true iff the type specified is "equal(...)" to a type in the 
>> Result Specification.
>>
>> If it has been compiled, then the containsType returns true iff the 
>> type specified is equal to a type *or any of its subtypes* in the 
>> Result Specification.  This is because compiling a 
>> resultSpecification adds the subtypes.
>>
>> Can others confirm this?  In actual use within annotators, it may be 
>> that the result spec is always compiled before use (I haven't yet 
>> traced that down).
> Yes, you are right, when the result spec is compiled all subtypes of a 
> type are additionally added to the map. The same for features, if the 
> allAnnotatorFeatures is set to true.
>>
>> Should the code and Javadocs be updated to have containsType return 
>> true for subtypes of types in the result spec, always?
> I think both ways should return the same result. But which way is 
> correct? If I specify a type in the result spec is it correct that all 
> subtypes are also in?
> If I just want to have the sub types in the result spec it is easy to 
> do, but what if I only want to have the super types in the result spec 
> without the subtypes?
>
> -- Michael
>
>
>


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.
Marshall Schor wrote:
> In looking through the code for ResultSpecification_impl, there 
> seems to be an inconsistency - unless I (quite possible :-) ) missed 
> something.
>
> The calls to the containsType(...) method operate in one of 2 ways, 
> depending on whether or not the result specification has been 
> "compiled" by calling the compile method.
>
> If the result spec has not been compiled, then containsType(...) 
> returns true iff the type specified is "equal(...)" to a type in the 
> Result Specification.
>
> If it has been compiled, then the containsType returns true iff the 
> type specified is equal to a type *or any of its subtypes* in the 
> Result Specification.  This is because compiling a resultSpecification 
> adds the subtypes.
>
> Can others confirm this?  In actual use within annotators, it may be 
> that the result spec is always compiled before use (I haven't yet 
> traced that down).
Yes, you are right, when the result spec is compiled all subtypes of a 
type are additionally added to the map. The same for features, if the 
allAnnotatorFeatures is set to true.
>
> Should the code and Javadocs be updated to have containsType return 
> true for subtypes of types in the result spec, always?
I think both ways should return the same result. But which way is 
correct? If I specify a type in the result spec is it correct that all 
subtypes are also in?
If I just want to have the sub types in the result spec it is easy to 
do, but what if I only want to have the super types in the result spec 
without the subtypes?

-- Michael


Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.
In looking through the code for ResultSpecification_impl, there seems 
to be an inconsistency - unless I (quite possible :-) ) missed 
something.

The calls to the containsType(...) method operate in one of 2 ways, 
depending on whether or not the result specification has been "compiled" 
by calling the compile method.

If the result spec has not been compiled, then containsType(...) returns 
true iff the type specified is "equal(...)" to a type in the Result 
Specification.

If it has been compiled, then the containsType returns true iff the type 
specified is equal to a type *or any of its subtypes* in the Result 
Specification.  This is because compiling a resultSpecification adds the 
subtypes.
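
The difference can be reproduced with a toy model. ToyResultSpec below is hypothetical and only mimics the behavior described here: a plain set lookup whose contents change when compile adds subtypes.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy result spec (hypothetical): containsType is a plain set lookup. */
class ToyResultSpec {
    private final Set<String> types = new HashSet<>();

    void addResultType(String typeName) {
        types.add(typeName);
    }

    /** Adds all (transitive) subtypes of the types already present. */
    void compile(Map<String, List<String>> subtypesOf) {
        Deque<String> pending = new ArrayDeque<>(types);
        while (!pending.isEmpty()) {
            String t = pending.pop();
            for (String sub : subtypesOf.getOrDefault(t, Collections.emptyList())) {
                if (types.add(sub)) {
                    pending.push(sub);
                }
            }
        }
    }

    boolean containsType(String typeName) {
        return types.contains(typeName);
    }
}
```

In this sketch, a spec holding only A answers false for a subtype of A before compile is called and true afterwards, which is the inconsistency in question.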

Can others confirm this?  In actual use within annotators, it may be 
that the result spec is always compiled before use (I haven't yet traced 
that down).

Should the code and Javadocs be updated to have containsType return true 
for subtypes of types in the result spec, always?

-Marshall