Posted to dev@uima.apache.org by Michael Baessler <mb...@michael-baessler.de> on 2008/01/07 12:56:12 UTC

Re: capabilityLangugaeFlow - computeResultSpec

Adam Lally wrote:
> On Dec 18, 2007 8:55 AM, Michael Baessler <mb...@michael-baessler.de> wrote:
>   
>> Hi,
>> I received a report that computing the result spec for the
>> capabilityLanguageFlow takes too much time.
>> I looked at the code and found something interesting... maybe I'm wrong,
>> I'm not sure.
>>
>> When looking at the ASB_impl.java at processUntilNextOutputCas() I found
>> the following:
>>
>>   // check if we have to set result spec, to support capability language flow
>>   if (nextStep instanceof SimpleStepWithResultSpec) {
>>     ResultSpecification rs = ((SimpleStepWithResultSpec) nextStep).getResultSpecification();
>>     if (rs != null) {
>>       nextAe.setResultSpecification(rs);
>>     }
>>   }
>>   // invoke next AE in flow
>>   CasIterator casIter = null;
>>   CAS outputCas = null; // used if the AE we call outputs a new CAS
>>   try {
>>     casIter = nextAe.processAndOutputNewCASes(cas);
>>
>> When a capabilityLanguageFlow is used, the ResultSpecs for the flow
>> engines are precomputed if possible. The code above takes the
>> precomputed ResultSpec from the flow node and sets it on the current AE.
>>
>> When I go deeper to
>>
>>      casIter = nextAe.processAndOutputNewCASes(cas);
>>
>> I found in the PrimitiveAnalysisEngine_impl.java class in the
>> callAnalysisComponentProcess() method the following:
>>
>>   if (mResultSpecChanged || mLastTypeSystem != view.getTypeSystem()) {
>>     mLastTypeSystem = view.getTypeSystem();
>>     mCurrentResultSpecification.compile(mLastTypeSystem);
>>     // the actual ResultSpec we send to the component is formed by
>>     // looking at this primitive AE's declared output types and eliminating
>>     // any that are not in mCurrentResultSpecification.
>>     ResultSpecification analysisComponentResultSpec = computeAnalysisComponentResultSpec(
>>         mCurrentResultSpecification, getAnalysisEngineMetaData().getCapabilities());
>>     // compile result spec - necessary to get type subsumption to work properly
>>     analysisComponentResultSpec.compile(mLastTypeSystem);
>>     mAnalysisComponent.setResultSpecification(analysisComponentResultSpec);
>>     mResultSpecChanged = false;
>>   }
>>
>> Whenever the ResultSpec has changed, it is recomputed. But the ResultSpec
>> is marked as changed every time setResultSpecification() is called.
>> So what does this mean? The first code fragment in this email shows how
>> the ResultSpec is taken from the flow controller and set on the AE, which
>> marks it as changed. The second code fragment shows what is executed
>> when the ResultSpec has changed and how it is recomputed.
>> This means that the ResultSpec is recomputed each time process is
>> called. I don't think this is necessary.
>>
>>     
>
> That seems like a good analysis of the situation.  I think what we
> need is to detect when the ResultSpecification has actually changed
> and when it hasn't.  That might be tricky to do right.  If we just
> check if the new ResultSpecification is == to the existing
> ResultSpecification, that wouldn't work if the ResultSpecification had
> been modified (it would be == but the contents wouldn't be the same).
> Perhaps we could add a dirty flag to the ResultSpecification to catch
> this.
I tried to work out how the ResultSpecification handling in uima-core
works, including all its side effects, to see how we could detect when a
ResultSpec has changed. Unfortunately I was not able to; there are too
many open questions where I don't know exactly whether it is right in
every case ... :-(

Adam can you please look at this issue?

Thanks Michael
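
A minimal sketch of the "dirty flag" idea discussed above, assuming a modification counter instead of a boolean so that several consumers can each track what they last saw. VersionedSpec and SpecConsumer are hypothetical stand-ins for illustration only; they are not UIMA classes, and the real ResultSpecification handling may need to account for more cases than this.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a result specification; NOT the UIMA class.
final class VersionedSpec {
  private final Set<String> types = new LinkedHashSet<>();
  private long version = 0; // bumped only when the contents really change

  void addType(String typeName) {
    if (types.add(typeName)) {
      version++;
    }
  }

  long version() {
    return version;
  }
}

// Hypothetical consumer: recomputes only if it is handed a different object,
// or the same object with a newer version.
final class SpecConsumer {
  private VersionedSpec lastSpec;
  private long lastVersion = -1;
  int recomputeCount = 0; // stands in for the expensive recomputation

  void setSpec(VersionedSpec spec) {
    Objects.requireNonNull(spec, "spec");
    if (spec == lastSpec && spec.version() == lastVersion) {
      return; // same object, unmodified since the last call: nothing to do
    }
    lastSpec = spec;
    lastVersion = spec.version();
    recomputeCount++;
  }
}
```

With this shape, a plain == check is combined with the version, so an in-place modification of the same object is still detected.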

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.

Yes, I think so. This test dumps the result spec for each AE to a file 
to check if it was computed correctly.
The computation of the result spec is done during the initialization of 
the aggregate AE when the capability language flow is created.
The precomputed result spec can later be used in the document 
processing, but this is currently not used. It is recomputed each time.

For my simple performance test I removed the second computation that 
is done during runtime processing
( PrimitiveAnalysisEngine_impl.java: protected ResultSpecification 
computeAnalysisComponentResultSpec() ), so the originally computed result 
spec is used.
But we cannot remove this code completely, since a result spec can be 
provided by the application, and then it must be recomputed dynamically.

-- Michael

Marshall Schor wrote:
> Michael -
>
> I'm confused about how this test is set up.  The test descriptor this 
> code uses loads an aggregate, and then runs a process method which 
> ends up calling some dummy process method called 
> SequencerTestAnnotator.  This process method dumps (to a file) the 
> result spec.  Is that the case you're running?
>
> How do you turn on and off the (re)computation of the result spec?
>
> -Marshall
>

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.

Michael -

I'm confused about how this test is set up.  The test descriptor this 
code uses loads an aggregate, and then runs a process method which ends 
up calling some dummy process method called SequencerTestAnnotator.  
This process method dumps (to a file) the result spec.  Is that the case 
you're running?

How do you turn on and off the (re)computation of the result spec?

-Marshall

Michael Baessler wrote:
> Michael Baessler wrote:
>> Adam Lally wrote:
>>> On Jan 7, 2008 6:56 AM, Michael Baessler <mb...@michael-baessler.de> 
>>> wrote:
>>>  
>>>> I tried to figure out how the ResultSpecification handling in 
>>>> uima-core
>>>> works with all side effects to check how it can be done
>>>> to detect when a ResultSpec has changed. Unfortunately I was not able
>>>> to, there are too many open questions where I don't know
>>>> exactly if it is right in any case ... :-(
>>>>
>>>> Adam can you please look at this issue?
>>>>
>>>>     
>>>
>>> I can try to take a look, but I don't have a lot of time.  Do you have
>>> a test case for this, where you expect I would see a significant
>>> performance improvement if I fix this?
>>>   
>> Sorry, I have no performance test case. I checked my assumption using 
>> the debugger.
>>
>> I used the following main() with a loop over the process call to 
>> check if the result spec is recomputed each time.
>> The descriptor is the same as used in the capabilityLanguageFlow test 
>> case of the uimaj-core project.
>> Maybe a sysout helps to detect if the unnecessary calls are done or not.
>>
>> Maybe iterating more than 10 times will give you performance 
>> numbers before and after. Maybe adding additional capabilities
>> that must be analyzed will increase the time used to compute the 
>> result spec. I will look at this tomorrow.
>>
>>  public static void main(String[] args) {
>>
>>      AnalysisEngine ae = null;
>>      try {
>>
>>         String desc = "SequencerCapabilityLanguageAggregateES.xml";
>>
>>         XMLInputSource in = new XMLInputSource(JUnitExtension.getFile(desc));
>>         ResourceSpecifier specifier = UIMAFramework.getXMLParser()
>>               .parseResourceSpecifier(in);
>>         ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);
>>         CAS cas = ae.newCAS();
>>         String text = "Hello world!";
>>         cas.setDocumentText(text);
>>         cas.setDocumentLanguage("en");
>>         for (int i = 0; i < 10; i++) {
>>            ae.process(cas);
>>         }
>>      } catch (Exception ex) {
>>         ex.printStackTrace();
>>      }
>>   }
>>
>> -- Michael
> When setting the loop counter to 1000 I have 6000ms without 
> recomputing the result spec and
> 27000ms when recomputing the result spec. I think this should be 
> sufficient for testing.
>
> -- Michael
>
>

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.

Marshall Schor wrote:
> I think my change is ready for code review.  I kept all the 
> idiosyncratic behavior of the old code, so users should not notice any 
> difference.  All the tests run, and the test case above runs in the 6000ms 
> range.
> There are 3 areas changed:
> 1) ResultSpecification_impl is restructured for speed and smaller 
> memory footprint
> 2) The "compiling" of this is deferred till the latest possible point; 
> operations that can be done with the uncompiled form are done that way.
> 3) The code in the CapabilityLanguageFlow where it returns a next step 
> now caches the result spec by component key, and only sends it down if 
> it is different from what this controller sent the last time it 
> invoked this component in the flow.
> This test depends on the precomputed result specs kept in the mTable 
> variable being constant - which I believe they are (once they are 
> computed) - but Michael -can you confirm this?
Yes, the mTable variable contains the precomputed result specs for the 
sequence engines. These result specs are constant and do not change 
during processing. The computation is based on the output types 
of the aggregate that defines the capabilityLanguageFlow. If a result 
spec is passed in via the process method, the precomputed mTable cannot 
be used, since the results that should be produced may then differ from 
the aggregate output types.
> With this change, the code in the framework to "intersect" the result 
> spec with a component's output capabilities, by language, is not 
> redone on every call, but only when the language changes.  That code 
> (to do the intersection) is running faster, in any case, due to the 
> restructuring.
>
> Because this is a big change it would be good to do a code review of 
> some kind - any thoughts on how to do this?
I hoped that Adam could look at this, since from my point of view he 
knows the code best. All the capabilityLanguageFlow related items have 
already been discussed on the list in detail, and I think we now also 
have some good tests for this.
Once the code is checked in I can run my performance tests again to check 
the performance improvements.

Opinions?

-- Michael
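
To make the "intersect, but only when the language changes" behavior described above concrete, here is a rough sketch. It assumes type names are plain strings and ignores type subsumption, which the real code has to handle; LanguageIntersectionCache and its members are hypothetical names, not part of the framework.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, NOT UIMA code: intersects the requested output types
// for a language with a component's declared outputs, and redoes the work
// only when the language differs from the previous call.
final class LanguageIntersectionCache {
  private final Map<String, Set<String>> requestedByLanguage;
  private final Set<String> declaredOutputs;
  private String lastLanguage;   // null until the first call
  private Set<String> lastResult;
  int intersectCount = 0;        // counts real recomputations, for illustration

  LanguageIntersectionCache(Map<String, Set<String>> requestedByLanguage,
                            Set<String> declaredOutputs) {
    this.requestedByLanguage = requestedByLanguage;
    this.declaredOutputs = declaredOutputs;
  }

  Set<String> outputsFor(String language) {
    if (language.equals(lastLanguage)) {
      return lastResult;         // same language as the last call: reuse
    }
    Set<String> result = new HashSet<>(
        requestedByLanguage.getOrDefault(language, Collections.<String>emptySet()));
    result.retainAll(declaredOutputs); // keep only what the component declares
    lastLanguage = language;
    lastResult = result;
    intersectCount++;
    return result;
  }
}
```

A run of documents in the same language then pays for the intersection once, which matches the intent described in the quoted message.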

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Marshall Schor <ms...@schor.com>.

I think my change is ready for code review.  I kept all the 
idiosyncratic behavior of the old code, so users should not notice any 
difference.  All the tests run, and the test case above runs in the 6000ms 
range. 

There are 3 areas changed:
1) ResultSpecification_impl is restructured for speed and smaller memory 
footprint
2) The "compiling" of this is deferred till the latest possible point; 
operations that can be done with the uncompiled form are done that way.
3) The code in the CapabilityLanguageFlow where it returns a next step 
now caches the result spec by component key, and only sends it down if 
it is different from what this controller sent the last time it invoked 
this component in the flow. 

This test depends on the precomputed result specs kept in the mTable 
variable being constant - which I believe they are (once they are 
computed) - but Michael - can you confirm this? 

With this change, the code in the framework to "intersect" the result 
spec with a component's output capabilities, by language, is not redone 
on every call, but only when the language changes.  That code (to do the 
intersection) is running faster, in any case, due to the restructuring.

Because this is a big change it would be good to do a code review of 
some kind - any thoughts on how to do this?

-Marshall
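
Point 3 above could look roughly like the following. This is only a sketch under the assumption stated in the message, namely that the precomputed result specs are constant once built, so an identity comparison is enough. LastSentSpecCache is a hypothetical name and does not reflect the actual change.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache, NOT the actual CapabilityLanguageFlow code.
// S is whatever object represents a precomputed result spec.
final class LastSentSpecCache<S> {
  private final Map<String, S> lastSentByComponent = new HashMap<>();

  /**
   * Returns the spec that should be sent to the component, or null if it is
   * the very same object this controller sent to that component last time.
   * Identity comparison is only safe if the specs are never modified.
   */
  S specToSend(String componentKey, S precomputed) {
    S previous = lastSentByComponent.put(componentKey, precomputed);
    return previous == precomputed ? null : precomputed;
  }
}
```

A null return fits the existing `if (rs != null)` guard in the ASB_impl fragment quoted earlier in this thread: no setResultSpecification() call is made, so the primitive AE does not see a change.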

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.

When setting the loop counter to 1000 I get 6000ms without recomputing 
the result spec and 27000ms when recomputing it. I think this should be 
sufficient for testing.

-- Michael

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Michael Baessler <mb...@michael-baessler.de>.

Adam Lally wrote:
> On Jan 7, 2008 6:56 AM, Michael Baessler <mb...@michael-baessler.de> wrote:
>   
>> I tried to figure out how the ResultSpecification handling in uima-core
>> works with all side effects to check how it can be done
>> to detect when a ResultSpec has changed. Unfortunately I was not able
>> to, there are too many open questions where I don't know
>> exactly if it is right in any case ... :-(
>>
>> Adam can you please look at this issue?
>>
>>     
>
> I can try to take a look, but I don't have a lot of time.  Do you have
> a test case for this, where you expect I would see a significant
> performance improvement if I fix this?
>   
Sorry, I have no performance test case. I checked my assumption using the 
debugger.

I used the following main() with a loop over the process call to check 
if the result spec is recomputed each time.
The descriptor is the same as used in the capabilityLanguageFlow test 
case of the uimaj-core project.
Maybe a sysout helps to detect if the unnecessary calls are done or not.

Maybe iterating more than 10 times will give you performance 
numbers before and after. Maybe adding additional capabilities
that must be analyzed will increase the time used to compute the result 
spec. I will look at this tomorrow.

  public static void main(String[] args) {

      AnalysisEngine ae = null;
      try {

         String desc = "SequencerCapabilityLanguageAggregateES.xml";

         XMLInputSource in = new XMLInputSource(JUnitExtension.getFile(desc));
         ResourceSpecifier specifier = UIMAFramework.getXMLParser()
               .parseResourceSpecifier(in);
         ae = UIMAFramework.produceAnalysisEngine(specifier, null, null);
         CAS cas = ae.newCAS();
         String text = "Hello world!";
         cas.setDocumentText(text);
         cas.setDocumentLanguage("en");
         for (int i = 0; i < 10; i++) {
            ae.process(cas);
         }
      } catch (Exception ex) {
         ex.printStackTrace();
      }
   }

-- Michael
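
To get before/after numbers out of a loop like the one above, a small timing wrapper may be enough. This is a generic sketch, not part of the test case; LoopTimer is a hypothetical name, and the body passed in is whatever should be measured.

```java
// Hypothetical helper for rough wall-clock measurements; not UIMA code.
final class LoopTimer {
  /** Runs body the given number of times and returns elapsed milliseconds. */
  static long timeMillis(int iterations, Runnable body) {
    long start = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      body.run();
    }
    return (System.nanoTime() - start) / 1_000_000L;
  }

  public static void main(String[] args) {
    StringBuilder sb = new StringBuilder();
    // Placeholder workload; in the test above this would be the process call.
    long ms = timeMillis(1000, () -> sb.append('x'));
    System.out.println("1000 iterations took " + ms + " ms");
  }
}
```

Because ae.process(cas) declares a checked exception, that call would need a try/catch inside the lambda. Numbers from a loop like this are only rough, since JIT warm-up is included.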

Re: capabilityLangugaeFlow - computeResultSpec

Posted by Adam Lally <al...@alum.rpi.edu>.

On Jan 7, 2008 6:56 AM, Michael Baessler <mb...@michael-baessler.de> wrote:
> I tried to figure out how the ResultSpecification handling in uima-core
> works with all side effects to check how it can be done
> to detect when a ResultSpec has changed. Unfortunately I was not able
> to, there are too many open questions where I don't know
> exactly if it is right in any case ... :-(
>
> Adam can you please look at this issue?
>

I can try to take a look, but I don't have a lot of time.  Do you have
a test case for this, where you expect I would see a significant
performance improvement if I fix this?

-Adam