
Posted to user@uima.apache.org by Peter Klügl <pk...@ki.informatik.uni-wuerzburg.de> on 2010/01/27 13:37:30 UTC

CASImpl.createFilteredIterator performance / Eclipse source plugin

Hello,

I have some performance issues with a current application. The profiler
tells me that over 80% of the execution time is spent in about 200
calls to the method CASImpl.createFilteredIterator(). That 80% is
sometimes more than 1000 s for a single AE.process(), even though far
more index navigation takes place within the remaining 20%.

I can't investigate the cause of this performance hot spot any further,
partly because I am missing the source plugin for the UIMA runtime plugin.
The application is running within Eclipse. My first question: Is there
an easy way to get or create a source plugin for the UIMA core/runtime,
ideally without using Maven? Are there any best practices for profiling
UIMA in Eclipse?

My second question: Is this normal behavior, or can anyone give me a
hint on how I could improve the performance?

Some example details about how the method is used:
The CAS contains about 40 pages of plain text with about 50 lines per
page. Part of the text (maybe 3 pages) is annotated, and for each line of
that segment the method createFilteredIterator() is called with some
constraints on types and, of course, on the window of the iterator
(that is, the line). I also tried to replace the filtered iterator with
a window constraint by a filtered iterator over a subiterator of the
annotation index, which brought no real improvement in performance. The
UIMA version is 2.2.2.

Looking forward to any hints or directions.

Peter

-- 
Peter Klügl
pkluegl@uni-wuerzburg.de

Re: CASImpl.createFilteredIterator performance / Eclipse source plugin

Posted by Peter Klügl <pk...@ki.informatik.uni-wuerzburg.de>.

Hi Thilo,

thank you for your answer.

I normally have a lot of different types. This method call is used on a
disjoint partition of the artifact using about 15 different types.
However, the TypeConstraint points to a single parent type.

I will try the approach using a subiterator once again. Maybe I missed
something. And, of course, I will use the 2.3 version of UIMA, but I think I
can wait for the official release.

I'll let you know if there is any news...

Peter


-- 
Peter Klügl
pkluegl@uni-wuerzburg.de

Re: CASImpl.createFilteredIterator performance / Eclipse source plugin

Posted by Thilo Goetz <tw...@gmx.de>.

Hi Peter,

I'm surprised that creating a filtered iterator from a
subiterator did not improve performance.  A window
constraint in a filtered iterator will generally yield
bad performance because the filter is just that: a filter.
There is no intelligence behind it.  What you see in
createFilteredIterator() is the iterator being advanced
to the first annotation that passes the filter.  This is
done by simply starting at the beginning and looking at
each annotation in turn.  A subiterator is smarter: it
uses binary search to find its starting position and
should be significantly faster.
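
To make the difference concrete, here is a minimal sketch in plain Java. It
does not use the UIMA API: Span, makeSpans and both search methods are
made-up stand-ins, and the code only assumes that annotations are kept
sorted by begin offset. It contrasts finding the first element of a window
by testing elements one by one with finding it by binary search.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowStartDemo {

    /** Hypothetical stand-in for an annotation: a [begin, end) text span. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Filter style: test spans in order from the start until one passes. */
    static int firstByLinearScan(List<Span> sorted, int windowBegin) {
        for (int i = 0; i < sorted.size(); i++) {
            if (sorted.get(i).begin >= windowBegin) {
                return i;
            }
        }
        return sorted.size(); // no span starts at or after windowBegin
    }

    /** Subiterator style: binary search for the first span with begin >= windowBegin. */
    static int firstByBinarySearch(List<Span> sorted, int windowBegin) {
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted.get(mid).begin < windowBegin) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Builds n spans of width 4, one every 5 characters, sorted by begin. */
    static List<Span> makeSpans(int n) {
        List<Span> spans = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            spans.add(new Span(i * 5, i * 5 + 4));
        }
        return spans;
    }

    public static void main(String[] args) {
        List<Span> spans = makeSpans(10_000);
        System.out.println(firstByLinearScan(spans, 40_000));
        System.out.println(firstByBinarySearch(spans, 40_000));
    }
}
```

Both methods return the same position. The linear scan looks at every span
before the window, while the binary search needs a number of probes that
grows only logarithmically with the size of the list.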

Do you have lots of different annotation types?  That's
also a performance killer, but we've made some improvements
here in 2.3.  You may wish to try the latest code from
trunk and see if it gives you any improvements.  The code
is stable, the release has been approved, it's only a
matter of days until it's generally available.

If your type organization allows it, you can also ask
the CAS for a more specific index/iterator, as opposed
to an iterator over all annotations.  Iterating over
annotations of a leaf type is generally much faster.
Selecting by type is another thing you don't want to do in a filter
if performance is critical.
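
As a rough illustration of why a type-specific index helps, here is a small
sketch in plain Java. It is not how the CAS is implemented: Ann, the type
names "T0" to "T14" and all method names are assumptions made up for this
example. It compares selecting one type by filtering everything with
looking the type up in a per-type grouping that was built once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TypeIndexDemo {

    /** Hypothetical stand-in for an annotation: a type name and a begin offset. */
    static final class Ann {
        final String type;
        final int begin;

        Ann(String type, int begin) {
            this.type = type;
            this.begin = begin;
        }
    }

    /** Filter style: visit every annotation and keep those of the wanted type. */
    static List<Ann> selectByFilter(List<Ann> all, String wanted) {
        List<Ann> out = new ArrayList<>();
        for (Ann a : all) {
            if (a.type.equals(wanted)) {
                out.add(a);
            }
        }
        return out;
    }

    /** Index style: group by type once, so later lookups never touch other types. */
    static Map<String, List<Ann>> indexByType(List<Ann> all) {
        Map<String, List<Ann>> byType = new HashMap<>();
        for (Ann a : all) {
            byType.computeIfAbsent(a.type, t -> new ArrayList<>()).add(a);
        }
        return byType;
    }

    /** Builds n annotations that cycle through typeCount made-up type names. */
    static List<Ann> makeAnnotations(int n, int typeCount) {
        List<Ann> all = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            all.add(new Ann("T" + (i % typeCount), i));
        }
        return all;
    }

    public static void main(String[] args) {
        List<Ann> all = makeAnnotations(1_500, 15);
        System.out.println(selectByFilter(all, "T3").size());
        System.out.println(indexByType(all).get("T3").size());
    }
}
```

The filter has to look at all annotations on every call. The grouped lookup
pays the grouping cost once and afterwards hands back only the annotations
of the requested type.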

So basically, try to get the iterator that goes into
createFilteredIterator() to be as small as possible
to begin with.  Anything you can achieve by starting
with a smaller collection, rather than by filtering,
should help.
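
The effect of starting small can be sketched by counting how often the
filter is evaluated. The plain-Java example below is only a model under
stated assumptions, not UIMA code: annotations are reduced to sorted begin
offsets, each "line" is a fixed-width window, and the sizes and method
names are invented for illustration.

```java
import java.util.function.IntPredicate;

public class SmallInputFirstDemo {

    /** Filters the whole collection once per window; returns {tests, matches}. */
    static long[] filterEverything(int[] begins, int lines, int width, IntPredicate keep) {
        long tested = 0;
        long matched = 0;
        for (int k = 0; k < lines; k++) {
            int winBegin = k * width;
            int winEnd = winBegin + width;
            for (int b : begins) {
                tested++; // window check and filter run on every element
                if (b >= winBegin && b < winEnd && keep.test(b)) {
                    matched++;
                }
            }
        }
        return new long[] {tested, matched};
    }

    /** Index of the first element >= key in a sorted array. */
    static int lowerBound(int[] sorted, int key) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Narrows to the window by binary search, then filters; returns {tests, matches}. */
    static long[] narrowThenFilter(int[] begins, int lines, int width, IntPredicate keep) {
        long tested = 0;
        long matched = 0;
        for (int k = 0; k < lines; k++) {
            int from = lowerBound(begins, k * width);
            int to = lowerBound(begins, k * width + width);
            for (int i = from; i < to; i++) {
                tested++; // counts filter tests only, not binary-search probes
                if (keep.test(begins[i])) {
                    matched++;
                }
            }
        }
        return new long[] {tested, matched};
    }

    /** Builds the sorted offsets 0, 1, ..., n - 1. */
    static int[] makeOffsets(int n) {
        int[] begins = new int[n];
        for (int i = 0; i < n; i++) {
            begins[i] = i;
        }
        return begins;
    }

    public static void main(String[] args) {
        int[] begins = makeOffsets(10_000);
        IntPredicate even = b -> b % 2 == 0;
        long[] all = filterEverything(begins, 200, 50, even);
        long[] narrow = narrowThenFilter(begins, 200, 50, even);
        System.out.println(all[0] + " tests, " + all[1] + " matches");
        System.out.println(narrow[0] + " tests, " + narrow[1] + " matches");
    }
}
```

Both variants find the same matches. The first performs one filter test per
element per window, so its cost is the number of windows times the size of
the whole collection. The second only tests the elements that lie inside
each window.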

--Thilo