Posted to user@uima.apache.org by Philip Ogren <ph...@ogren.info> on 2010/06/16 22:23:25 UTC

performance of JCas.reset()

When I run the following loop it takes about 6 seconds on my 2GHz machine:

for (int i = 0; i < 10000; i++) {
    jCas.reset();
}

Which comes out to 0.6 milliseconds per call. This is pretty slow for 
cases in which you have many short documents. For example, this would 
add 10 minutes of processing time for a 1M-document corpus. Is this a 
known issue, and is there anything that I can do to minimize this impact?

Thanks,

Philip
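
For anyone who wants to reproduce the measurement, here is a minimal sketch of a timing harness together with the extrapolation arithmetic from the message above (6 seconds / 10,000 calls = 0.6 ms per call; 0.6 ms x 1,000,000 documents = 600 seconds = 10 minutes). The class name ResetTiming and its two helper methods are hypothetical and are not part of UIMA. The sketch uses only the JDK, so the workload is a placeholder Runnable; with UIMA on the classpath you would presumably pass jCas::reset instead.

```java
public class ResetTiming {

    /** Runs the action `calls` times and returns the mean wall-clock cost per call, in milliseconds. */
    static double meanMillisPerCall(Runnable action, int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            action.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / calls;
    }

    /** Extrapolates a per-call cost (milliseconds) to a whole corpus, in minutes. */
    static double corpusMinutes(double millisPerCall, long documents) {
        return millisPerCall * documents / 60_000.0;
    }

    public static void main(String[] args) {
        // Figures reported in the thread: 10,000 calls in about 6 seconds.
        double reportedMillisPerCall = 6_000.0 / 10_000;
        System.out.println(reportedMillisPerCall + " ms per call (reported)");
        System.out.println(corpusMinutes(reportedMillisPerCall, 1_000_000L)
                + " minutes added per 1M documents (reported)");

        // Placeholder workload (assumption): replace with jCas::reset in a real test.
        Runnable placeholder = () -> { };
        System.out.println(meanMillisPerCall(placeholder, 10_000)
                + " ms per call (placeholder workload)");
    }
}
```

A single pass over 10,000 calls includes JIT warm-up, so running the loop a few times and discarding the first pass is likely to give steadier numbers.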


Re: performance of JCas.reset()

Posted by Chris Roeder <ch...@ucdenver.edu>.
Philip,

I've not spent a lot of time reading UIMA source, but some browsing revealed
interesting code (below).
 
Have you run jconsole on it to see the GC load you are putting on the JVM
and how it responds?
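
If attaching jconsole is inconvenient, roughly the same GC information can be read from inside the JVM through the standard java.lang.management API. The sketch below is only an illustration: the class name GcLoadProbe and its methods are hypothetical, and the workload in main is a stand-in that allocates a 500,000-element int array per iteration, not an actual call to JCas.reset().

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcLoadProbe {

    // Written by the stand-in workload so the allocation cannot be optimized away.
    static volatile int[] sink;

    /** Sums collection counts over all collectors; a collector reporting -1 (undefined) is skipped. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Sums approximate accumulated collection time (ms) over all collectors, skipping -1. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Returns {collections, gcMillis} observed while the action ran. */
    static long[] gcDuring(Runnable action) {
        long countBefore = totalGcCount();
        long millisBefore = totalGcMillis();
        action.run();
        return new long[] { totalGcCount() - countBefore, totalGcMillis() - millisBefore };
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): one fresh 500,000-int array per iteration.
        long[] gc = gcDuring(() -> {
            for (int i = 0; i < 10_000; i++) {
                sink = new int[500_000];
            }
        });
        System.out.println("collections: " + gc[0] + ", GC time (ms): " + gc[1]);
    }
}
```

With UIMA available, wrapping the original jCas.reset() loop in gcDuring(...) should show whether the elapsed time is dominated by collections.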

-Chris

private final void initHeap() {
    this.heap = new int[this.initialSize];
    // ...
}

public static final int DEFAULT_SIZE = 500000; // 2 MB pages
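
To make the concern concrete: if every reset allocates a fresh 500,000-element int array (500,000 x 4 bytes = 2 MB), then 10,000 resets churn through roughly 20 GB of short-lived memory. The sketch below contrasts that pattern with one common alternative, which keeps the array and zeroes only the cells that were actually used. ToyHeap is a hypothetical class written for illustration; it is not UIMA's heap implementation, and nothing in this thread says which strategy any UIMA release uses.

```java
import java.util.Arrays;

public class ToyHeap {

    // Same value as the excerpt above: 500,000 ints * 4 bytes = 2 MB.
    static final int DEFAULT_SIZE = 500_000;

    private int[] heap = new int[DEFAULT_SIZE];
    private int used = 0; // number of cells written since the last reset

    /** Appends a value and returns its index. Capacity is fixed to keep the sketch short. */
    int add(int value) {
        heap[used] = value;
        return used++;
    }

    int get(int index) {
        return heap[index];
    }

    int used() {
        return used;
    }

    /** Strategy A: throw the array away and allocate (and zero) a full new one. */
    void resetByAllocating() {
        heap = new int[DEFAULT_SIZE];
        used = 0;
    }

    /** Strategy B: keep the array and zero only the prefix that was written. */
    void resetByClearing() {
        Arrays.fill(heap, 0, used, 0);
        used = 0;
    }

    public static void main(String[] args) {
        ToyHeap toy = new ToyHeap();

        long start = System.nanoTime();
        for (int i = 0; i < 10_000; i++) {
            toy.add(i);
            toy.resetByAllocating();
        }
        System.out.println("allocate per reset: " + (System.nanoTime() - start) / 1_000_000 + " ms");

        start = System.nanoTime();
        for (int i = 0; i < 10_000; i++) {
            toy.add(i);
            toy.resetByClearing();
        }
        System.out.println("clear used prefix:  " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```

For short documents the used prefix is tiny, so strategy B does work proportional to the document rather than to the page size. The trade-off is that a heap grown by one large document stays large unless it is shrunk explicitly.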




Re: performance of JCas.reset()

Posted by Philip Ogren <ph...@ogren.info>.
I did this using version 2.2.2.  The JCas is initialized with a very 
simple type system consisting of a single 'Sentence' type which has no 
features.  There are no additional user-defined indexes.


On 6/17/2010 5:06 AM, Marshall Schor wrote:
> What release of UIMA are you testing with?
>
> Are there any UIMA index definitions in your test case instance?
>
> -Marshall

Re: performance of JCas.reset()

Posted by Marshall Schor <ms...@schor.com>.
What release of UIMA are you testing with?

Are there any UIMA index definitions in your test case instance?

-Marshall


Re: performance of JCas.reset()

Posted by Marshall Schor <ms...@schor.com>.

On 6/17/2010 10:16 PM, Eddie Epstein wrote:
> Using 2.3.0, and a CAS defined by the PersonTitleAnnotator, this code
> runs in 100ms on my laptop.
>   

Philip, 2.3.0 has some performance improvements to reset(); please see
if it helps in your test configuration.

-Marshall

Re: performance of JCas.reset()

Posted by Philip Ogren <ph...@ogren.info>.
I upgraded to 2.3.0 and reran my script with no other changes.  It ran 
in 19ms!  Nice!



On 6/17/2010 8:16 PM, Eddie Epstein wrote:
> Using 2.3.0, and a CAS defined by the PersonTitleAnnotator, this code
> runs in 100ms on my laptop.
>
> Eddie

Re: performance of JCas.reset()

Posted by Eddie Epstein <ea...@gmail.com>.
Using 2.3.0, and a CAS defined by the PersonTitleAnnotator, this code
runs in 100ms on my laptop.

Eddie
