Posted to dev@ctakes.apache.org by Kim Ebert <ki...@perfectsearchcorp.com> on 2013/02/21 19:09:11 UTC

cTAKES 3.0 appears to be 10x slower than cTAKES 2.5

Hi All,

I am comparing cTAKES 2.5 and cTAKES 3.0 on a 100-document test corpus.

Timing the runs, I found that cTAKES 2.5 took 1,490.397 seconds while
cTAKES 3.0 took 21,119.485 seconds, roughly 14 times as long. It seems
like a major slowdown in performance.
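
For reference, the arithmetic behind the reported timings can be checked with a short Python sketch. The function names `slowdown` and `per_document` are illustrative and not part of cTAKES; the only inputs are the two totals and the corpus size quoted above.

```python
# Totals and corpus size as reported in this message.
CTAKES_25_SECONDS = 1490.397
CTAKES_30_SECONDS = 21119.485
DOCUMENTS = 100


def slowdown(baseline_seconds, candidate_seconds):
    """Return how many times longer the candidate run took than the baseline."""
    return candidate_seconds / baseline_seconds


def per_document(total_seconds, documents):
    """Return the average wall-clock seconds spent per document."""
    return total_seconds / documents


print(round(slowdown(CTAKES_25_SECONDS, CTAKES_30_SECONDS), 2))  # 14.17
print(round(per_document(CTAKES_25_SECONDS, DOCUMENTS), 2))      # 14.9
print(round(per_document(CTAKES_30_SECONDS, DOCUMENTS), 2))      # 211.19
```

So the gap in this one measurement is closer to 14x than 10x, or about 15 seconds per document against about 211.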

I used the following analysis engine for cTAKES 3.0:

desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextUMLSProcessor.xml 


I used the following analysis engine for cTAKES 2.5:

cTAKESdesc/cdpdesc/analysis_engine/AggregatePlaintextUMLSProcessor.xml

Any thoughts on what could cause such a difference in performance?

Thanks,

-- 
Kim Ebert
1.801.669.7342
Perfect Search Corp
http://www.perfectsearchcorp.com/


Re: cTAKES 3.0 appears to be 10x slower than cTAKES 2.5

Posted by Kim Ebert <ki...@perfectsearchcorp.com>.
I haven't been able to reproduce the slowdown. I wonder if it was a JVM
issue. I will let everyone know if I see it again.

Kim Ebert
1.801.669.7342
Perfect Search Corp
http://www.perfectsearchcorp.com/


On 02/22/2013 05:22 PM, Kim Ebert wrote:
> I re-ran ctakes 2.5 using the full LVG, and that didn't cause the 
> performance to change for cTAKES 2.5.
>
> I did try removing the "hsqldb-1.8.0.10.jar" in the lib folder for the 
> 3.0 release, and I found that my performance was better at 4,037 
> seconds. I would like to re-run my cTAKES 3.0 with the 
> "hsqldb-1.8.0.10.jar" to see if these results are consistent.
>
> Kim Ebert
> 1.801.669.7342
> Perfect Search Corp
> http://www.perfectsearchcorp.com/
>
>
> On 02/21/2013 04:12 PM, Kim Ebert wrote:
>> I think this may have been user error on my part. I'll post a follow 
>> up if it is something other than user error.
>>
>> Thanks,
>>
>> Kim Ebert
>> 1.801.669.7342
>> Perfect Search Corp
>> http://www.perfectsearchcorp.com/
>>
>>
>> On 02/21/2013 01:01 PM, Masanz, James J. wrote:
>>> I couldn't think of anything offhand that would account for that, so 
>>> I looked at several AE descriptors including the assertion 
>>> component, plus the LookupDesc_Db.xml, and didn't see anything 
>>> obvious. To narrow down as Pei suggested, perhaps use the CVD to 
>>> annotate one of the larger files and compare the performance reports 
>>> generated.
>>>
>>> -- James
>>>
>>>> -----Original Message-----
>>>> From: ctakes-dev-return-1266-Masanz.James=mayo.edu@incubator.apache.org
>>>> [mailto:ctakes-dev-return-1266-Masanz.James=mayo.edu@incubator.apache.org] On Behalf Of Chen, Pei
>>>> Sent: Thursday, February 21, 2013 1:13 PM
>>>> To:<ct...@incubator.apache.org>
>>>> Cc: ctakes-dev@incubator.apache.org
>>>> Subject: Re: cTAKES 3.0 appears to be 10x slower than cTAKES 2.5
>>>>
>>>> This is interesting. Just curious, were you able to narrow down which
>>>> component was slower?  I know that 3.0 includes the full LVG while 2.5
>>>> has simple/test LVG by default. But 10x seems pretty extreme...
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Feb 21, 2013, at 1:09 PM, "Kim Ebert"
>>>> <ki...@perfectsearchcorp.com>  wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I am doing a comparison of cTAKES 2.5 and cTAKES 3.0 for a 100
>>>>> document test corpus.
>>>>>
>>>>> Timing how long it took, I found that cTAKES 2.5 took 1,490.397
>>>>> seconds while cTAKES 3.0 took 21,119.485 seconds. It seems like a
>>>>> major slowdown in performance.
>>>>>
>>>>> I used the following analysis engine for cTAKES 3.0:
>>>>>
>>>>> desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextUMLSProcessor.xml
>>>>>
>>>>> I used the following analysis engine for cTAKES 2.5:
>>>>>
>>>>> cTAKESdesc/cdpdesc/analysis_engine/AggregatePlaintextUMLSProcessor.xml
>>>>>
>>>>> Any thoughts on why such a difference in performance?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -- 
>>>>> Kim Ebert
>>>>> 1.801.669.7342
>>>>> Perfect Search Corp
>>>>> http://www.perfectsearchcorp.com/
>>>>>

Re: cTAKES 3.0 appears to be 10x slower than cTAKES 2.5

Posted by Kim Ebert <ki...@perfectsearchcorp.com>.
I re-ran cTAKES 2.5 using the full LVG, and that didn't change the
performance of cTAKES 2.5.

I did try removing "hsqldb-1.8.0.10.jar" from the lib folder of the
3.0 release, and performance improved to 4,037 seconds. I would like to
re-run cTAKES 3.0 with "hsqldb-1.8.0.10.jar" in place to see if these
results are consistent.
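
Since the open question here is whether the timings are consistent from run to run, a minimal sketch of a repeat-run timer may help. It is plain Python, not a cTAKES tool; `time_command` and `summarize` are hypothetical helper names, and the command line that launches the pipeline is left to the caller because the thread does not give one.

```python
import statistics
import subprocess
import time


def time_command(argv, repeats=3):
    """Run the command `repeats` times and return each run's wall-clock seconds.

    `argv` is whatever command launches the pipeline under test; it is an
    assumption supplied by the caller, not something taken from this thread.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True)  # raises if the command fails
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return min, median, max, and the max/min spread of the durations."""
    fastest = min(durations)
    slowest = max(durations)
    return {
        "min": fastest,
        "median": statistics.median(durations),
        "max": slowest,
        "spread": slowest / fastest,
    }
```

Calling `summarize(time_command(argv, repeats=3))` once per configuration (with and without the jar) gives a spread figure for each. A spread far above 1.0 within a single configuration would point to run-to-run noise rather than to the jar.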

Kim Ebert
1.801.669.7342
Perfect Search Corp
http://www.perfectsearchcorp.com/



Re: cTAKES 3.0 appears to be 10x slower than cTAKES 2.5

Posted by Kim Ebert <ki...@perfectsearchcorp.com>.
I think this may have been user error on my part. I'll post a follow up 
if it is something other than user error.

Thanks,

Kim Ebert
1.801.669.7342
Perfect Search Corp
http://www.perfectsearchcorp.com/



RE: cTAKES 3.0 appears to be 10x slower than cTAKES 2.5

Posted by "Masanz, James J." <Ma...@mayo.edu>.
I couldn't think of anything offhand that would account for that, so I looked at several AE descriptors, including the assertion component and LookupDesc_Db.xml, and didn't see anything obvious. To narrow it down, as Pei suggested, perhaps use the CVD to annotate one of the larger files with each version and compare the performance reports generated.
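
One way to act on this suggestion is sketched below in Python. It assumes the per-component times from the two performance reports have been copied by hand into dictionaries. The helper name `rank_slowdowns`, the component names, and every number in the example are invented placeholders, not measurements from cTAKES.

```python
def rank_slowdowns(old_ms, new_ms):
    """Rank components present in both reports by how much slower they became.

    Each argument maps a component name to its time in milliseconds.
    Returns (name, old, new, ratio) tuples, largest ratio first.
    """
    shared = old_ms.keys() & new_ms.keys()
    rows = [
        (name, old_ms[name], new_ms[name], new_ms[name] / old_ms[name])
        for name in shared
        if old_ms[name] > 0  # skip zero baselines to avoid division by zero
    ]
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Placeholder values for illustration only; real numbers would come from
# the two performance reports.
report_25 = {"component_a": 100, "component_b": 200}
report_30 = {"component_a": 110, "component_b": 2000}

for name, old, new, ratio in rank_slowdowns(report_25, report_30):
    print(f"{name}: {old} ms -> {new} ms ({ratio:.1f}x)")
```

Whichever component tops the list is the first place to look. Components that appear in only one report are ignored here and would need to be checked separately.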

-- James


Re: cTAKES 3.0 appears to be 10x slower than cTAKES 2.5

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
This is interesting. Just curious, were you able to narrow down which component was slower? I know that 3.0 includes the full LVG while 2.5 has the simple/test LVG by default, but 10x seems pretty extreme...

Sent from my iPhone
