Posted to dev@ctakes.apache.org by Kim Ebert <ki...@perfectsearchcorp.com> on 2013/02/21 19:09:11 UTC
cTAKES 3.0 appears to be 10x slower than cTAKES 2.5
Hi All,
I am comparing cTAKES 2.5 and cTAKES 3.0 on a 100-document
test corpus.
Processing the corpus took 1,490.397 seconds with cTAKES 2.5 and
21,119.485 seconds with cTAKES 3.0, roughly 14 times slower. That seems
like a major performance regression.
I used the following analysis engine for cTAKES 3.0:
desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextUMLSProcessor.xml
I used the following analysis engine for cTAKES 2.5:
cTAKESdesc/cdpdesc/analysis_engine/AggregatePlaintextUMLSProcessor.xml
Any thoughts on why such a difference in performance?
Thanks,
--
Kim Ebert
1.801.669.7342
Perfect Search Corp
http://www.perfectsearchcorp.com/
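For a quick sense of scale, the two totals above can be turned into per-document averages and a ratio. This is only a minimal Java sketch of that arithmetic: the class and method names are illustrative, and the only inputs are the figures from the post (100 documents, 1,490.397 s and 21,119.485 s).

```java
import java.util.Locale;

/** Back-of-the-envelope comparison of two corpus-level timings (illustrative names). */
final class SlowdownRatio {
    /** Mean seconds per document for one run over the whole corpus. */
    static double secondsPerDocument(double totalSeconds, int documents) {
        if (documents <= 0) {
            throw new IllegalArgumentException("documents must be positive");
        }
        return totalSeconds / documents;
    }

    /** How many times slower the candidate run is than the baseline run. */
    static double slowdown(double baselineSeconds, double candidateSeconds) {
        return candidateSeconds / baselineSeconds;
    }

    public static void main(String[] args) {
        double v25 = 1_490.397;   // cTAKES 2.5 total for the corpus, seconds
        double v30 = 21_119.485;  // cTAKES 3.0 total for the corpus, seconds
        int docs = 100;           // corpus size from the post
        System.out.printf(Locale.ROOT, "2.5: %.2f s/doc%n", secondsPerDocument(v25, docs));
        System.out.printf(Locale.ROOT, "3.0: %.2f s/doc%n", secondsPerDocument(v30, docs));
        System.out.printf(Locale.ROOT, "slowdown: %.1fx%n", slowdown(v25, v30));
    }
}
```

Run as-is, it prints 14.90 s/doc for 2.5, 211.19 s/doc for 3.0 and a slowdown of 14.2x, so the "10x" in the subject line is, if anything, an understatement.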
Re: cTAKES 3.0 appears to be 10x slower than cTAKES 2.5
Posted by Kim Ebert <ki...@perfectsearchcorp.com>.
I haven't been able to reproduce this. I wonder if this is a JVM issue.
I will let everyone know if I see the issue again.
On 02/22/2013 05:22 PM, Kim Ebert wrote:
> I re-ran ctakes 2.5 using the full LVG, and that didn't cause the
> performance to change for cTAKES 2.5.
>
> I did try removing the "hsqldb-1.8.0.10.jar" in the lib folder for the
> 3.0 release, and I found that my performance was better at 4,037
> seconds. I would like to re-run my cTAKES 3.0 with the
> "hsqldb-1.8.0.10.jar" to see if these results are consistent.
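If JVM behavior is the suspect, one way to separate a real regression from run-to-run noise is to time the same work several times inside one JVM, discard the first few (warm-up) runs, and compare medians rather than single totals. This is a hypothetical sketch: it assumes one pipeline pass can be wrapped in a `Runnable`, and the class name and busy-loop workload are placeholders. It also prints the JVM name and version so results from different JVMs can be told apart.

```java
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical harness: repeated timings in one JVM, warm-up discarded. */
final class RepeatTiming {
    private static volatile long sink; // keeps the stand-in work from being optimized away

    /** Runs the job warmup + trials times and returns only the trial durations, in ms. */
    static double[] timeTrials(Runnable job, int warmup, int trials) {
        for (int i = 0; i < warmup; i++) {
            job.run(); // lets JIT compilation and caches settle; not recorded
        }
        double[] millis = new double[trials];
        for (int i = 0; i < trials; i++) {
            long start = System.nanoTime();
            job.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return millis;
    }

    /** Median of the values; less sensitive to one slow outlier than the mean. */
    static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println("JVM: " + System.getProperty("java.vm.name")
                + " " + System.getProperty("java.version"));
        // Stand-in workload; replace with one pass of the pipeline under test.
        Runnable job = () -> {
            long acc = 0;
            for (int i = 0; i < 5_000_000; i++) acc += i % 7;
            sink = acc;
        };
        double[] ms = timeTrials(job, 2, 5);
        double[] sorted = ms.clone();
        Arrays.sort(sorted);
        System.out.printf(Locale.ROOT, "median %.1f ms (min %.1f, max %.1f)%n",
                median(ms), sorted[0], sorted[sorted.length - 1]);
    }
}
```

A wide min-to-max spread within one JVM would point toward environmental noise; a consistently high median would point toward a genuine difference between the two releases.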
Re: cTAKES 3.0 appears to be 10x slower than cTAKES 2.5
Posted by Kim Ebert <ki...@perfectsearchcorp.com>.
I re-ran cTAKES 2.5 using the full LVG, and that didn't change the
performance of cTAKES 2.5.
I did try removing "hsqldb-1.8.0.10.jar" from the lib folder of the
3.0 release, and the run time improved to 4,037 seconds. I would like
to re-run cTAKES 3.0 with "hsqldb-1.8.0.10.jar" restored to see whether
these results are consistent.
On 02/21/2013 04:12 PM, Kim Ebert wrote:
> I think this may have been user error on my part. I'll post a follow
> up if it is something other than user error.
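One thing that might be worth ruling out (a hypothesis, not something established in this thread) is that more than one copy of HSQLDB ends up on the classpath, so that which copy gets loaded depends on jar order. This sketch lists every classpath location that supplies a given class file. `org.hsqldb.jdbcDriver` is assumed to be the HSQLDB 1.8 driver class name; any other fully qualified class name can be passed as the first argument.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

/** Lists every classpath location that provides a given class file. */
final class ClasspathCopies {
    /** Returns one URL per jar or directory on the classpath that contains the class file. */
    static List<URL> locate(String className) {
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(
                    ClasspathCopies.class.getClassLoader().getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: HSQLDB 1.8's JDBC driver class; override via the first argument.
        String name = args.length > 0 ? args[0] : "org.hsqldb.jdbcDriver";
        List<URL> hits = locate(name);
        System.out.println(name + " found in " + hits.size() + " location(s):");
        for (URL url : hits) {
            System.out.println("  " + url);
        }
        if (hits.size() > 1) {
            System.out.println("Warning: multiple copies on the classpath; load order decides which wins");
        }
    }
}
```

Run it with the same classpath the pipeline uses (for example the contents of the lib folder); more than one location for the driver class would be worth investigating before and after removing the jar.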
Re: cTAKES 3.0 appears to be 10x slower than cTAKES 2.5
Posted by Kim Ebert <ki...@perfectsearchcorp.com>.
I think this may have been user error on my part. I'll post a follow-up
if it turns out to be something other than user error.
Thanks,
On 02/21/2013 01:01 PM, Masanz, James J. wrote:
> I couldn't think of anything offhand that would account for that, so I looked at several AE descriptors including the assertion component, plus the LookupDesc_Db.xml, and didn't see anything obvious. To narrow down as Pei suggested, perhaps use the CVD to annotate one of the larger files and compare the performance reports generated.
RE: cTAKES 3.0 appears to be 10x slower than cTAKES 2.5
Posted by "Masanz, James J." <Ma...@mayo.edu>.
I couldn't think of anything offhand that would account for that, so I looked at several analysis engine (AE) descriptors, including the assertion component's, plus LookupDesc_Db.xml, and didn't see anything obvious. To narrow it down as Pei suggested, perhaps use the CVD (CAS Visual Debugger) to annotate one of the larger files with each version and compare the performance reports it generates.
-- James
> -----Original Message-----
> From: ctakes-dev-return-1266-Masanz.James=mayo.edu@incubator.apache.org
> [mailto:ctakes-dev-return-1266-Masanz.James=mayo.edu@incubator.apache.org] On Behalf Of Chen, Pei
> Sent: Thursday, February 21, 2013 1:13 PM
> To: <ct...@incubator.apache.org>
> Cc: ctakes-dev@incubator.apache.org
> Subject: Re: cTAKES 3.0 appears to be 10x slower than cTAKES 2.5
>
> This is interesting. Just curious, were you able to narrow down which
> component was slower? I know that 3.0 includes the full LVG while 2.5
> has simple/test LVG by default. But 10x seems pretty extreme...
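The per-component performance report James mentions can be approximated outside the CVD by timing each stage separately and accumulating the totals. This is a hypothetical, stdlib-only sketch: `StageTimer`, the stage names and the busy-loop workloads are placeholders, not cTAKES or UIMA APIs. In a real pipeline each `time(...)` call would wrap one annotator's processing of one document.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical per-stage wall-clock accumulator with a simple report. */
final class StageTimer {
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();

    /** Runs one stage and adds its elapsed time to that stage's running total. */
    <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByStage.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    Map<String, Long> snapshot() {
        return Collections.unmodifiableMap(nanosByStage);
    }

    /** One line per stage, slowest first, with its share of the total time. */
    String report() {
        long total = Math.max(1, nanosByStage.values().stream().mapToLong(Long::longValue).sum());
        StringBuilder sb = new StringBuilder();
        nanosByStage.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .forEach(e -> sb.append(String.format(Locale.ROOT, "%-20s %10.1f ms %5.1f%%%n",
                        e.getKey(), e.getValue() / 1e6, 100.0 * e.getValue() / total)));
        return sb.toString();
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        for (int doc = 0; doc < 10; doc++) {
            // Placeholder stages and workloads standing in for real annotators.
            timer.time("tokenizer", () -> spin(200_000));
            timer.time("lvg", () -> spin(2_000_000));
            timer.time("dictionaryLookup", () -> spin(1_000_000));
        }
        System.out.print(timer.report());
    }

    private static long spin(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) acc += i % 7;
        return acc;
    }
}
```

Accumulating across documents, rather than timing one file, smooths out per-document variation, but running the CVD on a single large file as suggested is still a simpler first check.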
Re: cTAKES 3.0 appears to be 10x slower than cTAKES 2.5
Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
This is interesting. Just curious, were you able to narrow down which component was slower? I know that 3.0 includes the full LVG (Lexical Variant Generation), while 2.5 uses a simple/test LVG by default. But 10x seems pretty extreme...
Sent from my iPhone
On Feb 21, 2013, at 1:09 PM, "Kim Ebert" <ki...@perfectsearchcorp.com> wrote:
> Hi All,
>
> I am doing a comparison of cTAKES 2.5 and cTAKES 3.0 for a 100 document test corpus.
>
> Timing how long it took, I found that cTAKES 2.5 took 1,490.397 seconds while cTAKES 3.0 took 21,119.485 seconds. It seems like a major slowdown in performance.
>
> I used the following analysis engine for cTAKES 3.0:
>
> desc/ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextUMLSProcessor.xml
>
> I used the following analysis engine for cTAKES 2.5:
>
> cTAKESdesc/cdpdesc/analysis_engine/AggregatePlaintextUMLSProcessor.xml
>
> Any thoughts on why such a difference in performance?
>
> Thanks,
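Once per-component totals exist for both versions, ranking the components by how much time they gained answers the "which component was slower?" question directly. A minimal sketch, assuming the totals are available as maps from a component name to seconds; `StageDiff`, `Delta` and `rankByIncrease` are hypothetical names. A component present in only one version is treated as taking 0 seconds in the other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical comparison of per-component timings between two releases. */
final class StageDiff {
    /** One component's time in each run and how much it grew. */
    static final class Delta {
        private final String stage;
        private final double baselineSeconds;
        private final double candidateSeconds;

        Delta(String stage, double baselineSeconds, double candidateSeconds) {
            this.stage = stage;
            this.baselineSeconds = baselineSeconds;
            this.candidateSeconds = candidateSeconds;
        }

        String stage() { return stage; }
        double baselineSeconds() { return baselineSeconds; }
        double candidateSeconds() { return candidateSeconds; }
        double increase() { return candidateSeconds - baselineSeconds; }

        @Override
        public String toString() {
            return String.format(Locale.ROOT, "%s: %.3f s -> %.3f s (%+.3f s)",
                    stage, baselineSeconds, candidateSeconds, increase());
        }
    }

    /** Components sorted by absolute time gained, largest first; ties stay in name order. */
    static List<Delta> rankByIncrease(Map<String, Double> baseline, Map<String, Double> candidate) {
        Set<String> stages = new TreeSet<>(baseline.keySet()); // sorted names for stable ties
        stages.addAll(candidate.keySet());
        List<Delta> deltas = new ArrayList<>();
        for (String s : stages) {
            // A component missing from one run counts as 0 seconds there.
            deltas.add(new Delta(s, baseline.getOrDefault(s, 0.0), candidate.getOrDefault(s, 0.0)));
        }
        deltas.sort((a, b) -> Double.compare(b.increase(), a.increase()));
        return deltas;
    }
}
```

Feeding it the per-component numbers from the two performance reports for the same document puts the component that gained the most time first, which is where to look before suspecting the JVM or the environment.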