Posted to dev@ctakes.apache.org by "Milinovich, Alex" <mi...@ccf.org> on 2023/04/13 16:04:50 UTC

RE: [EXT] Re: cTAKES running slower with each run

I am running a customized version of 3.2.2, so that’s not the best for troubleshooting. But I ran jvisualvm, and the only thing I’m noticing is that metaspace grows a small amount when a large document is run through ctakes, and only by about 100KB over the course of an hour. The peaks and valleys in heap size are fairly consistent from one document to the next. The total number of loaded classes goes from 8828 to 8833 over the same period.
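The metrics read off jvisualvm above (heap usage, metaspace, loaded class count) can also be logged from inside the process with the standard `java.lang.management` API. That makes it easier to line them up against per-document timings and to see garbage-collection counts at the same moment. This is a minimal sketch: the class name `JvmSnapshot` is hypothetical, and the pool name "Metaspace" is what HotSpot JVMs report; other JVMs may name their memory pools differently.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

/** Hypothetical helper: one snapshot of the JVM metrics that jvisualvm charts. */
public class JvmSnapshot {
    final long heapUsedBytes;
    final long metaspaceUsedBytes; // stays -1 if the JVM has no pool named "Metaspace"
    final int loadedClasses;
    final long gcCount;      // total collections across all collectors
    final long gcTimeMillis; // approximate accumulated collection time

    JvmSnapshot() {
        heapUsedBytes = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();

        long metaspace = -1;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                metaspace = pool.getUsage().getUsed();
            }
        }
        metaspaceUsedBytes = metaspace;

        loadedClasses = ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();

        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 if the value is undefined for this collector.
            count += Math.max(0, gc.getCollectionCount());
            time += Math.max(0, gc.getCollectionTime());
        }
        gcCount = count;
        gcTimeMillis = time;
    }

    @Override
    public String toString() {
        return String.format(
            "heapUsed=%d metaspaceUsed=%d loadedClasses=%d gcCount=%d gcTimeMs=%d",
            heapUsedBytes, metaspaceUsedBytes, loadedClasses, gcCount, gcTimeMillis);
    }

    public static void main(String[] args) {
        // In a real run, log one snapshot after each document next to that document's timing.
        System.out.println(new JvmSnapshot());
    }
}
```

If the per-document time keeps climbing while these numbers stay flat, memory pressure and GC are less likely to be the cause.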

From: Peter Abramowitsch [mailto:pabramowitsch@gmail.com]
Sent: Wednesday, April 12, 2023 4:22 PM
To: dev@ctakes.apache.org
Subject: [EXT] Re: cTAKES running slower with each run


There are many ways to package ctakes. Admittedly, ours is unlike the
console app and we have our own multithreaded API, but we regularly
process millions of documents at a time and haven't seen this issue. The
core application, with our fairly standard pipeline, stays up for a month
at a time with no degradation.

Are you using any unusual or deprecated annotators? It could be that one
of the less-used ones doesn't properly separate initialization from its
processing method and is caching something that it shouldn't. Are you
seeing a concomitant growth in memory footprint?
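The caching hypothesis can be illustrated with a self-contained toy. The classes below are hypothetical stand-ins, not the UIMA or cTAKES annotator API. The "leaky" version keeps per-document data in a field that is never cleared, so every new document is checked against everything seen before; the "fixed" version keeps that state local to the processing call. Work is counted in comparisons rather than wall-clock time so the result is deterministic. This is one possible mechanism, offered as a sketch, not a diagnosis of the pipeline in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy illustration only; these classes are hypothetical and are NOT the UIMA or cTAKES API. */
public class CachingPitfall {

    /** Checks each token against everything in `seen`, then adds it; returns the comparison count. */
    static long scan(List<String> seen, List<String> tokens) {
        long comparisons = 0;
        for (String token : tokens) {
            for (String s : seen) {
                comparisons++;
                if (s.equals(token)) {
                    break;
                }
            }
            seen.add(token);
        }
        return comparisons;
    }

    /** Leaky: per-document state lives in a field, so it survives from one document to the next. */
    static class LeakyAnnotator {
        private final List<String> seen = new ArrayList<>(); // never cleared

        long process(List<String> tokens) {
            return scan(seen, tokens);
        }
    }

    /** Fixed: per-document state is created inside process() and dropped when it returns. */
    static class FixedAnnotator {
        long process(List<String> tokens) {
            return scan(new ArrayList<>(), tokens);
        }
    }

    /** Builds a synthetic document of k distinct tokens. */
    static List<String> makeDocument(int docIndex, int k) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            tokens.add("d" + docIndex + "-t" + i);
        }
        return tokens;
    }

    public static void main(String[] args) {
        LeakyAnnotator leaky = new LeakyAnnotator();
        FixedAnnotator fixed = new FixedAnnotator();
        for (int d = 0; d < 5; d++) {
            List<String> doc = makeDocument(d, 100);
            System.out.printf("doc %d: leaky=%d fixed=%d%n", d, leaky.process(doc), fixed.process(doc));
        }
    }
}
```

With 100 tokens per document, the fixed count stays at 4950 comparisons, while the leaky count climbs by a constant 10,000 per document (4950, 14950, 24950, ...). A constant per-document increase has the same shape as the steady slowdown described in this thread, though it does not prove that this is the cause here.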

Try running it under jvisualvm; it may give you a clue.

Peter

On Wed, Apr 12, 2023 at 7:41 PM Milinovich, Alex <mi...@ccf.org> wrote:

> I’m running ctakes both as an API and as a console app. Each time I hit
> ctakes, the run time per document gets incrementally slower than for the
> previous document, by a few thousandths of a millisecond per xml element
> (I normalize by xml element count to account for different document
> sizes). Compounded over 1000 documents in 20 minutes, the runs go from
> 0.06 milliseconds per xml element to 1.5 milliseconds per xml element.
> It’s a very consistent 0.002 millisecond increase in the rate for each
> subsequent document I throw at cTAKES.
>
> Is there any caching, garbage collection setting, or anything else I
> should be on the lookout for to adjust or fix?
>
> Thanks
>
> ~Alex
>
> *Alex Milinovich*
>
> Director of Research – Data Science Analytics | Quantitative Health
> Sciences
> 9500 Euclid Ave. – JJN3 | Cleveland, OH 44195 | m: (216) 245-7655
>
> Please consider the environment before printing this e-mail
> Cleveland Clinic is currently ranked as one of the nation’s top hospitals
> by *U.S. News & World Report* (2022-2023). Visit us online at
> http://www.clevelandclinic.org for a complete listing of our services,
> staff and locations. Confidentiality Note: This message is intended for use
> only by the individual or entity to which it is addressed and may contain
> information that is privileged, confidential, and exempt from disclosure
> under applicable law. If the reader of this message is not the intended
> recipient or the employee or agent responsible for delivering the message
> to the intended recipient, you are hereby notified that any dissemination,
> distribution or copying of this communication is strictly prohibited. If
> you have received this communication in error, please contact the sender
> immediately and destroy the material in its entirety, whether electronic or
> hard copy. Thank you.
>
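Alex's normalization in the quoted message (run time divided by xml element count, so documents of different sizes are comparable) and the steady per-document increase can be checked by fitting an ordinary least-squares slope to the per-element times. This is a minimal sketch; the class and method names are hypothetical, and the numbers in `main()` are made up to illustrate a 0.002 ms increase per document. They are not measurements from this thread.

```java
/** Hypothetical helper for checking whether per-element processing time drifts upward. */
public class SlowdownTrend {

    /** Normalizes one document's run time by its size, as described in the thread. */
    static double msPerElement(double elapsedMs, int elementCount) {
        return elapsedMs / elementCount;
    }

    /** Ordinary least-squares slope of y[i] against the document index i = 0..n-1. */
    static double slopePerDocument(double[] y) {
        int n = y.length;
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (double v : y) {
            meanY += v;
        }
        meanY /= n;
        double num = 0;
        double den = 0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (y[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den; // requires n >= 2, otherwise den == 0
    }

    public static void main(String[] args) {
        // Made-up numbers, NOT measurements from this thread: the rate rises by 0.002 ms per document.
        double[] elapsedMs = {60.0, 155.0, 25.6, 118.8, 81.6};
        int[] elementCounts = {1000, 2500, 400, 1800, 1200};

        double[] rates = new double[elapsedMs.length];
        for (int i = 0; i < rates.length; i++) {
            rates[i] = msPerElement(elapsedMs[i], elementCounts[i]);
        }
        System.out.printf("slope: %.4f ms per element per document%n", slopePerDocument(rates));
    }
}
```

Running it prints a slope of 0.0020. On real logs, a slope that stays clearly above zero over a long run would be consistent with state accumulating between documents, whereas a slope near zero would point elsewhere.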

