Posted to dev@ctakes.apache.org by Budi Wibowo <he...@umich.edu> on 2014/11/16 08:06:10 UTC

Using Ctakes takes a long time to process text

Hello,
I'm using cTAKES for a class project.
I'm using the CPE to process clinical text notes.
I ran the software with the -Xmx10g option.
My machine has 16 GB of memory.

My problem is that running the CPE takes a very long time.
I'm processing 175 clinical notes, about 40 MB in total.
The CPE has been running for close to 4 hours now
and has only processed 104 of the 175 notes.

I'm using the "test1.xml" CPE descriptor and the
"AggregatePlaintextUMLSProcessor.xml" analysis
engine (clinical-text-pipeline).

It seems that Java is only using 2 of the 16 cores I have. RAM usage hovers between 5 and 8 GB.

Is there any way I can make the software run a bit faster?
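
For scale, the figures in the message above can be turned into a rough throughput estimate. This is only a minimal sketch: it assumes the remaining notes take the same average time as the first 104, which may not hold if note sizes vary, and the function name is hypothetical.

```python
def estimate_remaining_hours(done, total, elapsed_hours):
    """Linear extrapolation from the average time per processed note."""
    if done <= 0:
        raise ValueError("need at least one processed note")
    hours_per_note = elapsed_hours / done
    return (total - done) * hours_per_note


# Figures reported in the message: 104 of 175 notes after about 4 hours.
minutes_per_note = 4 * 60 / 104                        # about 2.3 minutes per note
remaining = estimate_remaining_hours(104, 175, 4.0)    # about 2.7 more hours
avg_note_mb = 40 / 175                                 # about 0.23 MB per note

print(f"{minutes_per_note:.1f} min/note, {remaining:.1f} h remaining, "
      f"{avg_note_mb:.2f} MB/note on average")
```

At this rate the full run would take a little under 7 hours on a single pipeline, which is the baseline any speed-up would be measured against.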




RE: Using Ctakes takes a long time to process text

Posted by "Chen, Pei" <Pe...@childrens.harvard.edu>.
Budi,
You can also try out ctakes-clinical-pipeline/desc/analysis_engine/AggregatePlaintextFastUMLSProcessor.xml available in the current 3.2.1-rc.
It contains a new dictionary lookup algorithm from Sean that is roughly 1000% faster for each pipeline.

--Pei

From: Kim Ebert [mailto:kim.ebert@imatsolutions.com]
Sent: Monday, November 17, 2014 11:55 AM
To: dev@ctakes.apache.org
Subject: Re: Using Ctakes takes a long time to process text


Re: Using Ctakes takes a long time to process text

Posted by Kim Ebert <ki...@imatsolutions.com>.
Hi,

cTAKES is currently single-threaded. To increase throughput, we use a
patch or two to run several pipelines at once. To improve your
performance, I would recommend splitting the work into multiple batches.

To get cTAKES to run in multiple threads, you have to patch the LVG. I
believe the patch is available in the bug tracker.

You would also need to run them as separate pipelines inside UIMA.
You can't simply increase the number of threads for this operation:
while we aren't using static variables, state is maintained inside
the object.
http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html#ugr.tug.applications.multi_threaded

https://issues.apache.org/jira/browse/CTAKES-151

IMAT Solutions <http://imatsolutions.com>
Kim Ebert
Software Engineer
Office: 801.669.7342
kim.ebert@imatsolutions.com <ma...@imatsolutions.com>
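
The batching suggestion in the reply above could be sketched roughly as follows. This is a minimal illustration, not part of cTAKES: the function names and the batch_NN directory layout are hypothetical, and the step of pointing a separate cTAKES process at each batch directory is left out because it depends on the local installation. Files are assigned largest first to whichever batch currently holds the fewest bytes, so that the batches finish at roughly the same time.

```python
import heapq
import shutil
from pathlib import Path


def split_into_batches(files_with_sizes, n_batches):
    """Greedily assign (name, size) pairs to n_batches, largest first,
    always adding to the batch with the smallest running total."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    # Heap of (total_bytes, batch_index); the lightest batch is popped first.
    heap = [(0, i) for i in range(n_batches)]
    heapq.heapify(heap)
    batches = [[] for _ in range(n_batches)]
    ordered = sorted(files_with_sizes, key=lambda pair: pair[1], reverse=True)
    for name, size in ordered:
        total, idx = heapq.heappop(heap)
        batches[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    return batches


def copy_batches(input_dir, output_root, n_batches):
    """Copy each file in input_dir into output_root/batch_00, batch_01, ...
    (hypothetical layout; one directory per separate cTAKES run)."""
    files = [(p, p.stat().st_size)
             for p in sorted(Path(input_dir).iterdir()) if p.is_file()]
    batches = split_into_batches(files, n_batches)
    for i, batch in enumerate(batches):
        target = Path(output_root) / f"batch_{i:02d}"
        target.mkdir(parents=True, exist_ok=True)
        for src in batch:
            shutil.copy2(src, target / src.name)
    return batches
```

For example, `copy_batches("notes", "notes_batched", 4)` would produce four directories that could each be given to its own JVM. Each process needs its own heap, so the per-process -Xmx value has to be chosen so that the total fits in the machine's memory.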
