You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@ctakes.apache.org by "Finan, Sean" <Se...@childrens.harvard.edu> on 2017/06/05 15:42:30 UTC

RE: negation/uncertainty: pipeline runs very slowly

Hi Dima,

It looks like the UriCollectionReader that you are using never sets a document id (type DocumentID) in the cas.  However, this shouldn't be a problem as each document will be assigned a unique id "UnknownDocument"{###} where {###} is a number incremented per new document with an unknown id.  The message that you are seeing is just a warning.  The code fetching the documentID and creating a default are very simple and should not take any real processing time.

The call to get document id is the very first line in AssertionCleartkAnalysisEngine:
  @Override
  public void process(JCas jCas) throws AnalysisEngineProcessException
  {
    String documentId = DocumentIDAnnotationUtil.getDocumentID(jCas);

So, the slowdown occurring after the warning message leads me to believe that the problem lies later in the process ...

My suggestion is that you put a breakpoint there and run your pipeline through a debugger.  Optionally, there are a couple of log.debug messages in that class, so you could change the granularity of your log4j and see if you can narrow down the problem.  Add more debug statements if it helps.

At any rate, I have not seen this problem in other pipelines.

Sean

-----Original Message-----
From: Dligach, Dmitriy [mailto:ddligach@luc.edu] 
Sent: Wednesday, May 24, 2017 10:34 AM
To: cTAKES Developer list
Subject: negation/uncertainty: pipeline runs very slowly

Dear cTAKES developers, 

I am observing something strange. As soon as I add at the end of my pipeline the uncertainty/negation AEs:

aggregateBuilder.add( PolarityCleartkAnalysisEngine.createAnnotatorDescription() ); aggregateBuilder.add( UncertaintyCleartkAnalysisEngine.createAnnotatorDescription() );

the pipeline becomes 10-20 times slower. I just confirmed this again. As soon as I remove these two AEs at the end of my pipeline, it runs very fast again.

It seems to get stuck often right after it outputs this warning:
WARN DocumentIDAnnotationUtil - Unable to find DocumentIDAnnotation

If I remove the two AEs, this warning disappears.

The full pipeline is here:
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_dmitriydligach_ctakes-2Dmisc_blob_master_src_main_java_org_apache_ctakes_pipelines_UmlsLookupPipeline.java&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=cQRgT9lMipJUOQCu86lnRETbYFVC0C5yfMl2r5u0lNs&s=fnshTyx1ruwH-8ktFPX4JeX-7PVWplbiPO2RYdGSI9E&e= 

Any clues?

Thank you very much,

Dima

RE: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Posted by "Mullane, Sean *HS" <SP...@hscmail.mcc.virginia.edu>.

I just wanted to chime in that I'm also very interested in having faster assertion annotations. We're currently seeing 30s/doc/thread required for annotation using the default pipeline with DB read/write. I would like to get that down below 10s if possible. I would be willing to help test a solution.

Sean

-----Original Message-----
From: Miller, Timothy [mailto:Timothy.Miller@childrens.harvard.edu] 
Sent: Friday, June 23, 2017 7:31 AM
To: dev@ctakes.apache.org
Subject: Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Something I just thought of is that if you are using the new (beta) sentence detector trained on Mimic, it is a bit of a "lumper" rather than a "splitter," meaning it is more likely to miss a sentence break and make longer sentences, sometimes absurdly long if there are no clear cues. I know that will slow down the constituency parser and dependency parser, but not sure why it would only slow down when negation processing is added. So, not a solution but something to keep in mind while debugging, especially if it interacts with Steve and Sean's feedback.
Tim

________________________________________
From: Dligach, Dmitriy <dd...@luc.edu>
Sent: Wednesday, June 21, 2017 9:18 PM
To: dev@ctakes.apache.org
Cc: Miller, Timothy
Subject: Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Sean, thanks for your comments. You are right. The slowdown doesn't have anything to do with documentID.

I am now convinced that the slowdown has to do with the Polarity annotator. The reason you and others haven't seen this in other pipelines is that you've probably been processing relatively small files.

I am processing MIMIC patient files, which typically have thousands of words. I just tried to process 300 files from the THYME corpus (where the files have hundreds of words) and the slowdown was barely noticeable. When running the same pipeline on the MIMIC files, the slowdown becomes very noticeable.

Dima

> On Jun 5, 2017, at 10:42, Finan, Sean <Se...@childrens.harvard.edu> wrote:
>
> Hi Dima,
>
> It looks like the UriCollectionReader that you are using never sets a document id (type DocumentID) in the cas.  However, this shouldn't be a problem as each document will be assigned a unique id "UnknownDocument"{###} where {###} is a number incremented per new document with an unknown id.  The message that you are seeing is just a warning.  The code fetching the documentID and creating a default are very simple and should not take any real processing time.
>
> The call to get document id is the very first line in AssertionCleartkAnalysisEngine:
>  @Override
>  public void process(JCas jCas) throws AnalysisEngineProcessException  
> {
>    String documentId = DocumentIDAnnotationUtil.getDocumentID(jCas);
>
> So, the slowdown occurring after the warning message leads me to believe that the problem lies later in the process ...
>
> My suggestion is that you put a breakpoint there and run your pipeline through a debugger.  Optionally, there are a couple of log.debug messages in that class, so you could change the granularity of your log4j and see if you can narrow down the problem.  Add more debug statements if it helps.
>
> At any rate, I have not seen this problem in other pipelines.
>
> Sean
>
> -----Original Message-----
> From: Dligach, Dmitriy [mailto:ddligach@luc.edu]
> Sent: Wednesday, May 24, 2017 10:34 AM
> To: cTAKES Developer list
> Subject: negation/uncertainty: pipeline runs very slowly
>
> Dear cTAKES developers,
>
> I am observing something strange. As soon as I add at the end of my pipeline the uncertainty/negation AEs:
>
> aggregateBuilder.add( 
> PolarityCleartkAnalysisEngine.createAnnotatorDescription() ); 
> aggregateBuilder.add( 
> UncertaintyCleartkAnalysisEngine.createAnnotatorDescription() );
>
> the pipeline becomes 10-20 times slower. I just confirmed this again. As soon as I remove these two AEs at the end of my pipeline, it runs very fast again.
>
> It seems to get stuck often right after it outputs this warning:
> WARN DocumentIDAnnotationUtil - Unable to find DocumentIDAnnotation
>
> If I remove the two AEs, this warning disappears.
>
> The full pipeline is here:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_dmitri
> ydligach_ctakes-2Dmisc_blob_master_src_main_java_org_apache_ctakes_pip
> elines_UmlsLookupPipeline.java&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14
> JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=cQRgT9
> lMipJUOQCu86lnRETbYFVC0C5yfMl2r5u0lNs&s=fnshTyx1ruwH-8ktFPX4JeX-7PVWpl
> biPO2RYdGSI9E&e=
>
> Any clues?
>
> Thank you very much,
>
> Dima
>
>
>

RE: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hey Tim,

I have recently been testing with the "smoker" notes in ctakes-examples-res, and using your new sentence detector (Lumpy) has definitely been the way to go for those notes.  They have the random cr/lf within sentences.  It is great that we have some example notes in ctakes that can show off your work.

Cheers,
Sean

-----Original Message-----
From: Dligach, Dmitriy [mailto:ddligach@luc.edu] 
Sent: Friday, June 30, 2017 11:03 AM
To: dev@ctakes.apache.org
Subject: Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Hi Tim,

Good point, but I happen to be using the ctakes-core sentence detector.

Dima



> On Jun 23, 2017, at 06:31, Miller, Timothy <Ti...@childrens.harvard.edu> wrote:
> 
> Something I just thought of is that if you are using the new (beta) sentence detector trained on Mimic, it is a bit of a "lumper" rather than a "splitter," meaning it is more likely to miss a sentence break and make longer sentences, sometimes absurdly long if there are no clear cues. I know that will slow down the constituency parser and dependency parser, but not sure why it would only slow down when negation processing is added. So, not a solution but something to keep in mind while debugging, especially if it interacts with Steve and Sean's feedback.
> Tim
> 
> 
> ________________________________________
> From: Dligach, Dmitriy <dd...@luc.edu>
> Sent: Wednesday, June 21, 2017 9:18 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy
> Subject: Re: negation/uncertainty: pipeline runs very slowly 
> [EXTERNAL]
> 
> Sean, thanks for your comments. You are right. The slowdown doesn't have anything to do with documentID.
> 
> I am now convinced that the slowdown has to do with the Polarity annotator. The reason you and others haven't seen this in other pipelines is that you've probably been processing relatively small files.
> 
> I am processing MIMIC patient files, which typically have thousands of words. I just tried to process 300 files from the THYME corpus (where the files have hundreds of words) and the slowdown was barely noticeable. When running the same pipeline on the MIMIC files, the slowdown becomes very noticeable.
> 
> 
> Dima
> 
> 
> 
>> On Jun 5, 2017, at 10:42, Finan, Sean <Se...@childrens.harvard.edu> wrote:
>> 
>> Hi Dima,
>> 
>> It looks like the UriCollectionReader that you are using never sets a document id (type DocumentID) in the cas.  However, this shouldn't be a problem as each document will be assigned a unique id "UnknownDocument"{###} where {###} is a number incremented per new document with an unknown id.  The message that you are seeing is just a warning.  The code fetching the documentID and creating a default are very simple and should not take any real processing time.
>> 
>> The call to get document id is the very first line in AssertionCleartkAnalysisEngine:
>> @Override
>> public void process(JCas jCas) throws AnalysisEngineProcessException 
>> {
>>   String documentId = DocumentIDAnnotationUtil.getDocumentID(jCas);
>> 
>> So, the slowdown occurring after the warning message leads me to believe that the problem lies later in the process ...
>> 
>> My suggestion is that you put a breakpoint there and run your pipeline through a debugger.  Optionally, there are a couple of log.debug messages in that class, so you could change the granularity of your log4j and see if you can narrow down the problem.  Add more debug statements if it helps.
>> 
>> At any rate, I have not seen this problem in other pipelines.
>> 
>> Sean
>> 
>> -----Original Message-----
>> From: Dligach, Dmitriy [mailto:ddligach@luc.edu]
>> Sent: Wednesday, May 24, 2017 10:34 AM
>> To: cTAKES Developer list
>> Subject: negation/uncertainty: pipeline runs very slowly
>> 
>> Dear cTAKES developers,
>> 
>> I am observing something strange. As soon as I add at the end of my pipeline the uncertainty/negation AEs:
>> 
>> aggregateBuilder.add( 
>> PolarityCleartkAnalysisEngine.createAnnotatorDescription() ); 
>> aggregateBuilder.add( 
>> UncertaintyCleartkAnalysisEngine.createAnnotatorDescription() );
>> 
>> the pipeline becomes 10-20 times slower. I just confirmed this again. As soon as I remove these two AEs at the end of my pipeline, it runs very fast again.
>> 
>> It seems to get stuck often right after it outputs this warning:
>> WARN DocumentIDAnnotationUtil - Unable to find DocumentIDAnnotation
>> 
>> If I remove the two AEs, this warning disappears.
>> 
>> The full pipeline is here:
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_dmitr
>> iydligach_ctakes-2Dmisc_blob_master_src_main_java_org_apache_ctakes_p
>> ipelines_UmlsLookupPipeline.java&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwE
>> W14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=cQ
>> RgT9lMipJUOQCu86lnRETbYFVC0C5yfMl2r5u0lNs&s=fnshTyx1ruwH-8ktFPX4JeX-7
>> PVWplbiPO2RYdGSI9E&e=
>> 
>> Any clues?
>> 
>> Thank you very much,
>> 
>> Dima
>> 
>> 
>> 
>

Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Posted by "Dligach, Dmitriy" <dd...@luc.edu>.

Hi Tim,

Good point, but I happen to be using the ctakes-core sentence detector.

Dima



> On Jun 23, 2017, at 06:31, Miller, Timothy <Ti...@childrens.harvard.edu> wrote:
> 
> Something I just thought of is that if you are using the new (beta) sentence detector trained on Mimic, it is a bit of a "lumper" rather than a "splitter," meaning it is more likely to miss a sentence break and make longer sentences, sometimes absurdly long if there are no clear cues. I know that will slow down the constituency parser and dependency parser, but not sure why it would only slow down when negation processing is added. So, not a solution but something to keep in mind while debugging, especially if it interacts with Steve and Sean's feedback.
> Tim
> 
> 
> ________________________________________
> From: Dligach, Dmitriy <dd...@luc.edu>
> Sent: Wednesday, June 21, 2017 9:18 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy
> Subject: Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]
> 
> Sean, thanks for your comments. You are right. The slowdown doesn’t have anything to do with documentID.
> 
> I am now convinced that the slowdown has to do with the Polarity annotator. The reason you and others haven’t seen this in other pipelines is that you’ve probably been processing relatively small files.
> 
> I am processing MIMIC patient files, which typically have thousands of words. I just tried to process 300 files from the THYME corpus (where the files have hundreds of words) and the slowdown was barely noticeable. When running the same pipeline on the MIMIC files, the slowdown becomes very noticeable.
> 
> 
> Dima
> 
> 
> 
>> On Jun 5, 2017, at 10:42, Finan, Sean <Se...@childrens.harvard.edu> wrote:
>> 
>> Hi Dima,
>> 
>> It looks like the UriCollectionReader that you are using never sets a document id (type DocumentID) in the cas.  However, this shouldn't be a problem as each document will be assigned a unique id "UnknownDocument"{###} where {###} is a number incremented per new document with an unknown id.  The message that you are seeing is just a warning.  The code fetching the documentID and creating a default are very simple and should not take any real processing time.
>> 
>> The call to get document id is the very first line in AssertionCleartkAnalysisEngine:
>> @Override
>> public void process(JCas jCas) throws AnalysisEngineProcessException
>> {
>>   String documentId = DocumentIDAnnotationUtil.getDocumentID(jCas);
>> 
>> So, the slowdown occurring after the warning message leads me to believe that the problem lies later in the process ...
>> 
>> My suggestion is that you put a breakpoint there and run your pipeline through a debugger.  Optionally, there are a couple of log.debug messages in that class, so you could change the granularity of your log4j and see if you can narrow down the problem.  Add more debug statements if it helps.
>> 
>> At any rate, I have not seen this problem in other pipelines.
>> 
>> Sean
>> 
>> -----Original Message-----
>> From: Dligach, Dmitriy [mailto:ddligach@luc.edu]
>> Sent: Wednesday, May 24, 2017 10:34 AM
>> To: cTAKES Developer list
>> Subject: negation/uncertainty: pipeline runs very slowly
>> 
>> Dear cTAKES developers,
>> 
>> I am observing something strange. As soon as I add at the end of my pipeline the uncertainty/negation AEs:
>> 
>> aggregateBuilder.add( PolarityCleartkAnalysisEngine.createAnnotatorDescription() ); aggregateBuilder.add( UncertaintyCleartkAnalysisEngine.createAnnotatorDescription() );
>> 
>> the pipeline becomes 10-20 times slower. I just confirmed this again. As soon as I remove these two AEs at the end of my pipeline, it runs very fast again.
>> 
>> It seems to get stuck often right after it outputs this warning:
>> WARN DocumentIDAnnotationUtil - Unable to find DocumentIDAnnotation
>> 
>> If I remove the two AEs, this warning disappears.
>> 
>> The full pipeline is here:
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_dmitriydligach_ctakes-2Dmisc_blob_master_src_main_java_org_apache_ctakes_pipelines_UmlsLookupPipeline.java&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=cQRgT9lMipJUOQCu86lnRETbYFVC0C5yfMl2r5u0lNs&s=fnshTyx1ruwH-8ktFPX4JeX-7PVWplbiPO2RYdGSI9E&e=
>> 
>> Any clues?
>> 
>> Thank you very much,
>> 
>> Dima
>> 
>> 
>> 
>

Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.

Something I just thought of is that if you are using the new (beta) sentence detector trained on Mimic, it is a bit of a "lumper" rather than a "splitter," meaning it is more likely to miss a sentence break and make longer sentences, sometimes absurdly long if there are no clear cues. I know that will slow down the constituency parser and dependency parser, but not sure why it would only slow down when negation processing is added. So, not a solution but something to keep in mind while debugging, especially if it interacts with Steve and Sean's feedback.
Tim


________________________________________
From: Dligach, Dmitriy <dd...@luc.edu>
Sent: Wednesday, June 21, 2017 9:18 PM
To: dev@ctakes.apache.org
Cc: Miller, Timothy
Subject: Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Sean, thanks for your comments. You are right. The slowdown doesn’t have anything to do with documentID.

I am now convinced that the slowdown has to do with the Polarity annotator. The reason you and others haven’t seen this in other pipelines is that you’ve probably been processing relatively small files.

I am processing MIMIC patient files, which typically have thousands of words. I just tried to process 300 files from the THYME corpus (where the files have hundreds of words) and the slowdown was barely noticeable. When running the same pipeline on the MIMIC files, the slowdown becomes very noticeable.


Dima



> On Jun 5, 2017, at 10:42, Finan, Sean <Se...@childrens.harvard.edu> wrote:
>
> Hi Dima,
>
> It looks like the UriCollectionReader that you are using never sets a document id (type DocumentID) in the cas.  However, this shouldn't be a problem as each document will be assigned a unique id "UnknownDocument"{###} where {###} is a number incremented per new document with an unknown id.  The message that you are seeing is just a warning.  The code fetching the documentID and creating a default are very simple and should not take any real processing time.
>
> The call to get document id is the very first line in AssertionCleartkAnalysisEngine:
>  @Override
>  public void process(JCas jCas) throws AnalysisEngineProcessException
>  {
>    String documentId = DocumentIDAnnotationUtil.getDocumentID(jCas);
>
> So, the slowdown occurring after the warning message leads me to believe that the problem lies later in the process ...
>
> My suggestion is that you put a breakpoint there and run your pipeline through a debugger.  Optionally, there are a couple of log.debug messages in that class, so you could change the granularity of your log4j and see if you can narrow down the problem.  Add more debug statements if it helps.
>
> At any rate, I have not seen this problem in other pipelines.
>
> Sean
>
> -----Original Message-----
> From: Dligach, Dmitriy [mailto:ddligach@luc.edu]
> Sent: Wednesday, May 24, 2017 10:34 AM
> To: cTAKES Developer list
> Subject: negation/uncertainty: pipeline runs very slowly
>
> Dear cTAKES developers,
>
> I am observing something strange. As soon as I add at the end of my pipeline the uncertainty/negation AEs:
>
> aggregateBuilder.add( PolarityCleartkAnalysisEngine.createAnnotatorDescription() ); aggregateBuilder.add( UncertaintyCleartkAnalysisEngine.createAnnotatorDescription() );
>
> the pipeline becomes 10-20 times slower. I just confirmed this again. As soon as I remove these two AEs at the end of my pipeline, it runs very fast again.
>
> It seems to get stuck often right after it outputs this warning:
> WARN DocumentIDAnnotationUtil - Unable to find DocumentIDAnnotation
>
> If I remove the two AEs, this warning disappears.
>
> The full pipeline is here:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_dmitriydligach_ctakes-2Dmisc_blob_master_src_main_java_org_apache_ctakes_pipelines_UmlsLookupPipeline.java&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=cQRgT9lMipJUOQCu86lnRETbYFVC0C5yfMl2r5u0lNs&s=fnshTyx1ruwH-8ktFPX4JeX-7PVWplbiPO2RYdGSI9E&e=
>
> Any clues?
>
> Thank you very much,
>
> Dima
>
>
>

Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Posted by "Dligach, Dmitriy" <dd...@luc.edu>.

Hi Sean,

First of all, thank you Sean, Steve, and Tim for giving this a thought. I definitely agree that the problem lies in this line:

List<Sentence> sents = new ArrayList<>(JCasUtil.selectCovering(jCas, Sentence.class, entityOrEventMention.getBegin(), entityOrEventMention.getEnd()));

The negation AE runs fine on shorter documents but as soon as I try to run it on large documents, which have LOTS of sentences, it becomes extremely slow.

I am sorry I haven’t been able to try the proposed solutions. I may have a little time after the long weekend. 

In the meantime, I created a JIRA issue: 
https://issues.apache.org/jira/browse/CTAKES-449

Thanks again for your help.

Dima



> On Jun 30, 2017, at 07:26, Finan, Sean <Se...@childrens.harvard.edu> wrote:
> 
> Hi Dima,
> Have you had a chance to play with the proposed solutions?  If not then let us know and somebody will eventually get to it.
> Meanwhile, would you mind submitting a tar on jira?
> Thanks,
> Sean
> 
> -----Original Message-----
> From: Dligach, Dmitriy [mailto:ddligach@luc.edu] 
> Sent: Wednesday, June 21, 2017 3:18 PM
> To: dev@ctakes.apache.org
> Cc: Miller, Timothy
> Subject: Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]
> 
> Sean, thanks for your comments. You are right. The slowdown doesn’t have anything to do with documentID.
> 
> I am now convinced that the slowdown has to do with the Polarity annotator. The reason you and others haven’t seen this in other pipelines is that you’ve probably been processing relatively small files. 
> 
> I am processing MIMIC patient files, which typically have thousands of words. I just tried to process 300 files from the THYME corpus (where the files have hundreds of words) and the slowdown was barely noticeable. When running the same pipeline on the MIMIC files, the slowdown becomes very noticeable.
> 
> 
> Dima
> 
> 
> 
>> On Jun 5, 2017, at 10:42, Finan, Sean <Se...@childrens.harvard.edu> wrote:
>> 
>> Hi Dima,
>> 
>> It looks like the UriCollectionReader that you are using never sets a document id (type DocumentID) in the cas.  However, this shouldn't be a problem as each document will be assigned a unique id "UnknownDocument"{###} where {###} is a number incremented per new document with an unknown id.  The message that you are seeing is just a warning.  The code fetching the documentID and creating a default are very simple and should not take any real processing time.
>> 
>> The call to get document id is the very first line in AssertionCleartkAnalysisEngine:
>> @Override
>> public void process(JCas jCas) throws AnalysisEngineProcessException  
>> {
>>   String documentId = DocumentIDAnnotationUtil.getDocumentID(jCas);
>> 
>> So, the slowdown occurring after the warning message leads me to believe that the problem lies later in the process ...
>> 
>> My suggestion is that you put a breakpoint there and run your pipeline through a debugger.  Optionally, there are a couple of log.debug messages in that class, so you could change the granularity of your log4j and see if you can narrow down the problem.  Add more debug statements if it helps.
>> 
>> At any rate, I have not seen this problem in other pipelines.
>> 
>> Sean
>> 
>> -----Original Message-----
>> From: Dligach, Dmitriy [mailto:ddligach@luc.edu]
>> Sent: Wednesday, May 24, 2017 10:34 AM
>> To: cTAKES Developer list
>> Subject: negation/uncertainty: pipeline runs very slowly
>> 
>> Dear cTAKES developers,
>> 
>> I am observing something strange. As soon as I add at the end of my pipeline the uncertainty/negation AEs:
>> 
>> aggregateBuilder.add( 
>> PolarityCleartkAnalysisEngine.createAnnotatorDescription() ); 
>> aggregateBuilder.add( 
>> UncertaintyCleartkAnalysisEngine.createAnnotatorDescription() );
>> 
>> the pipeline becomes 10-20 times slower. I just confirmed this again. As soon as I remove these two AEs at the end of my pipeline, it runs very fast again.
>> 
>> It seems to get stuck often right after it outputs this warning:
>> WARN DocumentIDAnnotationUtil - Unable to find DocumentIDAnnotation
>> 
>> If I remove the two AEs, this warning disappears.
>> 
>> The full pipeline is here:
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_dmitri
>> ydligach_ctakes-2Dmisc_blob_master_src_main_java_org_apache_ctakes_pip
>> elines_UmlsLookupPipeline.java&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14
>> JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=cQRgT9
>> lMipJUOQCu86lnRETbYFVC0C5yfMl2r5u0lNs&s=fnshTyx1ruwH-8ktFPX4JeX-7PVWpl
>> biPO2RYdGSI9E&e=
>> 
>> Any clues?
>> 
>> Thank you very much,
>> 
>> Dima
>> 
>> 
>> 
>

RE: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Dima,
Have you had a chance to play with the proposed solutions?  If not then let us know and somebody will eventually get to it.
Meanwhile, would you mind submitting a tar on jira?
Thanks,
Sean

-----Original Message-----
From: Dligach, Dmitriy [mailto:ddligach@luc.edu] 
Sent: Wednesday, June 21, 2017 3:18 PM
To: dev@ctakes.apache.org
Cc: Miller, Timothy
Subject: Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Sean, thanks for your comments. You are right. The slowdown doesn’t have anything to do with documentID.

I am now convinced that the slowdown has to do with the Polarity annotator. The reason you and others haven’t seen this in other pipelines is that you’ve probably been processing relatively small files. 

I am processing MIMIC patient files, which typically have thousands of words. I just tried to process 300 files from the THYME corpus (where the files have hundreds of words) and the slowdown was barely noticeable. When running the same pipeline on the MIMIC files, the slowdown becomes very noticeable.


Dima



> On Jun 5, 2017, at 10:42, Finan, Sean <Se...@childrens.harvard.edu> wrote:
> 
> Hi Dima,
> 
> It looks like the UriCollectionReader that you are using never sets a document id (type DocumentID) in the cas.  However, this shouldn't be a problem as each document will be assigned a unique id "UnknownDocument"{###} where {###} is a number incremented per new document with an unknown id.  The message that you are seeing is just a warning.  The code fetching the documentID and creating a default are very simple and should not take any real processing time.
> 
> The call to get document id is the very first line in AssertionCleartkAnalysisEngine:
>  @Override
>  public void process(JCas jCas) throws AnalysisEngineProcessException  
> {
>    String documentId = DocumentIDAnnotationUtil.getDocumentID(jCas);
> 
> So, the slowdown occurring after the warning message leads me to believe that the problem lies later in the process ...
> 
> My suggestion is that you put a breakpoint there and run your pipeline through a debugger.  Optionally, there are a couple of log.debug messages in that class, so you could change the granularity of your log4j and see if you can narrow down the problem.  Add more debug statements if it helps.
> 
> At any rate, I have not seen this problem in other pipelines.
> 
> Sean
> 
> -----Original Message-----
> From: Dligach, Dmitriy [mailto:ddligach@luc.edu]
> Sent: Wednesday, May 24, 2017 10:34 AM
> To: cTAKES Developer list
> Subject: negation/uncertainty: pipeline runs very slowly
> 
> Dear cTAKES developers,
> 
> I am observing something strange. As soon as I add at the end of my pipeline the uncertainty/negation AEs:
> 
> aggregateBuilder.add( 
> PolarityCleartkAnalysisEngine.createAnnotatorDescription() ); 
> aggregateBuilder.add( 
> UncertaintyCleartkAnalysisEngine.createAnnotatorDescription() );
> 
> the pipeline becomes 10-20 times slower. I just confirmed this again. As soon as I remove these two AEs at the end of my pipeline, it runs very fast again.
> 
> It seems to get stuck often right after it outputs this warning:
> WARN DocumentIDAnnotationUtil - Unable to find DocumentIDAnnotation
> 
> If I remove the two AEs, this warning disappears.
> 
> The full pipeline is here:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_dmitri
> ydligach_ctakes-2Dmisc_blob_master_src_main_java_org_apache_ctakes_pip
> elines_UmlsLookupPipeline.java&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14
> JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=cQRgT9
> lMipJUOQCu86lnRETbYFVC0C5yfMl2r5u0lNs&s=fnshTyx1ruwH-8ktFPX4JeX-7PVWpl
> biPO2RYdGSI9E&e=
> 
> Any clues?
> 
> Thank you very much,
> 
> Dima
> 
> 
>

RE: negation/uncertainty: pipeline runs very slowly [EXTERNAL] [SUSPICIOUS]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Oops, I misquoted.
"Make it work, make it right, make it fast".

http://wiki.c2.com/?MakeItWorkMakeItRightMakeItFast

Sean

-----Original Message-----
From: Finan, Sean [mailto:Sean.Finan@childrens.harvard.edu] 
Sent: Thursday, June 22, 2017 12:04 PM
To: dev@ctakes.apache.org
Cc: Miller, Timothy
Subject: RE: negation/uncertainty: pipeline runs very slowly [EXTERNAL] [SUSPICIOUS]

Hi Dima,

There is possibly one major cause for things slowing down on large documents:

There is some suboptimal looping in AssertioncCleartkAnalysisEngine.  For instance:
    Collection<IdentifiedAnnotation> entities = JCasUtil.select(identifiedAnnotationView, IdentifiedAnnotation.class);
    for (IdentifiedAnnotation identifiedAnnotation : entities)    {
		...
	      List<Sentence> sents = new ArrayList<>(JCasUtil.selectCovering(jCas, Sentence.class, entityOrEventMention.getBegin(), entityOrEventMention.getEnd()));

Where entityOrEventMention is an (unnecessary btw) rename of identifiedAnnotation.  This gets all of the ia, then for each one gets the sentence.  Because of the way casutil iterates through the fslist, starting at the beginning each time, it would be slightly faster to get the sentences first and iterate over them.  It would be much faster to put identifiedAnnotations in a list and then mark a "previous index" while iterating through sentences, instead of repeated calls to jcasutil.  Something like:
prevIndex = 0;
foreach sentence {
   for ( i=prevIndex; i<aiList.size(); i++ ) {
      ai = aiList.get( i );
     while ( ai.getEnd() <= sentence.getEnd() ) {
        if ( ai.getBegin() >= sentence.getBegin() ) {
          sentenceAis.add( ai );
       }
     }
     prevIndex = i;
     break;
   }
   ----  handle sentence using sentenceAis (within sentence bounds) ---
   sentenceAis.clear();
}

I haven't run the code above, but I think it should speed up the process.  The same can be done for AssertionCuePhraseAnnotation.  Every selectCovered(..) call to jcasutil starts iterating at the beginning of the fslist.  If the fslist is long (eg big doc) then this will be slow.
The same thing goes for getting the BaseTokens between cue and ai using selectBetween(..).  Though there may be none, the entire fslist is iterated until ai.getEnd() is reached.

So, replace the repeated jcasutil.select*(..) calls with self-aware list iteration.
AssertionCleartkAnalysisEngine:
356 : IdentifiedAnnotation
399 : Sentence
422 : AssertionCuePhraseAnnotation
426 : BaseToken

I should note that there are probably many other places in ctakes where we could make similar improvements and thank you for bringing it to my attention - this class just happens to be a notable example but I know for certain that I have relied upon repeated jcasutil.select*(..) calls for coding expediency.  "Make it work, then make it better."  Sometimes we don't get a chance to hit that second act.

If you get some time, please make the single change with ai only and see if there is a speed improvement.  If there is then we know that we are on the right track.  If so, or if you find other improvements, can you check in the changes?  Otherwise just let me know where you have success and maybe I can build on it.

Thanks,
Sean

-----Original Message-----
From: Dligach, Dmitriy [mailto:ddligach@luc.edu]
Sent: Wednesday, June 21, 2017 3:18 PM
To: dev@ctakes.apache.org
Cc: Miller, Timothy
Subject: Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Sean, thanks for your comments. You are right. The slowdown doesn’t have anything to do with documentID.

I am now convinced that the slowdown has to do with the Polarity annotator. The reason you and others haven’t seen this in other pipelines is that you’ve probably been processing relatively small files. 

I am processing MIMIC patient files, which typically have thousands of words. I just tried to process 300 files from the THYME corpus (where the files have hundreds of words) and the slowdown was barely noticeable. When running the same pipeline on the MIMIC files, the slowdown becomes very noticeable.


Dima



> On Jun 5, 2017, at 10:42, Finan, Sean <Se...@childrens.harvard.edu> wrote:
> 
> Hi Dima,
> 
> It looks like the UriCollectionReader that you are using never sets a document id (type DocumentID) in the cas.  However, this shouldn't be a problem as each document will be assigned a unique id "UnknownDocument"{###} where {###} is a number incremented per new document with an unknown id.  The message that you are seeing is just a warning.  The code fetching the documentID and creating a default are very simple and should not take any real processing time.
> 
> The call to get document id is the very first line in AssertionCleartkAnalysisEngine:
>  @Override
>  public void process(JCas jCas) throws AnalysisEngineProcessException 
> {
>    String documentId = DocumentIDAnnotationUtil.getDocumentID(jCas);
> 
> So, the slowdown occurring after the warning message leads me to believe that the problem lies later in the process ...
> 
> My suggestion is that you put a breakpoint there and run your pipeline through a debugger.  Optionally, there are a couple of log.debug messages in that class, so you could change the granularity of your log4j and see if you can narrow down the problem.  Add more debug statements if it helps.
> 
> At any rate, I have not seen this problem in other pipelines.
> 
> Sean
> 
> -----Original Message-----
> From: Dligach, Dmitriy [mailto:ddligach@luc.edu]
> Sent: Wednesday, May 24, 2017 10:34 AM
> To: cTAKES Developer list
> Subject: negation/uncertainty: pipeline runs very slowly
> 
> Dear cTAKES developers,
> 
> I am observing something strange. As soon as I add at the end of my pipeline the uncertainty/negation AEs:
> 
> aggregateBuilder.add(
> PolarityCleartkAnalysisEngine.createAnnotatorDescription() ); 
> aggregateBuilder.add(
> UncertaintyCleartkAnalysisEngine.createAnnotatorDescription() );
> 
> the pipeline becomes 10-20 times slower. I just confirmed this again. As soon as I remove these two AEs at the end of my pipeline, it runs very fast again.
> 
> It seems to get stuck often right after it outputs this warning:
> WARN DocumentIDAnnotationUtil - Unable to find DocumentIDAnnotation
> 
> If I remove the two AEs, this warning disappears.
> 
> The full pipeline is here:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_dmitri
> ydligach_ctakes-2Dmisc_blob_master_src_main_java_org_apache_ctakes_pip
> elines_UmlsLookupPipeline.java&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14
> JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=cQRgT9
> lMipJUOQCu86lnRETbYFVC0C5yfMl2r5u0lNs&s=fnshTyx1ruwH-8ktFPX4JeX-7PVWpl
> biPO2RYdGSI9E&e=
> 
> Any clues?
> 
> Thank you very much,
> 
> Dima
> 
> 
>

RE: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Steve,

Thanks for the link to the Javadoc!  At a glance it looks like indexCovering(..) would be better than a ContainmentIndex for the assertion purposes since a reverse map isn't required.  It is excellent that uimafit has that class - I can probably make use of it in the future.

Dima, you now have a couple of options!  I don't think that anybody is parsing a bio 101 textbook so the indexCovering(..) maps memory shouldn't be a problem.

Cheers,
Sean

-----Original Message-----
From: Steven Bethard [mailto:steven.bethard@gmail.com] 
Sent: Thursday, June 22, 2017 12:32 PM
To: dev@ctakes.apache.org
Cc: Miller, Timothy
Subject: Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

On Thu, Jun 22, 2017 at 9:04 AM Finan, Sean < Sean.Finan@childrens.harvard.edu> wrote:

> There is some suboptimal looping in AssertioncCleartkAnalysisEngine.  
> For
> instance:
>     Collection<IdentifiedAnnotation> entities = 
> JCasUtil.select(identifiedAnnotationView, IdentifiedAnnotation.class);
>     for (IdentifiedAnnotation identifiedAnnotation : entities)    {
>                 ...
>               List<Sentence> sents = new 
> ArrayList<>(JCasUtil.selectCovering(jCas, Sentence.class, 
> entityOrEventMention.getBegin(), entityOrEventMention.getEnd()));

This should definitely never be done. According to the UimaFIT documentation for selectCovering:

Note: this is REALLY SLOW! You don't want to use this. Instead, consider using indexCovering(JCas, Class, Class) or a ContainmentIndex.
https://urldefense.proofpoint.com/v2/url?u=https-3A__uima.apache.org_d_uimafit-2Dcurrent_api_org_apache_uima_fit_util_JCasUtil.html-23selectCovering-2Djava.lang.Class-2Dorg.apache.uima.cas.text.AnnotationFS-2D&d=DwIBaQ&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=WZDzMoLjtvG7WlioYEMjz0VJhC1J1UASEY8uvm75D7k&s=n24YXYqPGamSmyLvGF7nTnoE7CIf_RL3TA44DQbqcYQ&e= 

Steve

Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Posted by Steven Bethard <st...@gmail.com>.

On Thu, Jun 22, 2017 at 9:04 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> There is some suboptimal looping in AssertioncCleartkAnalysisEngine.  For
> instance:
>     Collection<IdentifiedAnnotation> entities =
> JCasUtil.select(identifiedAnnotationView, IdentifiedAnnotation.class);
>     for (IdentifiedAnnotation identifiedAnnotation : entities)    {
>                 ...
>               List<Sentence> sents = new
> ArrayList<>(JCasUtil.selectCovering(jCas, Sentence.class,
> entityOrEventMention.getBegin(), entityOrEventMention.getEnd()));

This should definitely never be done. According to the UimaFIT
documentation for selectCovering:

Note: this is REALLY SLOW! You don't want to use this. Instead, consider
using indexCovering(JCas, Class, Class) or a ContainmentIndex.
https://uima.apache.org/d/uimafit-current/api/org/apache/uima/fit/util/JCasUtil.html#selectCovering-java.lang.Class-org.apache.uima.cas.text.AnnotationFS-

Steve

RE: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Dima,

There is possibly one major cause for things slowing down on large documents:

There is some suboptimal looping in AssertioncCleartkAnalysisEngine.  For instance:
    Collection<IdentifiedAnnotation> entities = JCasUtil.select(identifiedAnnotationView, IdentifiedAnnotation.class);
    for (IdentifiedAnnotation identifiedAnnotation : entities)    {
		...
	      List<Sentence> sents = new ArrayList<>(JCasUtil.selectCovering(jCas, Sentence.class, entityOrEventMention.getBegin(), entityOrEventMention.getEnd()));

Where entityOrEventMention is an (unnecessary btw) rename of identifiedAnnotation.  This gets all of the ia, then for each one gets the sentence.  Because of the way casutil iterates through the fslist, starting at the beginning each time, it would be slightly faster to get the sentences first and iterate over them.  It would be much faster to put identifiedAnnotations in a list and then mark a "previous index" while iterating through sentences, instead of repeated calls to jcasutil.  Something like:
prevIndex = 0;
foreach sentence {
   for ( i=prevIndex; i<aiList.size(); i++ ) {
      ai = aiList.get( i );
     while ( ai.getEnd() <= sentence.getEnd() ) {
        if ( ai.getBegin() >= sentence.getBegin() ) {
          sentenceAis.add( ai );
       }
     }
     prevIndex = i;
     break;
   }
   ----  handle sentence using sentenceAis (within sentence bounds) ---
   sentenceAis.clear();
}

I haven't run the code above, but I think it should speed up the process.  The same can be done for AssertionCuePhraseAnnotation.  Every selectCovered(..) call to jcasutil starts iterating at the beginning of the fslist.  If the fslist is long (eg big doc) then this will be slow.
The same thing goes for getting the BaseTokens between cue and ai using selectBetween(..).  Though there may be none, the entire fslist is iterated until ai.getEnd() is reached.

So, replace the repeated jcasutil.select*(..) calls with self-aware list iteration.
AssertionCleartkAnalysisEngine:
356 : IdentifiedAnnotation
399 : Sentence
422 : AssertionCuePhraseAnnotation
426 : BaseToken

I should note that there are probably many other places in ctakes where we could make similar improvements and thank you for bringing it to my attention - this class just happens to be a notable example but I know for certain that I have relied upon repeated jcasutil.select*(..) calls for coding expediency.  "Make it work, then make it better."  Sometimes we don't get a chance to hit that second act.

If you get some time, please make the single change with ai only and see if there is a speed improvement.  If there is then we know that we are on the right track.  If so, or if you find other improvements, can you check in the changes?  Otherwise just let me know where you have success and maybe I can build on it.

Thanks,
Sean

-----Original Message-----
From: Dligach, Dmitriy [mailto:ddligach@luc.edu] 
Sent: Wednesday, June 21, 2017 3:18 PM
To: dev@ctakes.apache.org
Cc: Miller, Timothy
Subject: Re: negation/uncertainty: pipeline runs very slowly [EXTERNAL]

Sean, thanks for your comments. You are right. The slowdown doesn’t have anything to do with documentID.

I am now convinced that the slowdown has to do with the Polarity annotator. The reason you and others haven’t seen this in other pipelines is that you’ve probably been processing relatively small files. 

I am processing MIMIC patient files, which typically have thousands of words. I just tried to process 300 files from the THYME corpus (where the files have hundreds of words) and the slowdown was barely noticeable. When running the same pipeline on the MIMIC files, the slowdown becomes very noticeable.


Dima



> On Jun 5, 2017, at 10:42, Finan, Sean <Se...@childrens.harvard.edu> wrote:
> 
> Hi Dima,
> 
> It looks like the UriCollectionReader that you are using never sets a document id (type DocumentID) in the cas.  However, this shouldn't be a problem as each document will be assigned a unique id "UnknownDocument"{###} where {###} is a number incremented per new document with an unknown id.  The message that you are seeing is just a warning.  The code fetching the documentID and creating a default are very simple and should not take any real processing time.
> 
> The call to get document id is the very first line in AssertionCleartkAnalysisEngine:
>  @Override
>  public void process(JCas jCas) throws AnalysisEngineProcessException  
> {
>    String documentId = DocumentIDAnnotationUtil.getDocumentID(jCas);
> 
> So, the slowdown occurring after the warning message leads me to believe that the problem lies later in the process ...
> 
> My suggestion is that you put a breakpoint there and run your pipeline through a debugger.  Optionally, there are a couple of log.debug messages in that class, so you could change the granularity of your log4j and see if you can narrow down the problem.  Add more debug statements if it helps.
> 
> At any rate, I have not seen this problem in other pipelines.
> 
> Sean
> 
> -----Original Message-----
> From: Dligach, Dmitriy [mailto:ddligach@luc.edu]
> Sent: Wednesday, May 24, 2017 10:34 AM
> To: cTAKES Developer list
> Subject: negation/uncertainty: pipeline runs very slowly
> 
> Dear cTAKES developers,
> 
> I am observing something strange. As soon as I add at the end of my pipeline the uncertainty/negation AEs:
> 
> aggregateBuilder.add( 
> PolarityCleartkAnalysisEngine.createAnnotatorDescription() ); 
> aggregateBuilder.add( 
> UncertaintyCleartkAnalysisEngine.createAnnotatorDescription() );
> 
> the pipeline becomes 10-20 times slower. I just confirmed this again. As soon as I remove these two AEs at the end of my pipeline, it runs very fast again.
> 
> It seems to get stuck often right after it outputs this warning:
> WARN DocumentIDAnnotationUtil - Unable to find DocumentIDAnnotation
> 
> If I remove the two AEs, this warning disappears.
> 
> The full pipeline is here:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_dmitri
> ydligach_ctakes-2Dmisc_blob_master_src_main_java_org_apache_ctakes_pip
> elines_UmlsLookupPipeline.java&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14
> JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=cQRgT9
> lMipJUOQCu86lnRETbYFVC0C5yfMl2r5u0lNs&s=fnshTyx1ruwH-8ktFPX4JeX-7PVWpl
> biPO2RYdGSI9E&e=
> 
> Any clues?
> 
> Thank you very much,
> 
> Dima
> 
> 
>

Re: negation/uncertainty: pipeline runs very slowly

Posted by "Dligach, Dmitriy" <dd...@luc.edu>.

Sean, thanks for your comments. You are right. The slowdown doesn’t have anything to do with documentID.

I am now convinced that the slowdown has to do with the Polarity annotator. The reason you and others haven’t seen this in other pipelines is that you’ve probably been processing relatively small files. 

I am processing MIMIC patient files, which typically have thousands of words. I just tried to process 300 files from the THYME corpus (where the files have hundreds of words) and the slowdown was barely noticeable. When running the same pipeline on the MIMIC files, the slowdown becomes very noticeable.


Dima



> On Jun 5, 2017, at 10:42, Finan, Sean <Se...@childrens.harvard.edu> wrote:
> 
> Hi Dima,
> 
> It looks like the UriCollectionReader that you are using never sets a document id (type DocumentID) in the cas.  However, this shouldn't be a problem as each document will be assigned a unique id "UnknownDocument"{###} where {###} is a number incremented per new document with an unknown id.  The message that you are seeing is just a warning.  The code fetching the documentID and creating a default are very simple and should not take any real processing time.
> 
> The call to get document id is the very first line in AssertionCleartkAnalysisEngine:
>  @Override
>  public void process(JCas jCas) throws AnalysisEngineProcessException
>  {
>    String documentId = DocumentIDAnnotationUtil.getDocumentID(jCas);
> 
> So, the slowdown occurring after the warning message leads me to believe that the problem lies later in the process ...
> 
> My suggestion is that you put a breakpoint there and run your pipeline through a debugger.  Optionally, there are a couple of log.debug messages in that class, so you could change the granularity of your log4j and see if you can narrow down the problem.  Add more debug statements if it helps.
> 
> At any rate, I have not seen this problem in other pipelines.
> 
> Sean
> 
> -----Original Message-----
> From: Dligach, Dmitriy [mailto:ddligach@luc.edu] 
> Sent: Wednesday, May 24, 2017 10:34 AM
> To: cTAKES Developer list
> Subject: negation/uncertainty: pipeline runs very slowly
> 
> Dear cTAKES developers, 
> 
> I am observing something strange. As soon as I add at the end of my pipeline the uncertainty/negation AEs:
> 
> aggregateBuilder.add( PolarityCleartkAnalysisEngine.createAnnotatorDescription() ); aggregateBuilder.add( UncertaintyCleartkAnalysisEngine.createAnnotatorDescription() );
> 
> the pipeline becomes 10-20 times slower. I just confirmed this again. As soon as I remove these two AEs at the end of my pipeline, it runs very fast again.
> 
> It seems to get stuck often right after it outputs this warning:
> WARN DocumentIDAnnotationUtil - Unable to find DocumentIDAnnotation
> 
> If I remove the two AEs, this warning disappears.
> 
> The full pipeline is here:
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_dmitriydligach_ctakes-2Dmisc_blob_master_src_main_java_org_apache_ctakes_pipelines_UmlsLookupPipeline.java&d=DwIFAg&c=qS4goWBT7poplM69zy_3xhKwEW14JZMSdioCoppxeFU&r=fs67GvlGZstTpyIisCYNYmQCP6r0bcpKGd4f7d4gTao&m=cQRgT9lMipJUOQCu86lnRETbYFVC0C5yfMl2r5u0lNs&s=fnshTyx1ruwH-8ktFPX4JeX-7PVWplbiPO2RYdGSI9E&e= 
> 
> Any clues?
> 
> Thank you very much,
> 
> Dima
> 
> 
>