Posted to dev@ctakes.apache.org by "Finan, Sean" <Se...@childrens.harvard.edu> on 2019/09/24 23:29:31 UTC

Re: Large files taking forever to process [EXTERNAL]

Hi Greg,

Check your log to see what component is taking all the time.
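
One way to act on the advice to check the log is to total the wall-clock gaps between consecutive log entries per logger. The sketch below is an illustration under stated assumptions, not a cTAKES tool: it assumes the log uses the log4j pattern `%d{dd MMM yyyy HH:mm:ss} %5p %c{1} - %m%n` quoted elsewhere in this thread, and it charges each gap to the logger that wrote the earlier entry, which is only a rough proxy for the component doing the work. The name `seconds_by_logger` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
import re

# Matches entries such as:
#   29 Sep 2019 15:31:21 ERROR PiperFileReader - Piper File not found:
# Groups: timestamp, level, logger name.
LINE = re.compile(r"^(\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2})\s+(\w+)\s+(\S+) - ")


def seconds_by_logger(lines):
    """Sum the gap that follows each log entry, keyed by the logger that wrote it."""
    totals = defaultdict(float)
    previous = None  # (timestamp, logger) of the last entry that parsed
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue  # skip continuation lines, stack traces, log4j debug output
        stamp = datetime.strptime(match.group(1), "%d %b %Y %H:%M:%S")
        if previous is not None:
            totals[previous[1]] += (stamp - previous[0]).total_seconds()
        previous = (stamp, match.group(3))
    return dict(totals)
```

Passing an open file object works, since the function only iterates over lines. Timestamps in this pattern have one-second resolution, so short steps will show up as zero, and month names are parsed with the current locale.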

There is a known problem with the cleartk assertion annotators:

https://issues.apache.org/jira/browse/CTAKES-449

A partial fix was made in the "windowed" sub-package of ctakes-assertion: org.apache.ctakes.assertion.medfacts.cleartk.windowed.

Each of the normal assertion engines has a replacement in the windowed package.

If you are using a piper file that contains "load AttributeCleartkSubPipe" as the Default clinical pipeline does, just replace it with "load WindowedAttributeCleartkSubPipe".

It isn't a full fix for the problem, and I don't know if it will make your processing faster, but you can give it a try.
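
The swap itself is a one-line edit to the piper file. For anyone who has several piper files to update, here is a minimal sketch that rewrites the text in bulk. It assumes the load command sits alone on its own line, optionally with a `.piper` suffix, and it leaves commented-out lines untouched. The helper name `use_windowed` is hypothetical and not part of cTAKES.

```python
import re

OLD = "AttributeCleartkSubPipe"
NEW = "WindowedAttributeCleartkSubPipe"

# An uncommented "load AttributeCleartkSubPipe" line, with or without ".piper".
LOAD = re.compile(r"^(\s*load\s+)" + OLD + r"(\.piper)?\s*$")


def use_windowed(piper_text):
    """Return the piper text with the assertion sub-pipe load swapped for the windowed one."""
    out = []
    for line in piper_text.splitlines():
        # Lines that do not match (comments, other commands) pass through unchanged.
        out.append(LOAD.sub(r"\g<1>" + NEW + r"\g<2>", line))
    return "\n".join(out) + "\n"
```

Running the function twice gives the same result as running it once, because a line that already loads the windowed sub-pipe no longer matches the pattern.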

Sean

________________________________________
From: Greg Silverman <gm...@umn.edu>
Sent: Tuesday, September 24, 2019 6:47 PM
To: dev@ctakes.apache.org
Subject: Large files taking forever to process [EXTERNAL]

Any suggestions on how to speed up processing large clinical text notes
approaching 13K lines? This is a very old corpus culled from EPIC notes
back in 2009. I thought about splitting the notes into smaller chunks, but
then I would have to deal with the offsets when analyzing system output
against manual annotations that had been done.
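
On the offsets concern: if the notes are split on line boundaries and each chunk's starting character offset is recorded, chunk-relative spans can be shifted back to whole-note offsets by simple addition. A minimal sketch, assuming fixed-size chunks of `max_lines` lines; the function names and the default chunk size are hypothetical, not anything cTAKES provides.

```python
def split_with_offsets(text, max_lines=500):
    """Split text on line boundaries; return (start_offset, chunk_text) pairs."""
    chunks = []
    lines = text.splitlines(keepends=True)  # keep line endings so offsets stay exact
    position = 0
    for i in range(0, len(lines), max_lines):
        chunk = "".join(lines[i:i + max_lines])
        chunks.append((position, chunk))
        position += len(chunk)
    return chunks


def to_document_span(chunk_start, begin, end):
    """Shift a chunk-relative (begin, end) span back to whole-note offsets."""
    return chunk_start + begin, chunk_start + end
```

Two caveats apply. A cut at an arbitrary line can separate an entity from the context an annotator needs, so results near chunk boundaries may differ from a whole-note run. Python counts code points while Java strings count UTF-16 code units, so offsets can disagree when a note contains characters outside the Basic Multilingual Plane.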

As is, I've tried different garbage collection options (this seemed to have
worked well with CLAMP on the same set of notes).

TIA!

Greg--

--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
gms@umn.edu

 ›  evaluate-it.org  ‹

Re: Large files taking forever to process [EXTERNAL]

Posted by Greg Silverman <gm...@umn.edu>.

Hi Gandhi,
The settings were out of the box. I tried the trunk branches from both
GitHub and SVN.
pom.xml is attached.

 Thanks!


On Mon, Sep 30, 2019 at 9:53 AM gandhi rajan <ga...@gmail.com>
wrote:

> Hi Greg, can you check your pom.xml and see whether you have enabled any
> SonarQube-related profile in the build?
>
>
>
> --
> Regards,
> Gandhi
>
> "The best way to find urself is to lose urself in the service of others
> !!!"
>



Re: Large files taking forever to process [EXTERNAL]

Posted by gandhi rajan <ga...@gmail.com>.

Hi Greg, can you check your pom.xml and see whether you have enabled any
SonarQube-related profile in the build?

On Sunday, September 29, 2019, Greg Silverman <gm...@umn.edu> wrote:

> Trying to do the maven build and getting the following error: "You're not
> authorized to execute any SonarQube analysis. Please contact your SonarQube
> administrator."
>
> Please advise. I'm under a pretty tight time line to get these files
> processed.
>
> Thanks!
>
> Greg--
>
>


-- 
Regards,
Gandhi

"The best way to find urself is to lose urself in the service of others !!!"

Re: Large files taking forever to process [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Greg,

I haven't a clue why you are getting a SonarQube failure message. Hopefully somebody else is paying attention and can help with that item.

Sorry,

Sean 
________________________________________
From: Greg Silverman <gm...@umn.edu>
Sent: Sunday, September 29, 2019 12:21 PM
To: dev@ctakes.apache.org
Subject: Re: Large files taking forever to process [EXTERNAL]

Here's the final output (if I could just download cTAKES 4.0.1-SNAPSHOT
without having to do the build, that would be fantastic!):

[...]

On Sun, Sep 29, 2019 at 11:18 AM Greg Silverman <gm...@umn.edu> wrote:

> Trying to do the maven build and getting the following error: "You're not
> authorized to execute any SonarQube analysis. Please contact your SonarQube
> administrator."
>
> Please advise. I'm under a pretty tight time line to get these files
> processed.
>
> Thanks!
>
> Greg--
>
>
> On Sun, Sep 29, 2019 at 10:56 AM Greg Silverman <gm...@umn.edu> wrote:
>
>> Never mind! I see I have to build from source.
>>
>> Greg--
>>
>> On Sun, Sep 29, 2019 at 10:44 AM Greg Silverman <gm...@umn.edu> wrote:
>>
>>> Hi Sean,
>>> I just ran another set of notes through cTAKES and noticed the following
>>> error:
>>>
>>> log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy HH:mm:ss} %5p %c{1} - %m%n].
>>> log4j: Adding appender named [consoleAppender] to category [root].
>>> 29 Sep 2019 15:31:21 ERROR PiperFileReader - Piper File not found: WindowedAttributeCleartkSubPipe
>>>
>>> Is something missing? This is how my DefaultFastPipeline.piper file
>>> looks  (NB: I also tried load WindowedAttributeCleartkSubPipe.piper, with
>>> similar results)
>>>
>>> // Commands and parameters to create a default plaintext document processing pipeline with UMLS lookup
>>>
>>> // Load a simple token processing pipeline from another pipeline file
>>> load DefaultTokenizerPipeline.piper
>>>
>>> // Add non-core annotators
>>> add ContextDependentTokenizerAnnotator
>>> addDescription POSTagger
>>>
>>> // Add Chunkers
>>> load ChunkerSubPipe.piper
>>>
>>> // Default fast dictionary lookup
>>> add DefaultJCasTermAnnotator
>>>
>>> // Add Cleartk Entity Attribute annotators
>>> // see https://issues.apache.org/jira/browse/CTAKES-449
>>> //load AttributeCleartkSubPipe.piper
>>> load WindowedAttributeCleartkSubPipe
>>>
>>>
>>> All files seem to have been processed fine, but wondering if something
>>> was missed, due to the error. If so, how do I construct the
>>> WindowedAttributeCleartkSubPipe.piper file?
>>>
>>> Thanks very much in advance!
>>>
>>> Greg--
>>>
>>>
>>> On Tue, Sep 24, 2019 at 7:27 PM Greg Silverman <gm...@umn.edu> wrote:
>>>
>>>> Sweet! That was definitely it! It's flying now (granted, our files are
>>>> not in the > 1 MB realm, like in the Jira issue - just in the nnn KB realm,
>>>> but still!).
>>>>
>>>> Mahalo nui loa!
>>>>

Re: Large files taking forever to process [EXTERNAL]

Posted by Greg Silverman <gm...@umn.edu>.

Here's the final output (if I could just download cTAKES 4.0.1-SNAPSHOT
without having to do the build, that would be fantastic!):

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary for Apache cTAKES 4.0.1-SNAPSHOT:
[INFO]
[INFO] Apache cTAKES ...................................... FAILURE [ 12.169 s]
[INFO] ctakes-gui-res ..................................... SUCCESS [  3.663 s]
[INFO] Apache cTAKES Resources coreference ................ SUCCESS [  3.055 s]
[INFO] Apache cTAKES Resources temporal ................... SUCCESS [  0.587 s]
[INFO] Apache cTAKES Resources relation-extractor ......... SUCCESS [  0.305 s]
[INFO] Apache cTAKES Resources dictionary-lookup-fast-res . SUCCESS [  8.437 s]
[INFO] Apache cTAKES Resources core ....................... SUCCESS [  0.362 s]
[INFO] Apache cTAKES common type system ................... SUCCESS [  6.588 s]
[INFO] Apache cTAKES utils ................................ SUCCESS [  1.639 s]
[INFO] Apache cTAKES core ................................. SUCCESS [  7.503 s]
[INFO] Apache cTAKES dictionary lookup fast ............... SUCCESS [  1.065 s]
[INFO] Apache cTAKES document preprocessor ................ SUCCESS [  0.818 s]
[INFO] Apache cTAKES Resources lvg ........................ SUCCESS [01:01 min]
[INFO] Apache cTAKES LVG lexical tools .................... SUCCESS [  2.721 s]
[INFO] Apache cTAKES Resources ne-contexts ................ SUCCESS [  0.223 s]
[INFO] Apache cTAKES named entity contexts ................ SUCCESS [  1.535 s]
[INFO] Apache cTAKES Resources assertion .................. SUCCESS [  1.456 s]
[INFO] Apache cTAKES Resources constituency-parser ........ SUCCESS [  2.099 s]
[INFO] Apache cTAKES Constituency Parser .................. SUCCESS [  0.793 s]
[INFO] Apache cTAKES Resources dependency-parser .......... SUCCESS [ 12.839 s]
[INFO] Apache cTAKES Resources pos-tagger ................. SUCCESS [  0.750 s]
[INFO] Apache cTAKES part-of-speech tagger ................ SUCCESS [  2.531 s]
[INFO] Apache cTAKES Dependency Parser .................... SUCCESS [ 20.299 s]
[INFO] Apache cTAKES context dependent tokenizer .......... SUCCESS [  1.531 s]
[INFO] Apache cTAKES Resources ctakes-chunker-res ......... SUCCESS [  0.730 s]
[INFO] Apache cTAKES chunker .............................. SUCCESS [  0.710 s]
[INFO] Apache cTAKES Assertion ............................ SUCCESS [  3.114 s]
[INFO] ctakes-clinical-pipeline-res ....................... SUCCESS [  0.307 s]
[INFO] Apache cTAKES ctakes-clinical-pipeline ............. SUCCESS [  0.674 s]
[INFO] Apache cTAKES Relation Extractor ................... SUCCESS [  2.564 s]
[INFO] Apache cTAKES Temporal Information Extraction ...... SUCCESS [01:11 min]
[INFO] Apache cTAKES CoReference Resolver ................. SUCCESS [  2.282 s]
[INFO] ctakes-gui ......................................... SUCCESS [  1.681 s]
[INFO] Apache cTAKES fhir support ......................... SUCCESS [ 10.391 s]
[INFO] Apache cTAKES Resources dictionary-lookup .......... SUCCESS [ 41.824 s]
[INFO] Apache cTAKES dictionary lookup .................... SUCCESS [  0.633 s]
[INFO] Apache cTAKES Resources drug-ner ................... SUCCESS [  0.239 s]
[INFO] Apache cTAKES Drug NER ............................. SUCCESS [  0.898 s]
[INFO] Apache cTAKES Resources side-effect ................ SUCCESS [  0.257 s]
[INFO] Apache cTAKES Side Effects ......................... SUCCESS [  0.549 s]
[INFO] Apache cTAKES Resources smoking-status ............. SUCCESS [  0.263 s]
[INFO] Apache cTAKES Smoking Status ....................... SUCCESS [  0.583 s]
[INFO] Apache cTAKES Resources assertion-zoner ............ SUCCESS [  0.271 s]
[INFO] Apache cTAKES Assertion's zoner .................... SUCCESS [  0.633 s]
[INFO] ctakes-examples-res ................................ SUCCESS [  0.374 s]
[INFO] ctakes-examples .................................... SUCCESS [  0.582 s]
[INFO] Apache cTAKES Resources ctakes-ytex-res ............ SUCCESS [  0.317 s]
[INFO] Apache cTAKES YTEX ................................. SUCCESS [ 56.133 s]
[INFO] Apache cTAKES YTEX UIMA ............................ SUCCESS [ 53.047 s]
[INFO] Apache cTAKES YTEX Web ............................. SUCCESS [ 46.687 s]
[INFO] Apache cTAKES Distribution ......................... SUCCESS [02:00 min]
[INFO] Apache cTAKES Regression-test ...................... SUCCESS [  2.254 s]
[INFO] Apache cTAKES template filler ...................... SUCCESS [  0.749 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  09:58 min
[INFO] Finished at: 2019-09-29T11:15:52-05:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.sonarsource.scanner.maven:sonar-maven-plugin:3.6.1.1688:sonar (default-cli) on project ctakes: You're not authorized to execute any SonarQube analysis. Please contact your SonarQube administrator. -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.sonarsource.scanner.maven:sonar-maven-plugin:3.6.1.1688:sonar (default-cli) on project ctakes: You're not authorized to execute any SonarQube analysis. Please contact your SonarQube administrator.

On Sun, Sep 29, 2019 at 11:18 AM Greg Silverman <gm...@umn.edu> wrote:

> Trying to do the maven build and getting the following error: "You're not
> authorized to execute any SonarQube analysis. Please contact your SonarQube
> administrator."
>
> Please advise. I'm under a pretty tight timeline to get these files
> processed.
>
> Thanks!
>
> Greg--

Re: Large files taking forever to process [EXTERNAL]

Posted by Greg Silverman <gm...@umn.edu>.

Trying to do the maven build and getting the following error: "You're not
authorized to execute any SonarQube analysis. Please contact your SonarQube
administrator."

Please advise. I'm under a pretty tight timeline to get these files
processed.

Thanks!

Greg--


On Sun, Sep 29, 2019 at 10:56 AM Greg Silverman <gm...@umn.edu> wrote:

> Never mind! I see I have to build from source.
>
> Greg--

Re: Large files taking forever to process [EXTERNAL]

Posted by Greg Silverman <gm...@umn.edu>.

Never mind! I see I have to build from source.

Greg--

On Sun, Sep 29, 2019 at 10:44 AM Greg Silverman <gm...@umn.edu> wrote:

> Hi Sean,
> I just ran another set of notes through cTAKES and noticed the following
> error:
>
> log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy HH:mm:ss} %5p %c{1} - %m%n].
> log4j: Adding appender named [consoleAppender] to category [root].
> 29 Sep 2019 15:31:21 ERROR PiperFileReader - Piper File not found: WindowedAttributeCleartkSubPipe
>
> Is something missing? This is how my DefaultFastPipeline.piper file looks
> (NB: I also tried "load WindowedAttributeCleartkSubPipe.piper", with similar
> results):
>
> // Commands and parameters to create a default plaintext document processing pipeline with UMLS lookup
>
> // Load a simple token processing pipeline from another pipeline file
> load DefaultTokenizerPipeline.piper
>
> // Add non-core annotators
> add ContextDependentTokenizerAnnotator
> addDescription POSTagger
>
> // Add Chunkers
> load ChunkerSubPipe.piper
>
> // Default fast dictionary lookup
> add DefaultJCasTermAnnotator
>
> // Add Cleartk Entity Attribute annotators
> // see https://issues.apache.org/jira/browse/CTAKES-449
> //load AttributeCleartkSubPipe.piper
> load WindowedAttributeCleartkSubPipe
>
>
> All files seem to have been processed fine, but I'm wondering whether
> something was missed because of the error. If so, how do I construct the
> WindowedAttributeCleartkSubPipe.piper file?
>
> Thanks very much in advance!
>
> Greg--

Re: Large files taking forever to process [EXTERNAL]

Posted by Greg Silverman <gm...@umn.edu>.

Hi Sean,
I just ran another set of notes through cTAKES and noticed the following
error:

log4j: Setting property [conversionPattern] to [%d{dd MMM yyyy HH:mm:ss} %5p %c{1} - %m%n].
log4j: Adding appender named [consoleAppender] to category [root].
29 Sep 2019 15:31:21 ERROR PiperFileReader - Piper File not found: WindowedAttributeCleartkSubPipe

Is something missing? This is how my DefaultFastPipeline.piper file looks
(NB: I also tried "load WindowedAttributeCleartkSubPipe.piper", with similar
results):

// Commands and parameters to create a default plaintext document processing pipeline with UMLS lookup

// Load a simple token processing pipeline from another pipeline file
load DefaultTokenizerPipeline.piper

// Add non-core annotators
add ContextDependentTokenizerAnnotator
addDescription POSTagger

// Add Chunkers
load ChunkerSubPipe.piper

// Default fast dictionary lookup
add DefaultJCasTermAnnotator

// Add Cleartk Entity Attribute annotators
// see https://issues.apache.org/jira/browse/CTAKES-449
//load AttributeCleartkSubPipe.piper
load WindowedAttributeCleartkSubPipe


All files seem to have been processed fine, but I'm wondering whether
something was missed because of the error. If so, how do I construct the
WindowedAttributeCleartkSubPipe.piper file?
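
One way to check whether an installed cTAKES distribution already ships the
piper file named in the error is to search its jars and resource folders for
that file name. The sketch below is only an illustration in Python and is not
part of cTAKES: the helper name find_resource and the directory passed to it
are assumptions, and it matches on file name only, so it says nothing about
whether cTAKES would actually find the file on its classpath.

```python
import zipfile
from pathlib import Path


def find_resource(root, resource_name):
    """List where a file named resource_name exists under root.

    Jar entries are reported as "<jar path>!/<entry>", loose files as
    their plain path. find_resource is a hypothetical helper, not a
    cTAKES API.
    """
    root = Path(root)
    if not root.is_dir():
        return []
    hits = []
    # Look inside every jar below root.
    for jar in sorted(root.rglob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                for entry in archive.namelist():
                    if entry.rsplit("/", 1)[-1] == resource_name:
                        hits.append(f"{jar}!/{entry}")
        except zipfile.BadZipFile:
            continue  # not a readable archive, skip it
    # Also look for the file unpacked on disk.
    for loose in sorted(root.rglob(resource_name)):
        if loose.is_file():
            hits.append(str(loose))
    return hits
```

For example, find_resource("/path/to/ctakes", "WindowedAttributeCleartkSubPipe.piper")
returns an empty list when neither a jar nor a folder under that (assumed)
install path contains the file, which would be consistent with the "Piper File
not found" message above.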

Thanks very much in advance!

Greg--


On Tue, Sep 24, 2019 at 7:27 PM Greg Silverman <gm...@umn.edu> wrote:

> Sweet! That was definitely it! It's flying now (granted, our files are not
> in the > 1 MB realm, like in the Jira issue - just in the nnn KB realm, but
> still!).
>
> Mahalo nui loa!

Re: Large files taking forever to process [EXTERNAL]

Posted by Greg Silverman <gm...@umn.edu>.

Sweet! That was definitely it! It's flying now (granted, our files are not
in the > 1 MB realm, like in the Jira issue - just in the nnn KB realm, but
still!).

Mahalo nui loa!
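
For anyone who wants to follow the advice to check the log for the component
that takes all the time, here is a rough sketch in Python. It assumes the log
lines follow the log4j pattern "%d{dd MMM yyyy HH:mm:ss} %5p %c{1} - %m%n"
seen elsewhere in this thread, and it assumes that the gap after a log entry
can be charged to the logger that wrote it, which is only an approximation.
The function name and the sample logger names (StepA, StepB) are made up for
illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# One log line: "29 Sep 2019 15:31:21 ERROR PiperFileReader - message".
LINE = re.compile(r"^(\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2})\s+(\w+)\s+(\S+) - (.*)$")


def seconds_by_logger(lines):
    """Sum the time between each entry and the next one, keyed by the
    logger of the earlier entry. Lines that do not match are ignored."""
    entries = []
    for line in lines:
        match = LINE.match(line)
        if match:
            # %b expects English month abbreviations in the default locale.
            stamp = datetime.strptime(match.group(1), "%d %b %Y %H:%M:%S")
            entries.append((stamp, match.group(3)))
    totals = defaultdict(float)
    for (start, logger), (end, _) in zip(entries, entries[1:]):
        totals[logger] += (end - start).total_seconds()
    return dict(totals)


# Hypothetical log excerpt; StepA and StepB are placeholder logger names.
sample = [
    "29 Sep 2019 15:31:21  INFO StepA - process(JCas)",
    "29 Sep 2019 15:31:25  INFO StepB - process(JCas)",
    "29 Sep 2019 15:33:25  INFO StepA - process(JCas)",
    "29 Sep 2019 15:33:26  INFO StepB - process(JCas)",
]
totals = seconds_by_logger(sample)
slowest = max(totals, key=totals.get)
```

With the sample above, StepB accounts for most of the elapsed time, so slowest
ends up as "StepB". On a real log, the last entry has no following line and
therefore contributes nothing to the totals.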



On Tue, Sep 24, 2019 at 6:29 PM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Greg,
>
> Check your log to see what component is taking all the time.
>
> There is a known problem with the cleartk assertion annotators:
>
> https://issues.apache.org/jira/browse/CTAKES-449
>
> A partial fix was made in the "windowed" sub-package of ctakes-assertion:
> org.apache.ctakes.assertion.medfacts.cleartk.windowed.
>
> Each of the normal assertion engines has a replacement in the windowed
> package.
>
> If you are using a piper file that contains "load AttributeCleartkSubPipe"
> as the Default clinical pipeline does, just replace it with "load
> WindowedAttributeCleartkSubPipe".
>
> It isn't a full fix for the problem, and I don't know if it will make your
> processing faster, but you can give it a try.
>
> Sean