Posted to dev@ctakes.apache.org by Greg Silverman <gm...@umn.edu.INVALID> on 2021/01/23 17:05:38 UTC

getting job information

Hi all,
Is there a way to easily generate a performance report similar to the one
generated by MetaMap (with timings for each task, etc.)?

Thanks in advance!

Greg--

-- 
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
gms@umn.edu

Re: performance report [EXTERNAL]

Posted by Peter Abramowitsch <pa...@gmail.com>.

Great, thanks Greg.  I'd like to see the kind of stats that are available
beyond what one can scrape from log4j.

Peter

On Mon, Jan 25, 2021 at 5:16 PM Greg Silverman <gm...@umn.edu.invalid> wrote:

> Hi Sean,
> Thanks! I'll give it a whirl and let you know how it works out.
>
> Best!

Re: performance report [EXTERNAL]

Posted by Greg Silverman <gm...@umn.edu.INVALID>.

Hi Sean,
Thanks! I'll give it a whirl and let you know how it works out.

Best!

On Mon, Jan 25, 2021 at 8:48 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Greg, Peter,
>
> I believe that the performance report comes from a
> CollectionProcessingEngine (CPE)
> https://uima.apache.org/d/uimaj-current/apidocs/org/apache/uima/collection/CollectionProcessingEngine.html
>
>
> I think that UIMA's CPE GUI runs the pipeline through a CPE - hence the
> tool's name, but that may have changed in recent years.
>
> The PipelineBuilder class in ctakes.core used by the PiperFileRunner could
> be changed to use this style of running a single-threaded pipeline - right
> now it uses a simpler uimaFIT method.
> The code changes are relatively minor, but obviously significant testing
> would be required.  The cTAKES PipelineBuilder does use a CPE for
> multi-threaded pipelines, so there has already been some testing on that
> front.
>
> You can look at the cTAKES PipelineBuilder run() method.  If you get rid
> of the if (threadCount==1) {..} else {   then the CPE will always be used.
> Then just add a cpe.getPerformanceReport() after cpe.process() and you
> should have a ProcessTrace object.  This is where my guessing ends as I
> have never used a ProcessTrace and don't know exactly what to beg of it.
>
> I hope that is a decent start,
> Sean


-- 
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
gms@umn.edu

Re: performance report [EXTERNAL]

Posted by Peter Abramowitsch <pa...@gmail.com>.

Thanks Sean.  The CPE ProcessTrace object was something I wasn't familiar
with.

Definitely, though, the piper file runner, by default,  should be as
lightweight and simple as possible.  Other options for threading or for
tracing should be injected or layered in without modifying default
behavior.  It is currently very stable.  In my alternative threading model
it runs thirty or more pipeline instances for weeks in a single process
under very heavy stress.

Peter


On Mon, Jan 25, 2021 at 3:48 PM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Greg, Peter,
>
> I believe that the performance report comes from a
> CollectionProcessingEngine (CPE)
> https://uima.apache.org/d/uimaj-current/apidocs/org/apache/uima/collection/CollectionProcessingEngine.html
>
>
> I think that UIMA's CPE GUI runs the pipeline through a CPE - hence the
> tool's name, but that may have changed in recent years.
>
> The PipelineBuilder class in ctakes.core used by the PiperFileRunner could
> be changed to use this style of running a single-threaded pipeline - right
> now it uses a simpler uimaFIT method.
> The code changes are relatively minor, but obviously significant testing
> would be required.  The cTAKES PipelineBuilder does use a CPE for
> multi-threaded pipelines, so there has already been some testing on that
> front.
>
> You can look at the cTAKES PipelineBuilder run() method.  If you get rid
> of the if (threadCount==1) {..} else {   then the CPE will always be used.
> Then just add a cpe.getPerformanceReport() after cpe.process() and you
> should have a ProcessTrace object.  This is where my guessing ends as I
> have never used a ProcessTrace and don't know exactly what to beg of it.
>
> I hope that is a decent start,
> Sean

Re: performance report [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.

Hi Greg, Peter,

I believe that the performance report comes from a CollectionProcessingEngine (CPE) https://uima.apache.org/d/uimaj-current/apidocs/org/apache/uima/collection/CollectionProcessingEngine.html  

I think that UIMA's CPE GUI runs the pipeline through a CPE - hence the tool's name, but that may have changed in recent years.

The PipelineBuilder class in ctakes.core used by the PiperFileRunner could be changed to use this style of running a single-threaded pipeline - right now it uses a simpler uimaFIT method.
The code changes are relatively minor, but obviously significant testing would be required.  The cTAKES PipelineBuilder does use a CPE for multi-threaded pipelines, so there has already been some testing on that front.

You can look at the cTAKES PipelineBuilder run() method.  If you get rid of the if (threadCount==1) {..} else {   then the CPE will always be used.  Then just add a cpe.getPerformanceReport() after cpe.process() and you should have a ProcessTrace object.  This is where my guessing ends as I have never used a ProcessTrace and don't know exactly what to beg of it.
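
For what a per-component report built from such a trace might look like, here is a rough sketch in plain Java.  It does not call UIMA at all: the Event class is a hypothetical stand-in for one trace entry (in UIMA the equivalent values should come from ProcessTraceEvent, via getComponentName(), getDuration() and getSubEvents(), but check the Javadoc before relying on that).  The component names and durations in main() are made-up sample values, not measurements.

```java
import java.util.List;

public class TraceReport {

    /** Hypothetical stand-in for one trace entry: a component, its elapsed time, nested entries. */
    public static final class Event {
        public final String component;
        public final int durationMillis;
        public final List<Event> subEvents;

        public Event(String component, int durationMillis, List<Event> subEvents) {
            this.component = component;
            this.durationMillis = durationMillis;
            this.subEvents = subEvents;
        }
    }

    /** Sums top-level events only, assuming a parent's duration already includes its sub-events. */
    public static long totalMillis(List<Event> events) {
        long total = 0;
        for (Event e : events) {
            total += e.durationMillis;
        }
        return total;
    }

    /** Appends one line per event, indenting sub-events beneath their parent. */
    private static void appendLines(List<Event> events, int depth, StringBuilder out) {
        for (Event e : events) {
            out.append("  ".repeat(depth))
               .append(e.component).append(": ")
               .append(e.durationMillis).append(" ms\n");
            appendLines(e.subEvents, depth + 1, out);
        }
    }

    /** Builds the whole report: one line per event, then a total line. */
    public static String format(List<Event> events) {
        StringBuilder out = new StringBuilder();
        appendLines(events, 0, out);
        out.append("TOTAL: ").append(totalMillis(events)).append(" ms\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        List<Event> events = List.of(
            new Event("SentenceDetector", 120, List.of()),
            new Event("Chunker", 300, List.of(new Event("PosTagger", 180, List.of()))));
        System.out.print(format(events));
    }
}
```

The total deliberately ignores nested entries so that time spent in a sub-component is not counted twice; if the real trace reports exclusive times instead, the sum would need to recurse.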

I hope that is a decent start,
Sean
________________________________________
From: Greg Silverman <gm...@umn.edu.INVALID>
Sent: Saturday, January 23, 2021 3:01 PM
To: dev@ctakes.apache.org
Subject: Re: performance report [EXTERNAL]

Hi Peter,
I have no doubt that performance varies with note style and pipeline
components.

We're looking for a way to benchmark the standard/non-customized pipeline
performance for processing a largish set of identical notes using several
clinical NLP annotators (specifically, cTAKES, BioMedICUS, MetaMap and
CLAMP). At the command line, both MetaMap and BioMedICUS output a standard
performance report with total timings and the details for each specific
pipeline component. I assume there is a way to enable the performance
report output available in the GUI version of cTAKES at the command line -
which is what I'm really interested in.

We're fine with information at a very coarse level, since we're interested
in a particular note type, so the aforementioned report should be
sufficient. I'm just wondering how to enable it using the standard pipeline
in cTAKES.

Thanks!

Greg--


Re: performance report

Posted by Greg Silverman <gm...@umn.edu.INVALID>.

Hi Peter,
I have no doubt that performance varies with note style and pipeline
components.

We're looking for a way to benchmark the standard/non-customized pipeline
performance for processing a largish set of identical notes using several
clinical NLP annotators (specifically, cTAKES, BioMedICUS, MetaMap and
CLAMP). At the command line, both MetaMap and BioMedICUS output a standard
performance report with total timings and the details for each specific
pipeline component. I assume there is a way to enable the performance
report output available in the GUI version of cTAKES at the command line -
which is what I'm really interested in.

We're fine with information at a very coarse level, since we're interested
in a particular note type, so the aforementioned report should be
sufficient. I'm just wondering how to enable it using the standard pipeline
in cTAKES.

Thanks!

Greg--

On Sat, Jan 23, 2021 at 12:26 PM Peter Abramowitsch <pa...@gmail.com>
wrote:

> Hi Greg,
>
> I’ve found that note styles differ so much in ways that affect
> performance, and pipeline configurations interact in so many ways that
> affect overall performance, that really the only way to get a sense of
> performance is either at a very coarse level, measuring process time
> across large collections of varied notes, or at a very granular level,
> using something like jvisualvm.  Using the latter I saw some surprising
> things, some of which I was able to tackle with minor software changes,
> while others are deep in UIMA utilities used by cTAKES.  In my
> experience, after processing millions of notes, the biggest factor is
> notes that have reached about 5k AND are missing punctuation.  At around
> this size begins a geometric rise in the complexity of internal
> structures that depend on sentences, and a serious elevation of
> processing time.
>
> Peter

-- 
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
gms@umn.edu

Re: performance report

Posted by Peter Abramowitsch <pa...@gmail.com>.

Hi Greg,

I’ve found that note styles differ so much in ways that affect performance, and pipeline configurations interact in so many ways that affect overall performance, that really the only way to get a sense of performance is either at a very coarse level, measuring process time across large collections of varied notes, or at a very granular level, using something like jvisualvm.  Using the latter I saw some surprising things, some of which I was able to tackle with minor software changes, while others are deep in UIMA utilities used by cTAKES.  In my experience, after processing millions of notes, the biggest factor is notes that have reached about 5k AND are missing punctuation.  At around this size begins a geometric rise in the complexity of internal structures that depend on sentences, and a serious elevation of processing time.
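
The coarse end of that range can be sketched in plain Java: time each note as it goes through whatever does the processing, and keep separate totals for short and long notes so the size effect shows up.  Everything named here is an assumption for illustration: the CoarseTimer class, the Consumer standing in for the pipeline, and the 5000-character threshold (the "about 5k" above does not name a unit).  Wall-clock timings taken this way include JIT warm-up and GC noise, so they are only meaningful over large collections.

```java
import java.util.List;
import java.util.function.Consumer;

public class CoarseTimer {

    /** Accumulated note count and elapsed time for one group of notes. */
    public static final class Bucket {
        public int notes;
        public long nanos;

        /** Mean time per note in milliseconds, or 0 if the bucket is empty. */
        public double meanMillis() {
            return notes == 0 ? 0.0 : nanos / 1_000_000.0 / notes;
        }
    }

    /**
     * Runs the processor over every note and returns two buckets:
     * index 0 for notes shorter than the threshold, index 1 for the rest.
     */
    public static Bucket[] time(List<String> notes, Consumer<String> processor, int lengthThreshold) {
        Bucket[] buckets = { new Bucket(), new Bucket() };
        for (String note : notes) {
            long start = System.nanoTime();
            processor.accept(note);
            long elapsed = System.nanoTime() - start;
            Bucket bucket = buckets[note.length() < lengthThreshold ? 0 : 1];
            bucket.notes++;
            bucket.nanos += elapsed;
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Placeholder notes and a placeholder workload; a real run would call the pipeline here.
        List<String> notes = List.of("A short note.", "x".repeat(6000));
        Bucket[] result = time(notes, note -> note.toUpperCase(), 5000);
        System.out.printf("short notes: %d, mean %.3f ms%n", result[0].notes, result[0].meanMillis());
        System.out.printf("long notes:  %d, mean %.3f ms%n", result[1].notes, result[1].meanMillis());
    }
}
```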

Peter

Sent from my iPad

> On Jan 23, 2021, at 18:09, Greg Silverman <gm...@umn.edu.invalid> wrote:
> 
> I found this:
> https://medium.com/@felix_chan/install-apache-ctakes-924c40967ce2, which
> states: "A performance report is generated when the process is done."
> 
> However, we are running this from the command line and no such report is
> being generated.
> 
> Thanks!

performance report

Posted by Greg Silverman <gm...@umn.edu.INVALID>.

I found this:
https://medium.com/@felix_chan/install-apache-ctakes-924c40967ce2, which
states: "A performance report is generated when the process is done."

However, we are running this from the command line and no such report is
being generated.

Thanks!

On Sat, Jan 23, 2021 at 11:05 AM Greg Silverman <gm...@umn.edu> wrote:

> Hi all,
> Is there a way to easily generate a performance report similar to the one
> generated by MetaMap (with timings for each task, etc.)?
>
> Thanks in advance!
>
> Greg--

-- 
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
gms@umn.edu