Posted to dev@ctakes.apache.org by "Finan, Sean" <Se...@childrens.harvard.edu> on 2019/02/19 17:12:29 UTC

Re: DefaultFastPipeline.piper and LVG Annotator [EXTERNAL]

Hi Jeff,

The short answer: No, LVG is not in the pipeline created by DefaultFastPipeline.piper.

Longer answer:
In older versions of the dictionary lookup, the Lexical Variant Generator (LVG) module was recommended to capture lexical variants of terms.  However, the dictionary resource already contains variants, so the LVG module should not make much of a difference.  When the fast lookup was new several years ago, I ran a test with and without LVG on two datasets, and the difference was along the lines of +1-2% recall, -1% precision.

I think that ClinicalPipelineFactory.getFastPipeline() was a copy-paste of the previous .getClinicalPipeline(), but with the dictionary module replaced.  So LVG is still in the pipeline created by that method.

When I (more recently) wrote the piper file that you reference, I left out LVG because the added burden didn't seem to warrant its presence.  By burden I don't just mean the speed decrease and memory footprint.  There have been a lot of configuration problems with LVG on various systems, which have made cTAKES difficult to use.

The diagram that you reference places LVG after the dictionary lookup and after the part-of-speech tagger, while the page on LVG https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+-+LVG lists those as the two modules that may benefit from its presence.  That diagram is very old and should definitely be updated.  Both the diagram and the page on LVG predate the fast dictionary lookup and do not account for it.

Sean


________________________________________
From: Jeffrey Miller <je...@gmail.com>
Sent: Tuesday, February 19, 2019 10:53 AM
To: dev@ctakes.apache.org
Subject: DefaultFastPipeline.piper and LVG Annotator [EXTERNAL]

Hi,

I was wondering if the LVG Annotator is included in DefaultFastPipeline.piper
<https://svn.apache.org/repos/asf/ctakes/trunk/ctakes-clinical-pipeline-res/src/main/resources/org/apache/ctakes/clinical/pipeline/DefaultFastPipeline.piper>.
I have tried to trace through all the includes, but I cannot find it.
However, when I look at the code for
ClinicalPipelineFactory.getFastPipeline(), it seems to be included:
<https://github.com/apache/ctakes/blob/513bb49ebb98c4ac63f690c7b88a82aff18947b8/ctakes-clinical-pipeline/src/main/java/org/apache/ctakes/clinicalpipeline/ClinicalPipelineFactory.java#L98>
From the documentation in this flow diagram
<https://cwiki.apache.org/confluence/download/attachments/68718172/ctakes-3.1-dependencies.png?version=1&modificationDate=1488992146000&api=v2>
from the components documentation page
<https://cwiki.apache.org/confluence/display/CTAKES/cTAKES+4.0+Component+Use+Guide>,
it seems to be a recommended component for the dictionary annotator.

Thanks for your help,
Jeff
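
For anyone who wants to repeat the "trace through all the includes" check mechanically, here is a minimal sketch in Python. It rests on assumptions that are not confirmed in this thread: that a piper file pulls in other piper files with a `load <name>` line, adds components with `add <name>` or `addDescription <name>`, and marks comments with `//`. The piper names and contents below are hypothetical and only exercise the logic; they are not the real cTAKES files.

```python
# Hypothetical piper contents keyed by piper name (NOT the real cTAKES files).
EXAMPLE_PIPERS = {
    "ExamplePipeline": (
        "load ExampleTokenizerSubPipe\n"
        "add ExampleDictionaryLookup\n"
    ),
    "ExampleTokenizerSubPipe": (
        "// tokenization only\n"
        "add ExampleTokenizer\n"
    ),
    "ExampleWithLvg": (
        "load ExampleTokenizerSubPipe\n"
        "load ExampleLvgSubPipe\n"
        "add ExampleDictionaryLookup\n"
    ),
    "ExampleLvgSubPipe": "addDescription LvgAnnotator\n",
}

# Commands assumed to add a component to the pipeline.
ADD_COMMANDS = {"add", "addDescription"}


def find_component(piper_name, target, pipers, seen=None):
    """Return the chain of piper names that leads to `target`, or None.

    `pipers` maps a piper name to its text. Each piper is visited at most
    once, so circular `load` lines cannot cause infinite recursion.
    """
    if seen is None:
        seen = set()
    if piper_name in seen or piper_name not in pipers:
        return None
    seen.add(piper_name)

    for raw_line in pipers[piper_name].splitlines():
        # Drop anything after a "//" comment marker, then surrounding spaces.
        line = raw_line.split("//", 1)[0].strip()
        if not line:
            continue
        parts = line.split()
        command, args = parts[0], parts[1:]
        if not args:
            continue
        if command in ADD_COMMANDS:
            # Accept either a simple name or a fully qualified class name.
            if args[0].split(".")[-1] == target:
                return [piper_name]
        elif command == "load":
            child = args[0]
            if child.endswith(".piper"):
                child = child[: -len(".piper")]
            chain = find_component(child, target, pipers, seen)
            if chain is not None:
                return [piper_name] + chain
    return None


if __name__ == "__main__":
    for name in ("ExamplePipeline", "ExampleWithLvg"):
        print(name, "->", find_component(name, "LvgAnnotator", EXAMPLE_PIPERS))
```

To run this against real piper files, the dictionary would be filled by reading the files from disk; how cTAKES resolves a `load` name to a file path is not covered here.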

Re: DefaultFastPipeline.piper and LVG Annotator [EXTERNAL]

Posted by Jeffrey Miller <je...@gmail.com>.
Sean,

I may just be missing something obvious, but having signed up for the
confluence wiki I don't see any Edit button on the cTAKES pages even though
it does look like everyone has permission to edit.

On Fri, Feb 22, 2019 at 11:01 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Jeff,
>
> >Do you accept documentation
> contributions?
>
> Of course!! We are completely open source, documentation included.  I
> think that you should be able to edit the wiki after signing up on
> confluence:
> https://cwiki.apache.org/confluence/signup.action
>
> Cheers,
> Sean

Re: DefaultFastPipeline.piper and LVG Annotator [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Jeff,

> Do you accept documentation contributions?

Of course!! We are completely open source, documentation included.  I think that you should be able to edit the wiki after signing up on confluence:
https://cwiki.apache.org/confluence/signup.action

Cheers,
Sean

________________________________________
From: Jeffrey Miller <je...@gmail.com>
Sent: Friday, February 22, 2019 10:57 AM
To: dev@ctakes.apache.org
Subject: Re: DefaultFastPipeline.piper and LVG Annotator [EXTERNAL]

Thank you Sean, that clears it up for me. Do you accept documentation
contributions? I might be able to document a few of the things I have
learned along the way setting up ctakes.


Re: DefaultFastPipeline.piper and LVG Annotator [EXTERNAL]

Posted by Jeffrey Miller <je...@gmail.com>.
Thank you Sean, that clears it up for me. Do you accept documentation
contributions? I might be able to document a few of the things I have
learned along the way setting up ctakes.
