Posted to dev@ctakes.apache.org by "Finan, Sean" <Se...@childrens.harvard.edu> on 2019/05/16 00:43:27 UTC

Re: acronyms/abbreviations [EXTERNAL]

Hi Greg,

What exactly do you need?

There are a lot of output components that can produce different formats containing various types of information.

Do you prefer to parse ml?  Or is columnized text output ok?  Does this go to a post-processing engine or a human user?

Thanks,

Sean
________________________________________
From: Greg Silverman <gm...@umn.edu>
Sent: Wednesday, May 15, 2019 7:09 PM
To: dev@ctakes.apache.org
Subject: acronyms/abbreviations [EXTERNAL]

How can I get these from the XMI annotations?

Thanks!

Greg--

--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
University of Minnesota
gms@umn.edu

 ›  evaluate-it.org  ‹

Re: acronyms/abbreviations [EXTERNAL]

Posted by Peter Abramowitsch <pa...@gmail.com>.
OMG,  I hadn't even thought of "ephemeral vocabulary".  Great example!

Peter

On Sun, May 19, 2019 at 6:05 PM Greg Silverman <gm...@umn.edu> wrote:

> Peter,
> You'll like this example then from a manuscript we submitted to MedInfo:
> "It is important to point out that while some system annotation types
> scored really well using the geometric mean method to identify best-at-task
> annotation systems,  on examination, since our method was unable to provide
> lexical disambiguation of terms, there were some misclassifications. An
> example was for the entity Speed of Vehicle where the system cTAKES perform
> very well with the MedicationsMention annotation type. On further
> examination, the terms that provided a match were “speed” and “mph,” which
> have different contextual meanings from those having to do with physical
> measurement with respect to velocity.  In this case, “speed” and “mph” are
> common street drugs..."
>
> Greg--

Re: acronyms/abbreviations [EXTERNAL]

Posted by Greg Silverman <gm...@umn.edu>.
Peter,
You'll like this example then from a manuscript we submitted to MedInfo:
"It is important to point out that while some system annotation types
scored really well using the geometric mean method to identify best-at-task
annotation systems,  on examination, since our method was unable to provide
lexical disambiguation of terms, there were some misclassifications. An
example was for the entity Speed of Vehicle where the system cTAKES perform
very well with the MedicationsMention annotation type. On further
examination, the terms that provided a match were “speed” and “mph,” which
have different contextual meanings from those having to do with physical
measurement with respect to velocity.  In this case, “speed” and “mph” are
common street drugs..."

Greg--


On Sat, May 18, 2019 at 3:12 AM Peter Abramowitsch <pa...@gmail.com>
wrote:

> Greg,  Thanks for these links.  I really enjoy discussions of this kind and
> am glad to see that someone is trying these knowledge-based approaches and
> reporting back.  I've played with the WordNet APIs and believe that it is
> possible to use the hypernym/hyponym constructs to help score different
> interpretations of ambiguous terms.  Additionally, I think n-gram fitting
> can be used to help rate the relevance of one definition over another.
> But I'd bet that the effectiveness of these approaches is highly dependent on
> grammatically complete and correct text.  Clinical notes are another
> thing.
>
> I had a perfect example of this problem the other day.  A note stating
> something like "nursing care resumed after 12pm".  cTAKES had tagged this
> with both lactation-related and nursing-service-related CUIs.  But the
> patient was an elderly man.  Clearly the context was not to be found in the
> grammar but in the clinical setting.  Thus there is a kind of meta-context
> (patient's age, gender, disease state) that could also contribute to
> disambiguation.  This could be achieved by ML methods trained on marked-up
> notes (very labor intensive) or by some kind of rules mechanism, but that
> would also be labor intensive - a never-to-be-finished effort.  These might
> require the creation of an instant/lightweight VMR to structure the
> contextual elements from the note that the scoring mechanism would reason
> over.  But I'd prefer a Campari and soda.



Re: acronyms/abbreviations [EXTERNAL]

Posted by Peter Abramowitsch <pa...@gmail.com>.
Greg,  Thanks for these links.  I really enjoy discussions of this kind and
am glad to see that someone is trying these knowledge-based approaches and
reporting back.  I've played with the WordNet APIs and believe that it is
possible to use the hypernym/hyponym constructs to help score different
interpretations of ambiguous terms.  Additionally, I think n-gram fitting
can be used to help rate the relevance of one definition over another.
But I'd bet that the effectiveness of these approaches is highly dependent on
grammatically complete and correct text.  Clinical notes are another
thing.

I had a perfect example of this problem the other day.  A note stating
something like "nursing care resumed after 12pm".  cTAKES had tagged this
with both lactation-related and nursing-service-related CUIs.  But the
patient was an elderly man.  Clearly the context was not to be found in the
grammar but in the clinical setting.  Thus there is a kind of meta-context
(patient's age, gender, disease state) that could also contribute to
disambiguation.  This could be achieved by ML methods trained on marked-up
notes (very labor intensive) or by some kind of rules mechanism, but that
would also be labor intensive - a never-to-be-finished effort.  These might
require the creation of an instant/lightweight VMR to structure the
contextual elements from the note that the scoring mechanism would reason
over.  But I'd prefer a Campari and soda.
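
A minimal sketch of the rules-mechanism idea above, assuming a tiny patient
context (age and sex only) and hand-written plausibility rules. The sense
labels, the PatientContext class, and the age range are hypothetical
placeholders chosen for illustration; they are not real CUIs, cTAKES types,
or clinical guidance.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class PatientContext:
    """Lightweight stand-in for the 'meta-context' pulled from a note."""
    age: int
    sex: str  # "M" or "F"


# Hypothetical sense labels mapped to plausibility rules.
# The age range is an arbitrary illustrative assumption.
RULES: Dict[str, Callable[[PatientContext], bool]] = {
    "NURSING_LACTATION": lambda p: p.sex == "F" and 12 <= p.age <= 55,
    "NURSING_SERVICE": lambda p: True,
}


def plausible_senses(candidates: List[str], patient: PatientContext) -> List[str]:
    """Keep only the candidate senses whose rule accepts this patient.

    Senses without a rule are kept, since there is no evidence against them.
    """
    return [c for c in candidates if RULES.get(c, lambda p: True)(patient)]


if __name__ == "__main__":
    elderly_man = PatientContext(age=82, sex="M")
    print(plausible_senses(["NURSING_LACTATION", "NURSING_SERVICE"], elderly_man))
```

Even this toy version shows why the effort never finishes: every ambiguous
term needs its own rule, and each rule encodes a judgment someone has to
maintain.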



On Sat, May 18, 2019 at 3:24 AM Greg Silverman <gm...@umn.edu> wrote:

> https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3111590/

Re: acronyms/abbreviations [EXTERNAL]

Posted by Greg Silverman <gm...@umn.edu>.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3111590/

On Fri, May 17, 2019 at 8:23 PM Greg Silverman <gm...@umn.edu> wrote:

> Yes, and regarding your last paragraph: This is where disambiguation comes
> into play. Here is one method:
> https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume23/montoyo05a-html/node9.html
>
> I'm not sure how either MetaMap or BioMedICUS does disambiguation, but since
> both are open source, they would be potential resources.
>
> Greg--
>
> On Fri, May 17, 2019 at 2:17 AM Peter Abramowitsch <
> pabramowitsch@gmail.com> wrote:
>
>> Seems like some kind of simple heuristic should work.  Isn't it just a
>> case of looking at the in/out text offsets of the source text for an
>> identified annotation and then comparing that with the canonical text of
>> the CUI or SNOMED ID?  If the source text is just a few characters (say
>> fewer than 5) and the Levenshtein distance between it and the canonical
>> text is greater than the length of the source text, you're pretty sure to
>> have an acronym.
>>
>> For instance, if cTAKES finds "MI" and assigns SNOMED 22298006 or CUI
>> C0027051 with canonical text "Myocardial Infarction", then with the
>> in/out offsets into the text you should be able to run this heuristic.
>>
>> The problem (and I see this in my work) is that many acronyms have
>> multiple meanings.  Thus, you may accurately be able to tell that your
>> identified concept came from an acronym, but it was the wrong concept!!
>>
>> Peter
>>
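
The heuristic described above could be sketched roughly as follows, in
Python for brevity. The function names and the default cutoff of 5
characters are illustrative (the cutoff follows the "say less than 5"
suggestion); lowercasing both strings before comparing is an added
assumption, and nothing here is an existing cTAKES component.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def looks_like_acronym(source_text: str, canonical_text: str, max_len: int = 5) -> bool:
    """Flag a short mention whose edit distance to the concept's canonical
    text exceeds the mention's own length."""
    source = source_text.strip().lower()
    canonical = canonical_text.strip().lower()
    if not source or len(source) >= max_len:
        return False
    return levenshtein(source, canonical) > len(source)


if __name__ == "__main__":
    print(looks_like_acronym("MI", "Myocardial Infarction"))
    print(looks_like_acronym("pain", "Pain"))
```

For "MI" against "Myocardial Infarction" the edit distance is 19, which
exceeds the source length of 2, so the pair is flagged. As the message
notes, this says nothing about whether the assigned concept is the right
expansion of the acronym.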
>> On Thu, May 16, 2019 at 4:31 AM Greg Silverman <gm...@umn.edu> wrote:
>>
>> > Got it!
>> >
>> > Yes, I understand the formidability, given the need for disambiguation,
>> > etc. Was just curious if this existed.
>> >
>> > Thanks!
>> >
>> >
>> > On Wed, May 15, 2019 at 9:11 PM Finan, Sean <
>> > Sean.Finan@childrens.harvard.edu> wrote:
>> >
>> > > Hi Greg,
>> > >
>> > > Ok, that gives me a great vector toward addressing your needs.
>> > >
>> > > I don't know of any cTAKES components that indicate whether or not
>> > > discovered concepts come from acronyms, abbreviations or -replete-
>> > > text mentions.
>> > >
>> > > There should be something that does that.   Open source ---->  Any
>> > > champions available?
>> > >
>> > > Right now no abbreviation or metonym information is provided in the
>> > > standard components.    If it can be extruded from source then it
>> > > should be provided.
>> > >
>> > > If anybody has such a component, please let us know!   This is a
>> > > formidable (imio) NLP problem, so call your kudos with a solution!
>> > >
>> > > Sean
>> > >
>> > > ________________________________________
>> > > From: Greg Silverman <gm...@umn.edu>
>> > > Sent: Wednesday, May 15, 2019 9:21 PM
>> > > To: dev@ctakes.apache.org
>> > > Subject: Re: acronyms/abbreviations [EXTERNAL]
>> > >
>> > > I'm just wondering how acronyms are identified as acronyms in cTAKES
>> > > (for example, in MetaMap, there is an attribute in the Document
>> > > annotation with ids of where they are in the Utterance annotation; and
>> > > in BioMedICUS, there is an acronym annotation type, etc.). From
>> > > examining the XMI CAS, it is not obvious.
>> > >
>> > > We're extracting the desired annotations from the XMI CAS using a
>> > > custom Groovy client.
>> > >
>> > > Thanks!
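
As a rough illustration of pulling mentions and their linked concepts out
of an XMI CAS, here is a sketch in Python rather than Groovy. The sample
document is hand-written for this example, and the attribute names it
relies on (begin, end, ontologyConceptArr, cui, preferredText, sofaString)
are assumptions about how the output looks; check them against your own
XMI files. Offsets are treated as plain Python string indices, which may
not match the CAS offsets for text outside the Basic Multilingual Plane.

```python
import xml.etree.ElementTree as ET

XMI_ID = "{http://www.omg.org/XMI}id"  # qualified name of the xmi:id attribute

# Hand-written sample for illustration only, not real pipeline output.
SAMPLE_XMI = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
    xmlns:cas="http:///uima/cas.ecore"
    xmlns:textsem="http:///org/apache/ctakes/typesystem/type/textsem.ecore"
    xmlns:refsem="http:///org/apache/ctakes/typesystem/type/refsem.ecore"
    xmi:version="2.0">
  <cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" mimeType="text"
      sofaString="Patient had an MI in 2015."/>
  <textsem:DiseaseDisorderMention xmi:id="20" sofa="1" begin="15" end="17"
      ontologyConceptArr="30"/>
  <refsem:UmlsConcept xmi:id="30" cui="C0027051"
      preferredText="Myocardial Infarction"/>
</xmi:XMI>"""


def mentions_with_concepts(xmi_text):
    """Return (covered text, CUI, preferred text) for every element that
    carries offsets and a space-separated list of concept ids."""
    root = ET.fromstring(xmi_text)
    # Index every element by xmi:id so concept references can be resolved.
    by_id = {el.get(XMI_ID): el for el in root.iter() if el.get(XMI_ID)}
    # Use the first element that carries the document text.
    sofa_text = next((el.get("sofaString") for el in root.iter()
                      if el.get("sofaString") is not None), "")
    rows = []
    for el in root.iter():
        refs = el.get("ontologyConceptArr")
        if refs is None or el.get("begin") is None or el.get("end") is None:
            continue
        begin, end = int(el.get("begin")), int(el.get("end"))
        for ref in refs.split():
            concept = by_id.get(ref)
            if concept is not None:
                rows.append((sofa_text[begin:end],
                             concept.get("cui"),
                             concept.get("preferredText")))
    return rows


if __name__ == "__main__":
    for row in mentions_with_concepts(SAMPLE_XMI):
        print(row)
```

The code matches on attribute names rather than element types, so it does
not depend on the namespace prefixes in the sample. Each returned row pairs
the covered source text with a concept's preferred text, which is the input
an acronym heuristic would need.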
>> > >
>>
>
>
> --
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
> University of Minnesota
> gms@umn.edu
>
>  ›  evaluate-it.org  ‹
>


-- 
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
University of Minnesota
gms@umn.edu

 ›  evaluate-it.org  ‹

Re: acronyms/abbreviations [EXTERNAL]

Posted by Greg Silverman <gm...@umn.edu>.
Yes, and regarding your last paragraph: This is where disambiguation comes
into play. Here is one method:
https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume23/montoyo05a-html/node9.html

I'm not sure how either MetaMap or BioMedICUS does disambiguation, but since
both are open source, they would be potential resources.

Greg--



-- 
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
University of Minnesota
gms@umn.edu

 ›  evaluate-it.org  ‹

Re: acronyms/abbreviations [EXTERNAL]

Posted by Peter Abramowitsch <pa...@gmail.com>.
Seems like some kind of simple heuristic should work. Isn't it just a
case of looking at the in/out text offsets of the source text for an
identified annotation and then comparing that with the canonical text of
the CUI or SNOMED ID? If the source text is just a few characters (say
fewer than 5) and the Levenshtein distance between it and the canonical
text is greater than the length of the source text, you're pretty sure to
have an acronym.

For instance, if cTAKES finds "MI" and assigns SNOMED 22298006 or CUI
C0027051 with canonical text "Myocardial Infarction", then with the
in/out offsets into the text you should be able to run this heuristic.
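
A minimal Python sketch of the heuristic above, in case it helps to see it
spelled out. The function names, the case-insensitive comparison and the
default cutoff of 5 characters are assumptions made for illustration; none
of this is an existing cTAKES component.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # drop a char of a
                               current[j - 1] + 1,       # drop a char of b
                               previous[j - 1] + cost))  # match or substitute
        previous = current
    return previous[-1]


def looks_like_acronym(document_text: str, begin: int, end: int,
                       canonical_text: str, max_len: int = 5) -> bool:
    """Flag a mention as a likely acronym or abbreviation.

    begin/end are the annotation's offsets into document_text and
    canonical_text is the preferred text of the assigned concept.
    Comparing in lower case is an assumption, not part of the proposal.
    """
    source = document_text[begin:end]
    if len(source) >= max_len:
        return False
    distance = levenshtein(source.lower(), canonical_text.lower())
    return distance > len(source)


note = "Patient had an MI in 2015."
start = note.index("MI")
print(looks_like_acronym(note, start, start + 2, "Myocardial Infarction"))
```

A mention whose text already equals the canonical text (distance 0) is not
flagged, and neither is anything of 5 or more characters.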

The problem (and I see this in my work) is that many acronyms have multiple
meanings.  Thus, you may accurately be able to tell that your identified
concept came from an acronym, but it was the wrong concept!!

Peter


Re: acronyms/abbreviations [EXTERNAL]

Posted by Greg Silverman <gm...@umn.edu>.
Got it!

Yes, I understand the formidability, given the need for disambiguation,
etc. Was just curious if this existed.

Thanks!




-- 
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
University of Minnesota
gms@umn.edu

 ›  evaluate-it.org  ‹

Re: acronyms/abbreviations [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Greg,

Ok, that gives me a great vector toward addressing your needs.

I don't know of any cTAKES components that indicate whether or not discovered concepts come from acronyms, abbreviations or -replete- text mentions.

There should be something that does that. Open source ----> Any champions available?

Right now no abbreviation or metonym information is provided in the standard components. If it can be extracted from the source then it should be provided.

If anybody has such a component, please let us know! This is a formidable (imio) NLP problem, so claim your kudos with a solution!

Sean
 

Re: acronyms/abbreviations [EXTERNAL]

Posted by Greg Silverman <gm...@umn.edu>.
I'm just wondering how acronyms are identified as acronyms in cTAKES (for
example, in MetaMap, there is an attribute in the Document annotation with
ids of where they are in the Utterance annotation; and in BioMedICUS, there
is an acronym annotation type, etc.). From examining the XMI CAS, it is not
obvious.

We're extracting the desired annotations from the XMI CAS using a custom
Groovy client.
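
For readers who have not worked with the XMI CAS directly, here is a rough
Python sketch of that kind of extraction (our actual client is Groovy; this
is only an illustration). It assumes the usual UIMA XMI layout, where a
cas:Sofa element carries the document text in its sofaString attribute and
each annotation element has begin and end attributes. The function name and
the sample document are made up.

```python
import xml.etree.ElementTree as ET

CAS_NS = "http:///uima/cas.ecore"  # namespace of the cas:Sofa element

# Tiny hand-written sample; real cTAKES output has many more elements.
SAMPLE = (
    '<xmi:XMI xmlns:xmi="http://www.omg.org/XMI" '
    'xmlns:cas="http:///uima/cas.ecore" '
    'xmlns:textsem="http:///org/apache/ctakes/typesystem/type/textsem.ecore" '
    'xmi:version="2.0">'
    '<cas:Sofa xmi:id="1" sofaNum="1" sofaID="_InitialView" '
    'mimeType="text" sofaString="Patient had an MI in 2015."/>'
    '<textsem:DiseaseDisorderMention xmi:id="20" sofa="1" begin="15" end="17"/>'
    '</xmi:XMI>'
)


def covered_texts(xmi_string):
    """Return (type name, begin, end, covered text) for each annotation."""
    root = ET.fromstring(xmi_string)
    text = root.find(f"{{{CAS_NS}}}Sofa").get("sofaString")
    results = []
    for element in root:
        begin, end = element.get("begin"), element.get("end")
        if begin is None or end is None:
            continue  # not an annotation (e.g. the Sofa itself)
        type_name = element.tag.rsplit("}", 1)[-1]  # strip the namespace
        results.append((type_name, int(begin), int(end),
                        text[int(begin):int(end)]))
    return results


for row in covered_texts(SAMPLE):
    print(row)
```

One caveat: UIMA offsets count UTF-16 code units, so slicing a Python
string this way can drift if the note contains characters outside the
Basic Multilingual Plane.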

Thanks!



-- 
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
University of Minnesota
gms@umn.edu

 ›  evaluate-it.org  ‹