You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ctakes.apache.org by Greg Silverman <gm...@umn.edu.INVALID> on 2021/05/16 22:02:43 UTC

rule-based lookup for custom lexicon

I looked all over and could not find any information on how to add this
pipeline component to cTAKES. I assume it uses UIMA Ruta?

Thanks in advance!

Greg--
-- 
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
gms@umn.edu

Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Posted by Greg Silverman <gm...@umn.edu.INVALID>.
Thanks everyone! This indeed is an enlightening conversation.

Best!

On Wed, May 19, 2021 at 3:10 PM Shyam Bhimani <SB...@targetrwe.com>
wrote:

> I am interested. Thank you
>
> Shyam Bhimani
> Software Engineer
>
>
>
>
> CONFIDENTIALITY NOTICE: The contents of this email message and any
> attachments are intended solely for the addressee(s) and may
> contain confidential and/or privileged information and may be legally
> protected from disclosure.
>
> -----Original Message-----
> From: Kean Kaufmann <ke...@recordsone.com>
> Sent: Wednesday, May 19, 2021 2:08 PM
> To: dev@ctakes.apache.org
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>
> ** WARNING: This email originated from outside of Target RWE. **
>
>
> >
> > If anybody out there in the general community is interested, please
> > reply on this thread and maybe we can coordinate a single presentation
> time.
>
>
> Yes please. Thanks, Sean and (other) Peter!
>
> On Wed, May 19, 2021 at 3:42 PM Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi (other) Peter,
> >
> > Many thanks for jumping in on this!
> >
> > I would definitely be interested in seeing some examples, even though
> > I don't have any specific use case right now.
> >
> > I will ask a few local people and see if they are interested in an
> > informal video chat.  If anybody out there in the general community is
> > interested, please reply on this thread and maybe we can coordinate a
> > single presentation time.
> >
> > Cheers,
> >
> > Sean
> > ________________________________________
> > From: Peter Klügl <pe...@averbis.com>
> > Sent: Wednesday, May 19, 2021 3:33 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> > [SUSPICIOUS]
> >
> > * External Email - Caution *
> >
> >
> > Hi all,
> >
> >
> > if you are interested in UIMA Ruta and want to know more about it, you
> > can always ask on the UIMA user list or me directly (I am the creator
> > of UIMA Ruta). I can also prepare some slides and we can have an
> > informal video chat where I give an overview of Ruta.
> >
> >
> > I am of course not objective here (for several reasons) but I think
> > UIMA Ruta could be really useful for cTAKES. It was originally
> > developed for segmenting and processing discharge letters and similar
> > clincial documents. Since then (>10 years), Ruta has always been
> > applied to clincial documents and is being deployed in production by
> > several companies. The language has some advantages and disadvantages
> > compared to other rule languages. In the context of cTAKES, the
> > direct/comprehensive support of UIMA and the IDE dev support are maybe
> > the most relevant advantages.
> >
> >
> > I was thinking about creating some introductory examples for the
> > combination and usage of UIMA Ruta and cTAKES. If you have a good use
> > case, let me know.
> >
> >
> > Best,
> >
> >
> > (another) Peter
> >
> >
> > Am 19.05.2021 um 14:30 schrieb Finan, Sean:
> > > Hi all,
> > > Correct.
> > >
> > > Tim  is correct in the sense that he is using a custom dictionary
> > (custom synonyms, cuis, etc.) which kind of changes the "rules" of
> > what the standard dictionary lookup considers a valid term based upon
> > available tokens in the text.  There are other simple settings that
> > further qualify how the standard dictionary lookup accepts or discards
> synonyms.
> > >
> > > I think that what Greg is asking about is something with introduced
> > "logic" that can alter or remove terms already discovered by the
> > standard dictionary lookup.
> > >
> > > Peter and Kean both outline some custom annotators that they have
> > created to use logic that can alter/add/remove terms discovered by the
> > standard dictionary lookup.  I do the same thing for different
> > projects and advise everybody that applies ctakes to specific domains do
> the same.
> > >
> > > ctakes is a general purpose tool and results can definitely be
> > > improved
> > when catered to a more narrow purpose.
> > >
> > > Back to Greg, I got the feeling that he might be interested in a
> > > more
> > versatile annotator.  Introducing an engine that can utilize something
> > like ruta has several advantages:
> > > 1.  You  can "easily" add complex rules in one place.
> > > 2.  You can change rules external to code ...
> > >   2a. the same pipeline can be catered to different projects without
> > changing code in an annotator or creating a new annotator.
> > >   2b.  An end user who knows nothing about ctakes can change a ruta
> > script to fit their purposes.
> > > 3. Rules are supported and documented by uima ruta, so you don't
> > > have to
> > worry about that extra headache.
> > > 4. Once Greg adds it to apache ctakes (right? ;^) everybody in the
> > community can apply ruta rules to their project.
> > >
> > > When I looked at it a few years ago it was for reason 2b.  In the
> > > end we
> > went for different annotators like Peter and Kean outlined and just
> > use piper file changes to satisfy #2 as that is definitely much easier.
> > However, it doesn't benefit the community as a whole (#4).
> > >
> > > Cheers all, this is a great conversation!
> > >
> > > Sean
> > >
> > >
> > >
> > >
> > > ________________________________________
> > > From: Kean Kaufmann <ke...@recordsone.com>
> > > Sent: Wednesday, May 19, 2021 7:50 AM
> > > To: dev@ctakes.apache.org
> > > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> > > [SUSPICIOUS]
> > >
> > > * External Email - Caution *
> > >
> > >
> > >> yes,  the line between "lookup" and rule execution is a little
> > >> blurry
> > > sometimes.
> > >
> > > Sure is.  I blur it with a set of annotators that extend dictionary
> > > annotations based on words or annotations covered by the same Chunk,
> e.g.
> > >
> > > DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
> > > MedicationMention + /dependenc[ey]|addiction/i =
> > > DiseaseDisorderMention DiseaseDisorderMention +
> > > AnatomicalSiteMention in same Chunk = DiseaseDisorderMention
> > > ProcedureMention + AnatomicalSiteMention in same Chunk =
> > > ProcedureMention
> > >
> > > Higher recall than the regular UmlsLookupAnnotator; higher precision
> > > than the UmlsOverlapLookupAnnotator (which skips a specified number
> > > of tokens regardless of syntax).
> > >
> > > I've been wanting a more general framework to fit this into, and
> > > thinking it might be Ruta.
> > > Thanks for the pointer to TokensRegex; I'll look at that as well.
> > >
> > >
> > > On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch <
> > pabramowitsch@gmail.com>
> > > wrote:
> > >
> > >> Hi All,  yes,  the line between "lookup" and rule execution is a
> little
> > >> blurry sometimes.   Here's some more blurriness.
> > >>
> > >> I've done something related, adapting a UIMA tokens regex engine
> > >> for Ctakes.  You create a new type in the TypeSystem.  In my case it
> uses
> > >> CONLLDEP Annotations as the tokens to reason over.   You can set up
> > >> expressions (rules) that look like this.
> > >> (Yes, this case is already covered in the dictionary, but it's an
> > example)
> > >>
> > >> Matcher A:   (lemma=="be");
> > >> Matcher B:   /partially|partly/;
> > >> Matcher C:   /vaccinated/;
> > >>
> > >> Rule  vaccinated|CUI1234|SNOMED5678:  A? B?  C;
> > >>
> > >> You get the Annotation you've delegated to this task, with the
> > >> entity value  "vaccinated|1234|5678"  and the range which spanned
> > >> the tokens
> > that
> > >> caused the annotation rule to fire
> > >>
> > >> (See Stanford's Tokens Regex)
> > >>
> > >> Peter
> > >>
> > >>
> > >> On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <
> > >> Timothy.Miller@childrens.harvard.edu> wrote:
> > >>
> > >>> But Sean, isn't what he's asking for essentially already
> > >>> implemented in cTAKES as the custom dictionary? I'm currently
> > >>> using that approach for
> > my
> > >>> covid container:
> > >>>
> > >>>
> > >>
> > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Furld
> > efense.com%2Fv3%2F__https%3A%2F%2Fgithub.com%2FMachine-Learning-for-Me
> > dical-Language%2Fctakes-covid-container__%3B!!NZvER7FxgEiBAiR_!7ZopTIh
> > XKalQFx0xET_yET0agN2ZT8JWoa0UyqGSrXa4w-h_9-tRCEeiS4pr6s2Y-T4elV3bYac%2
> > 4&amp;data=04%7C01%7C%7C2c06b48172e64effe38208d91b01d138%7Cd09f6c4846d
> > 241f380993e0f7df7a48e%7C1%7C0%7C637570516886398095%7CUnknown%7CTWFpbGZ
> > sb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3
> > D%7C1000&amp;sdata=9sq3Mkcfzpq6ky5VxRTJYX5fg96K9jLQ84ZuAZtfkBw%3D&amp;
> > reserved=0
> > >>> Tim
> > >>>
> > >>> ________________________________________
> > >>> From: Finan, Sean <Se...@childrens.harvard.edu>
> > >>> Sent: Tuesday, May 18, 2021 11:55 AM
> > >>> To: dev@ctakes.apache.org
> > >>> Cc: Himanshu Shekhar Sahoo
> > >>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> > [SUSPICIOUS]
> > >>>
> > >>> * External Email - Caution *
> > >>>
> > >>>
> > >>> Hi Greg,
> > >>>
> > >>> From 30,000 ft, I think that you would want to use the RutaEngine.
> > >>>
> > >>>
> > >>>
> > >>
> > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Furld
> > efense.com%2Fv3%2F__https%3A%2F%2Fuima.apache.org%2Fd%2Fruta-current%2
> > Ftools.ruta.book.html*ugr.tools.ruta.ae.basic__%3BIw!!NZvER7FxgEiBAiR_
> > !6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWi
> > ckztninUTU%24&amp;data=04%7C01%7C%7C2c06b48172e64effe38208d91b01d138%7
> > Cd09f6c4846d241f380993e0f7df7a48e%7C1%7C0%7C637570516886398095%7CUnkno
> > wn%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiL
> > CJXVCI6Mn0%3D%7C1000&amp;sdata=NplkaaVc1VSAzprb2eKYEWDZyjlceT%2FIzx0X9
> > Y23yco%3D&amp;reserved=0
> > >>>
> > >>
> > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Furld
> > efense.com%2Fv3%2F__https%3A%2F%2Fjavadoc.io%2Fdoc%2Forg.apache.uima%2
> > Fruta-core%2Flatest%2Forg%2Fapache%2Fuima%2Fruta%2Fengine%2FRutaEngine
> > .html__%3B!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-
> > iOew4ufAIPpkFHnsJxSvv8r5GjWickzI7QF5CI%24&amp;data=04%7C01%7C%7C2c06b4
> > 8172e64effe38208d91b01d138%7Cd09f6c4846d241f380993e0f7df7a48e%7C1%7C0%
> > 7C637570516886398095%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQI
> > joiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=wsLHHngunn8
> > M%2B8IIJpCLuUeHEreCkFbJsYxN41%2FErrc%3D&amp;reserved=0
> > >>>
> > >>
> > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Furld
> > efense.com%2Fv3%2F__http%3A%2F%2Fsvn.apache.org%2Frepos%2Fasf%2Fuima%2
> > Fruta%2Ftrunk%2Fruta-core%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fuima%2F
> > ruta%2Fengine%2FRutaEngine.java__%3B!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAv
> > Lt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzJJ96zT4%24&am
> > p;data=04%7C01%7C%7C2c06b48172e64effe38208d91b01d138%7Cd09f6c4846d241f
> > 380993e0f7df7a48e%7C1%7C0%7C637570516886398095%7CUnknown%7CTWFpbGZsb3d
> > 8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C
> > 1000&amp;sdata=8e53AJqf9xK5ZKj%2BhKk7wy%2BzQSEcHybEe65SM7etn5I%3D&amp;
> > reserved=0
> > >>> That seems to be the actual analysis engine that loads and uses
> > >>> rules
> > to
> > >>> create annotations.
> > >>> While you could use an xml descriptor or use the piper "set"
> > >>> command
> > and
> > >>> do things like mapping ruta to ctakes type systems, I would take
> > >>> the alternate approach of "copying" the initialize(..) and process
> > >>> (..)
> > >> methods
> > >>> and modify them to use ctakes types directly.
> > >>>
> > >>> Disclaimer:  I know very little about uima ruta.  At some point I
> > >>> did
> > >> look
> > >>> into it but it was for a specific (ctakes-derivative) project and
> > >>> I
> > >> didn't
> > >>> go further than basic doc perusal.
> > >>>
> > >>> If you move forward with this please let us all know what you
> > >>> find.  I think that there will be great interest in the community.
> > >>>
> > >>> Sean
> > >>> ________________________________________
> > >>> From: Greg Silverman <gm...@umn.edu.INVALID>
> > >>> Sent: Tuesday, May 18, 2021 11:13 AM
> > >>> To: dev@ctakes.apache.org
> > >>> Cc: Himanshu Shekhar Sahoo
> > >>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> > >>>
> > >>> * External Email - Caution *
> > >>>
> > >>>
> > >>> Hi Sean,
> > >>> I was wondering if there was a way to use rule-base lookup of a
> > >>> custom lexicon within cTAKES (say a locally curated list of covd-19
> symptoms).
> > >>> When I Googled around, I stumbled on UIMA Ruta, but couldn't find
> > >> anything
> > >>> wrt to cTAKES specifics.
> > >>>
> > >>> Thanks!
> > >>>
> > >>>
> > >>> Greg--
> > >>>
> > >>> On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
> > >>> Sean.Finan@childrens.harvard.edu> wrote:
> > >>>
> > >>>>  To which ctakes component(s) are you referring?
> > >>>> ________________________________________
> > >>>> From: Greg Silverman <gm...@umn.edu.INVALID>
> > >>>> Sent: Sunday, May 16, 2021 6:02 PM
> > >>>> To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
> > >>>> Subject: rule-based lookup for custom lexicon [EXTERNAL]
> > >>>>
> > >>>> * External Email - Caution *
> > >>>>
> > >>>>
> > >>>> I looked all over and could not find any information on how to
> > >>>> add
> > this
> > >>>> pipeline component to cTAKES. I assume it uses UIMA Ruta?
> > >>>>
> > >>>> Thanks in advance!
> > >>>>
> > >>>> Greg--
> > >>>> --
> > >>>> Greg M. Silverman
> > >>>> Senior Systems Developer
> > >>>> NLP/IE <
> > >>>>
> > >>
> > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Furld
> > efense.com%2Fv3%2F__https%3A%2F%2Fhealthinformatics.umn.edu%2Fresearch
> > %2Fnlpie-group__%3B!!NZvER7FxgEiBAiR_!6hN356eDesvWNYzsrDMaXgF6IkZw313Q
> > U2QUQw5M8Jysvh1K1JxjEBeztZicX1DM2jC0o7_0qAA%24&amp;data=04%7C01%7C%7C2
> > c06b48172e64effe38208d91b01d138%7Cd09f6c4846d241f380993e0f7df7a48e%7C1
> > %7C0%7C637570516886398095%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAi
> > LCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=0WN0yw
> > j9IqYGirnJL2cF4EhcJCyqLR2E6gjrGH8r%2BPo%3D&amp;reserved=0
> > >>>> Department of Surgery
> > >>>> University of Minnesota
> > >>>> gms@umn.edu
> > >>>>
> > >>>
> > >>> --
> > >>> Greg M. Silverman
> > >>> Senior Systems Developer
> > >>> NLP/IE <
> > >>>
> > >>
> > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Furld
> > efense.com%2Fv3%2F__https%3A%2F%2Fhealthinformatics.umn.edu%2Fresearch
> > %2Fnlpie-group__%3B!!NZvER7FxgEiBAiR_!8uKf_4SXyKdCmvlMHvRGddxlzofg64D4
> > _zsPdCThqeMAyn2akyMNI8wqM6yNUZA2N93F-aAsR7I%24&amp;data=04%7C01%7C%7C2
> > c06b48172e64effe38208d91b01d138%7Cd09f6c4846d241f380993e0f7df7a48e%7C1
> > %7C0%7C637570516886408094%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAi
> > LCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=aUEAqH
> > Dqep4MURX9a5ZXabQ4W1LzM89AEPNHTqzG1Yw%3D&amp;reserved=0
> > >>> Department of Surgery
> > >>> University of Minnesota
> > >>> gms@umn.edu
> > >>>
> > --
> > Dr. Peter Klügl
> > Head of Text Mining/Machine Learning
> >
> > Averbis GmbH
> > Salzstr. 15
> > 79098 Freiburg
> > Germany
> >
> > Fon: +49 761 708 394 0
> > Fax: +49 761 708 394 10
> > Email: peter.kluegl@averbis.com
> > Web:
> > https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Furld
> > efense.com%2Fv3%2F__https%3A%2F%2Faverbis.com__%3B!!NZvER7FxgEiBAiR_!8
> > k8JQUqNQYj-fQWELRFtxlACk1xSqLtVEnIHDmvmw6QnGtc3id_S4IOLqa6-Y9F4mOzpTfA
> > OWo4%24&amp;data=04%7C01%7C%7C2c06b48172e64effe38208d91b01d138%7Cd09f6
> > c4846d241f380993e0f7df7a48e%7C1%7C0%7C637570516886408094%7CUnknown%7CT
> > WFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI
> > 6Mn0%3D%7C1000&amp;sdata=EQcNZBDQoEHOCGnJRWPyz%2B2a8tulfifkkFGI1Py4SIs
> > %3D&amp;reserved=0
> >
> > Headquarters: Freiburg im Breisgau
> > Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080 Managing
> > Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
> >
> >
>


-- 
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
gms@umn.edu

RE: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Posted by Shyam Bhimani <SB...@targetrwe.com>.
I am interested. Thank you 

Shyam Bhimani
Software Engineer


  

CONFIDENTIALITY NOTICE: The contents of this email message and any attachments are intended solely for the addressee(s) and may contain confidential and/or privileged information and may be legally protected from disclosure.

-----Original Message-----
From: Kean Kaufmann <ke...@recordsone.com> 
Sent: Wednesday, May 19, 2021 2:08 PM
To: dev@ctakes.apache.org
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

** WARNING: This email originated from outside of Target RWE. **


>
> If anybody out there in the general community is interested, please 
> reply on this thread and maybe we can coordinate a single presentation time.


Yes please. Thanks, Sean and (other) Peter!

On Wed, May 19, 2021 at 3:42 PM Finan, Sean < Sean.Finan@childrens.harvard.edu> wrote:

> Hi (other) Peter,
>
> Many thanks for jumping in on this!
>
> I would definitely be interested in seeing some examples, even though 
> I don't have any specific use case right now.
>
> I will ask a few local people and see if they are interested in an 
> informal video chat.  If anybody out there in the general community is 
> interested, please reply on this thread and maybe we can coordinate a 
> single presentation time.
>
> Cheers,
>
> Sean
> ________________________________________
> From: Peter Klügl <pe...@averbis.com>
> Sent: Wednesday, May 19, 2021 3:33 PM
> To: dev@ctakes.apache.org
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] 
> [SUSPICIOUS]
>
> * External Email - Caution *
>
>
> Hi all,
>
>
> if you are interested in UIMA Ruta and want to know more about it, you 
> can always ask on the UIMA user list or me directly (I am the creator 
> of UIMA Ruta). I can also prepare some slides and we can have an 
> informal video chat where I give an overview of Ruta.
>
>
> I am of course not objective here (for several reasons) but I think 
> UIMA Ruta could be really useful for cTAKES. It was originally 
> developed for segmenting and processing discharge letters and similar 
> clincial documents. Since then (>10 years), Ruta has always been 
> applied to clincial documents and is being deployed in production by 
> several companies. The language has some advantages and disadvantages 
> compared to other rule languages. In the context of cTAKES, the 
> direct/comprehensive support of UIMA and the IDE dev support are maybe 
> the most relevant advantages.
>
>
> I was thinking about creating some introductory examples for the 
> combination and usage of UIMA Ruta and cTAKES. If you have a good use 
> case, let me know.
>
>
> Best,
>
>
> (another) Peter
>
>
> Am 19.05.2021 um 14:30 schrieb Finan, Sean:
> > Hi all,
> > Correct.
> >
> > Tim  is correct in the sense that he is using a custom dictionary
> (custom synonyms, cuis, etc.) which kind of changes the "rules" of 
> what the standard dictionary lookup considers a valid term based upon 
> available tokens in the text.  There are other simple settings that 
> further qualify how the standard dictionary lookup accepts or discards synonyms.
> >
> > I think that what Greg is asking about is something with introduced
> "logic" that can alter or remove terms already discovered by the 
> standard dictionary lookup.
> >
> > Peter and Kean both outline some custom annotators that they have
> created to use logic that can alter/add/remove terms discovered by the 
> standard dictionary lookup.  I do the same thing for different 
> projects and advise everybody that applies ctakes to specific domains do the same.
> >
> > ctakes is a general purpose tool and results can definitely be 
> > improved
> when catered to a more narrow purpose.
> >
> > Back to Greg, I got the feeling that he might be interested in a 
> > more
> versatile annotator.  Introducing an engine that can utilize something 
> like ruta has several advantages:
> > 1.  You  can "easily" add complex rules in one place.
> > 2.  You can change rules external to code ...
> >   2a. the same pipeline can be catered to different projects without
> changing code in an annotator or creating a new annotator.
> >   2b.  An end user who knows nothing about ctakes can change a ruta
> script to fit their purposes.
> > 3. Rules are supported and documented by uima ruta, so you don't 
> > have to
> worry about that extra headache.
> > 4. Once Greg adds it to apache ctakes (right? ;^) everybody in the
> community can apply ruta rules to their project.
> >
> > When I looked at it a few years ago it was for reason 2b.  In the 
> > end we
> went for different annotators like Peter and Kean outlined and just 
> use piper file changes to satisfy #2 as that is definitely much easier.
> However, it doesn't benefit the community as a whole (#4).
> >
> > Cheers all, this is a great conversation!
> >
> > Sean
> >
> >
> >
> >
> > ________________________________________
> > From: Kean Kaufmann <ke...@recordsone.com>
> > Sent: Wednesday, May 19, 2021 7:50 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] 
> > [SUSPICIOUS]
> >
> > * External Email - Caution *
> >
> >
> >> yes,  the line between "lookup" and rule execution is a little 
> >> blurry
> > sometimes.
> >
> > Sure is.  I blur it with a set of annotators that extend dictionary 
> > annotations based on words or annotations covered by the same Chunk, e.g.
> >
> > DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention 
> > MedicationMention + /dependenc[ey]|addiction/i = 
> > DiseaseDisorderMention DiseaseDisorderMention + 
> > AnatomicalSiteMention in same Chunk = DiseaseDisorderMention 
> > ProcedureMention + AnatomicalSiteMention in same Chunk = 
> > ProcedureMention
> >
> > Higher recall than the regular UmlsLookupAnnotator; higher precision 
> > than the UmlsOverlapLookupAnnotator (which skips a specified number 
> > of tokens regardless of syntax).
> >
> > I've been wanting a more general framework to fit this into, and 
> > thinking it might be Ruta.
> > Thanks for the pointer to TokensRegex; I'll look at that as well.
> >
> >
> > On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch <
> pabramowitsch@gmail.com>
> > wrote:
> >
> >> Hi All,  yes,  the line between "lookup" and rule execution is a little
> >> blurry sometimes.   Here's some more blurriness.
> >>
> >> I've done something related, adapting a UIMA tokens regex engine 
> >> for Ctakes.  You create a new type in the TypeSystem.  In my case it uses
> >> CONLLDEP Annotations as the tokens to reason over.   You can set up
> >> expressions (rules) that look like this.
> >> (Yes, this case is already covered in the dictionary, but it's an
> example)
> >>
> >> Matcher A:   (lemma=="be");
> >> Matcher B:   /partially|partly/;
> >> Matcher C:   /vaccinated/;
> >>
> >> Rule  vaccinated|CUI1234|SNOMED5678:  A? B?  C;
> >>
> >> You get the Annotation you've delegated to this task, with the 
> >> entity value  "vaccinated|1234|5678"  and the range which spanned 
> >> the tokens
> that
> >> caused the annotation rule to fire
> >>
> >> (See Stanford's Tokens Regex)
> >>
> >> Peter
> >>
> >>
> >> On Tue, May 18, 2021 at 1:29 PM Miller, Timothy < 
> >> Timothy.Miller@childrens.harvard.edu> wrote:
> >>
> >>> But Sean, isn't what he's asking for essentially already 
> >>> implemented in cTAKES as the custom dictionary? I'm currently 
> >>> using that approach for
> my
> >>> covid container:
> >>>
> >>>
> >>
> https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Furld
> efense.com%2Fv3%2F__https%3A%2F%2Fgithub.com%2FMachine-Learning-for-Me
> dical-Language%2Fctakes-covid-container__%3B!!NZvER7FxgEiBAiR_!7ZopTIh
> XKalQFx0xET_yET0agN2ZT8JWoa0UyqGSrXa4w-h_9-tRCEeiS4pr6s2Y-T4elV3bYac%2
> 4&amp;data=04%7C01%7C%7C2c06b48172e64effe38208d91b01d138%7Cd09f6c4846d
> 241f380993e0f7df7a48e%7C1%7C0%7C637570516886398095%7CUnknown%7CTWFpbGZ
> sb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3
> D%7C1000&amp;sdata=9sq3Mkcfzpq6ky5VxRTJYX5fg96K9jLQ84ZuAZtfkBw%3D&amp;
> reserved=0
> >>> Tim
> >>>
> >>> ________________________________________
> >>> From: Finan, Sean <Se...@childrens.harvard.edu>
> >>> Sent: Tuesday, May 18, 2021 11:55 AM
> >>> To: dev@ctakes.apache.org
> >>> Cc: Himanshu Shekhar Sahoo
> >>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> [SUSPICIOUS]
> >>>
> >>> * External Email - Caution *
> >>>
> >>>
> >>> Hi Greg,
> >>>
> >>> From 30,000 ft, I think that you would want to use the RutaEngine.
> >>>
> >>>
> >>>
> >>
> https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Furld
> efense.com%2Fv3%2F__https%3A%2F%2Fuima.apache.org%2Fd%2Fruta-current%2
> Ftools.ruta.book.html*ugr.tools.ruta.ae.basic__%3BIw!!NZvER7FxgEiBAiR_
> !6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWi
> ckztninUTU%24&amp;data=04%7C01%7C%7C2c06b48172e64effe38208d91b01d138%7
> Cd09f6c4846d241f380993e0f7df7a48e%7C1%7C0%7C637570516886398095%7CUnkno
> wn%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiL
> CJXVCI6Mn0%3D%7C1000&amp;sdata=NplkaaVc1VSAzprb2eKYEWDZyjlceT%2FIzx0X9
> Y23yco%3D&amp;reserved=0
> >>>
> >>
> https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Furld
> efense.com%2Fv3%2F__https%3A%2F%2Fjavadoc.io%2Fdoc%2Forg.apache.uima%2
> Fruta-core%2Flatest%2Forg%2Fapache%2Fuima%2Fruta%2Fengine%2FRutaEngine
> .html__%3B!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-
> iOew4ufAIPpkFHnsJxSvv8r5GjWickzI7QF5CI%24&amp;data=04%7C01%7C%7C2c06b4
> 8172e64effe38208d91b01d138%7Cd09f6c4846d241f380993e0f7df7a48e%7C1%7C0%
> 7C637570516886398095%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQI
> joiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=wsLHHngunn8
> M%2B8IIJpCLuUeHEreCkFbJsYxN41%2FErrc%3D&amp;reserved=0
> >>>
> >>
> https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Furld
> efense.com%2Fv3%2F__http%3A%2F%2Fsvn.apache.org%2Frepos%2Fasf%2Fuima%2
> Fruta%2Ftrunk%2Fruta-core%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fuima%2F
> ruta%2Fengine%2FRutaEngine.java__%3B!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAv
> Lt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzJJ96zT4%24&am
> p;data=04%7C01%7C%7C2c06b48172e64effe38208d91b01d138%7Cd09f6c4846d241f
> 380993e0f7df7a48e%7C1%7C0%7C637570516886398095%7CUnknown%7CTWFpbGZsb3d
> 8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C
> 1000&amp;sdata=8e53AJqf9xK5ZKj%2BhKk7wy%2BzQSEcHybEe65SM7etn5I%3D&amp;
> reserved=0
> >>> That seems to be the actual analysis engine that loads and uses 
> >>> rules
> to
> >>> create annotations.
> >>> While you could use an xml descriptor or use the piper "set" 
> >>> command
> and
> >>> do things like mapping ruta to ctakes type systems, I would take 
> >>> the alternate approach of "copying" the initialize(..) and process 
> >>> (..)
> >> methods
> >>> and modify them to use ctakes types directly.
> >>>
> >>> Disclaimer:  I know very little about uima ruta.  At some point I 
> >>> did
> >> look
> >>> into it but it was for a specific (ctakes-derivative) project and 
> >>> I
> >> didn't
> >>> go further than basic doc perusal.
> >>>
> >>> If you move forward with this please let us all know what you 
> >>> find.  I think that there will be great interest in the community.
> >>>
> >>> Sean
> >>> ________________________________________
> >>> From: Greg Silverman <gm...@umn.edu.INVALID>
> >>> Sent: Tuesday, May 18, 2021 11:13 AM
> >>> To: dev@ctakes.apache.org
> >>> Cc: Himanshu Shekhar Sahoo
> >>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> >>>
> >>> * External Email - Caution *
> >>>
> >>>
> >>> Hi Sean,
> >>> I was wondering if there was a way to use rule-base lookup of a 
> >>> custom lexicon within cTAKES (say a locally curated list of covd-19 symptoms).
> >>> When I Googled around, I stumbled on UIMA Ruta, but couldn't find
> >> anything
> >>> wrt to cTAKES specifics.
> >>>
> >>> Thanks!
> >>>
> >>>
> >>> Greg--
> >>>
> >>> On Tue, May 18, 2021 at 10:04 AM Finan, Sean < 
> >>> Sean.Finan@childrens.harvard.edu> wrote:
> >>>
> >>>>  To which ctakes component(s) are you referring?
> >>>> ________________________________________
> >>>> From: Greg Silverman <gm...@umn.edu.INVALID>
> >>>> Sent: Sunday, May 16, 2021 6:02 PM
> >>>> To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
> >>>> Subject: rule-based lookup for custom lexicon [EXTERNAL]
> >>>>
> >>>> * External Email - Caution *
> >>>>
> >>>>
> >>>> I looked all over and could not find any information on how to 
> >>>> add
> this
> >>>> pipeline component to cTAKES. I assume it uses UIMA Ruta?
> >>>>
> >>>> Thanks in advance!
> >>>>
> >>>> Greg--
> >>>> --
> >>>> Greg M. Silverman
> >>>> Senior Systems Developer
> >>>> NLP/IE <
> >>>>
> >>
> https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Furld
> efense.com%2Fv3%2F__https%3A%2F%2Fhealthinformatics.umn.edu%2Fresearch
> %2Fnlpie-group__%3B!!NZvER7FxgEiBAiR_!6hN356eDesvWNYzsrDMaXgF6IkZw313Q
> U2QUQw5M8Jysvh1K1JxjEBeztZicX1DM2jC0o7_0qAA%24&amp;data=04%7C01%7C%7C2
> c06b48172e64effe38208d91b01d138%7Cd09f6c4846d241f380993e0f7df7a48e%7C1
> %7C0%7C637570516886398095%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAi
> LCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=0WN0yw
> j9IqYGirnJL2cF4EhcJCyqLR2E6gjrGH8r%2BPo%3D&amp;reserved=0
> >>>> Department of Surgery
> >>>> University of Minnesota
> >>>> gms@umn.edu
> >>>>
> >>>
> >>> --
> >>> Greg M. Silverman
> >>> Senior Systems Developer
> >>> NLP/IE <
> >>>
> >>
> https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Furld
> efense.com%2Fv3%2F__https%3A%2F%2Fhealthinformatics.umn.edu%2Fresearch
> %2Fnlpie-group__%3B!!NZvER7FxgEiBAiR_!8uKf_4SXyKdCmvlMHvRGddxlzofg64D4
> _zsPdCThqeMAyn2akyMNI8wqM6yNUZA2N93F-aAsR7I%24&amp;data=04%7C01%7C%7C2
> c06b48172e64effe38208d91b01d138%7Cd09f6c4846d241f380993e0f7df7a48e%7C1
> %7C0%7C637570516886408094%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAi
> LCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=aUEAqH
> Dqep4MURX9a5ZXabQ4W1LzM89AEPNHTqzG1Yw%3D&amp;reserved=0
> >>> Department of Surgery
> >>> University of Minnesota
> >>> gms@umn.edu
> >>>
> --
> Dr. Peter Klügl
> Head of Text Mining/Machine Learning
>
> Averbis GmbH
> Salzstr. 15
> 79098 Freiburg
> Germany
>
> Fon: +49 761 708 394 0
> Fax: +49 761 708 394 10
> Email: peter.kluegl@averbis.com
> Web:
> https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Furld
> efense.com%2Fv3%2F__https%3A%2F%2Faverbis.com__%3B!!NZvER7FxgEiBAiR_!8
> k8JQUqNQYj-fQWELRFtxlACk1xSqLtVEnIHDmvmw6QnGtc3id_S4IOLqa6-Y9F4mOzpTfA
> OWo4%24&amp;data=04%7C01%7C%7C2c06b48172e64effe38208d91b01d138%7Cd09f6
> c4846d241f380993e0f7df7a48e%7C1%7C0%7C637570516886408094%7CUnknown%7CT
> WFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI
> 6Mn0%3D%7C1000&amp;sdata=EQcNZBDQoEHOCGnJRWPyz%2B2a8tulfifkkFGI1Py4SIs
> %3D&amp;reserved=0
>
> Headquarters: Freiburg im Breisgau
> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080 Managing 
> Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>
>

Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Posted by Kean Kaufmann <ke...@recordsone.com>.
>
> If anybody out there in the general community is interested, please reply
> on this thread and maybe we can coordinate a single presentation time.


Yes please. Thanks, Sean and (other) Peter!

On Wed, May 19, 2021 at 3:42 PM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi (other) Peter,
>
> Many thanks for jumping in on this!
>
> I would definitely be interested in seeing some examples, even though I
> don't have any specific use case right now.
>
> I will ask a few local people and see if they are interested in an
> informal video chat.  If anybody out there in the general community is
> interested, please reply on this thread and maybe we can coordinate a
> single presentation time.
>
> Cheers,
>
> Sean
> ________________________________________
> From: Peter Klügl <pe...@averbis.com>
> Sent: Wednesday, May 19, 2021 3:33 PM
> To: dev@ctakes.apache.org
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>
> * External Email - Caution *
>
>
> Hi all,
>
>
> if you are interested in UIMA Ruta and want to know more about it, you
> can always ask on the UIMA user list or me directly (I am the creator of
> UIMA Ruta). I can also prepare some slides and we can have an informal
> video chat where I give an overview of Ruta.
>
>
> I am of course not objective here (for several reasons) but I think UIMA
> Ruta could be really useful for cTAKES. It was originally developed for
> segmenting and processing discharge letters and similar clincial
> documents. Since then (>10 years), Ruta has always been applied to
> clincial documents and is being deployed in production by several
> companies. The language has some advantages and disadvantages compared
> to other rule languages. In the context of cTAKES, the
> direct/comprehensive support of UIMA and the IDE dev support are maybe
> the most relevant advantages.
>
>
> I was thinking about creating some introductory examples for the
> combination and usage of UIMA Ruta and cTAKES. If you have a good use
> case, let me know.
>
>
> Best,
>
>
> (another) Peter
>
>
> Am 19.05.2021 um 14:30 schrieb Finan, Sean:
> > Hi all,
> > Correct.
> >
> > Tim  is correct in the sense that he is using a custom dictionary
> (custom synonyms, cuis, etc.) which kind of changes the "rules" of what the
> standard dictionary lookup considers a valid term based upon available
> tokens in the text.  There are other simple settings that further qualify
> how the standard dictionary lookup accepts or discards synonyms.
> >
> > I think that what Greg is asking about is something with introduced
> "logic" that can alter or remove terms already discovered by the standard
> dictionary lookup.
> >
> > Peter and Kean both outline some custom annotators that they have
> created to use logic that can alter/add/remove terms discovered by the
> standard dictionary lookup.  I do the same thing for different projects and
> advise everybody that applies ctakes to specific domains do the same.
> >
> > ctakes is a general purpose tool and results can definitely be improved
> when catered to a more narrow purpose.
> >
> > Back to Greg, I got the feeling that he might be interested in a more
> versatile annotator.  Introducing an engine that can utilize something like
> ruta has several advantages:
> > 1.  You  can "easily" add complex rules in one place.
> > 2.  You can change rules external to code ...
> >   2a. the same pipeline can be catered to different projects without
> changing code in an annotator or creating a new annotator.
> >   2b.  An end user who knows nothing about ctakes can change a ruta
> script to fit their purposes.
> > 3. Rules are supported and documented by uima ruta, so you don't have to
> worry about that extra headache.
> > 4. Once Greg adds it to apache ctakes (right? ;^) everybody in the
> community can apply ruta rules to their project.
> >
> > When I looked at it a few years ago it was for reason 2b.  In the end we
> went for different annotators like Peter and Kean outlined and just use
> piper file changes to satisfy #2 as that is definitely much easier.
> However, it doesn't benefit the community as a whole (#4).
> >
> > Cheers all, this is a great conversation!
> >
> > Sean
> >
> >
> >
> >
> > ________________________________________
> > From: Kean Kaufmann <ke...@recordsone.com>
> > Sent: Wednesday, May 19, 2021 7:50 AM
> > To: dev@ctakes.apache.org
> > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
> >
> > * External Email - Caution *
> >
> >
> >> yes,  the line between "lookup" and rule execution is a little blurry
> > sometimes.
> >
> > Sure is.  I blur it with a set of annotators that extend dictionary
> > annotations based on words or annotations covered by the same Chunk, e.g.
> >
> > DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
> > MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
> > DiseaseDisorderMention + AnatomicalSiteMention in same Chunk =
> > DiseaseDisorderMention
> > ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention
> >
> > Higher recall than the regular UmlsLookupAnnotator;
> > higher precision than the UmlsOverlapLookupAnnotator (which skips a
> > specified number of tokens regardless of syntax).
> >
> > I've been wanting a more general framework to fit this into, and thinking
> > it might be Ruta.
> > Thanks for the pointer to TokensRegex; I'll look at that as well.
> >
> >
> > On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch <
> pabramowitsch@gmail.com>
> > wrote:
> >
> >> Hi All,  yes,  the line between "lookup" and rule execution is a little
> >> blurry sometimes.   Here's some more blurriness.
> >>
> >> I've done something related, adapting a UIMA tokens regex engine for
> >> Ctakes.  You create a new type in the TypeSystem.  In my case it uses
> >> CONLLDEP Annotations as the tokens to reason over.   You can set up
> >> expressions (rules) that look like this.
> >> (Yes, this case is already covered in the dictionary, but it's an
> example)
> >>
> >> Matcher A:   (lemma=="be");
> >> Matcher B:   /partially|partly/;
> >> Matcher C:   /vaccinated/;
> >>
> >> Rule  vaccinated|CUI1234|SNOMED5678:  A? B?  C;
> >>
> >> You get the Annotation you've delegated to this task, with the entity
> >> value  "vaccinated|1234|5678"  and the range which spanned the tokens
> that
> >> caused the annotation rule to fire
> >>
> >> (See Stanford's Tokens Regex)
> >>
> >> Peter
> >>
> >>
> >> On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <
> >> Timothy.Miller@childrens.harvard.edu> wrote:
> >>
> >>> But Sean, isn't what he's asking for essentially already implemented in
> >>> cTAKES as the custom dictionary? I'm currently using that approach for
> my
> >>> covid container:
> >>>
> >>>
> >>
> https://urldefense.com/v3/__https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container__;!!NZvER7FxgEiBAiR_!7ZopTIhXKalQFx0xET_yET0agN2ZT8JWoa0UyqGSrXa4w-h_9-tRCEeiS4pr6s2Y-T4elV3bYac$
> >>> Tim
> >>>
> >>> ________________________________________
> >>> From: Finan, Sean <Se...@childrens.harvard.edu>
> >>> Sent: Tuesday, May 18, 2021 11:55 AM
> >>> To: dev@ctakes.apache.org
> >>> Cc: Himanshu Shekhar Sahoo
> >>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> [SUSPICIOUS]
> >>>
> >>> * External Email - Caution *
> >>>
> >>>
> >>> Hi Greg,
> >>>
> >>> From 30,000 ft, I think that you would want to use the RutaEngine.
> >>>
> >>>
> >>>
> >>
> https://urldefense.com/v3/__https://uima.apache.org/d/ruta-current/tools.ruta.book.html*ugr.tools.ruta.ae.basic__;Iw!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickztninUTU$
> >>>
> >>
> https://urldefense.com/v3/__https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzI7QF5CI$
> >>>
> >>
> https://urldefense.com/v3/__http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzJJ96zT4$
> >>> That seems to be the actual analysis engine that loads and uses rules
> to
> >>> create annotations.
> >>> While you could use an xml descriptor or use the piper "set" command
> and
> >>> do things like mapping ruta to ctakes type systems, I would take the
> >>> alternate approach of "copying" the initialize(..) and process (..)
> >> methods
> >>> and modify them to use ctakes types directly.
> >>>
> >>> Disclaimer:  I know very little about uima ruta.  At some point I did
> >> look
> >>> into it but it was for a specific (ctakes-derivative) project and I
> >> didn't
> >>> go further than basic doc perusal.
> >>>
> >>> If you move forward with this please let us all know what you find.  I
> >>> think that there will be great interest in the community.
> >>>
> >>> Sean
> >>> ________________________________________
> >>> From: Greg Silverman <gm...@umn.edu.INVALID>
> >>> Sent: Tuesday, May 18, 2021 11:13 AM
> >>> To: dev@ctakes.apache.org
> >>> Cc: Himanshu Shekhar Sahoo
> >>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> >>>
> >>> * External Email - Caution *
> >>>
> >>>
> >>> Hi Sean,
> >>> I was wondering if there was a way to use rule-base lookup of a custom
> >>> lexicon within cTAKES (say a locally curated list of covd-19 symptoms).
> >>> When I Googled around, I stumbled on UIMA Ruta, but couldn't find
> >> anything
> >>> wrt to cTAKES specifics.
> >>>
> >>> Thanks!
> >>>
> >>>
> >>> Greg--
> >>>
> >>> On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
> >>> Sean.Finan@childrens.harvard.edu> wrote:
> >>>
> >>>>  To which ctakes component(s) are you referring?
> >>>> ________________________________________
> >>>> From: Greg Silverman <gm...@umn.edu.INVALID>
> >>>> Sent: Sunday, May 16, 2021 6:02 PM
> >>>> To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
> >>>> Subject: rule-based lookup for custom lexicon [EXTERNAL]
> >>>>
> >>>> * External Email - Caution *
> >>>>
> >>>>
> >>>> I looked all over and could not find any information on how to add
> this
> >>>> pipeline component to cTAKES. I assume it uses UIMA Ruta?
> >>>>
> >>>> Thanks in advance!
> >>>>
> >>>> Greg--
> >>>> --
> >>>> Greg M. Silverman
> >>>> Senior Systems Developer
> >>>> NLP/IE <
> >>>>
> >>
> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!6hN356eDesvWNYzsrDMaXgF6IkZw313QU2QUQw5M8Jysvh1K1JxjEBeztZicX1DM2jC0o7_0qAA$
> >>>> Department of Surgery
> >>>> University of Minnesota
> >>>> gms@umn.edu
> >>>>
> >>>
> >>> --
> >>> Greg M. Silverman
> >>> Senior Systems Developer
> >>> NLP/IE <
> >>>
> >>
> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!8uKf_4SXyKdCmvlMHvRGddxlzofg64D4_zsPdCThqeMAyn2akyMNI8wqM6yNUZA2N93F-aAsR7I$
> >>> Department of Surgery
> >>> University of Minnesota
> >>> gms@umn.edu
> >>>
> --
> Dr. Peter Klügl
> Head of Text Mining/Machine Learning
>
> Averbis GmbH
> Salzstr. 15
> 79098 Freiburg
> Germany
>
> Fon: +49 761 708 394 0
> Fax: +49 761 708 394 10
> Email: peter.kluegl@averbis.com
> Web:
> https://urldefense.com/v3/__https://averbis.com__;!!NZvER7FxgEiBAiR_!8k8JQUqNQYj-fQWELRFtxlACk1xSqLtVEnIHDmvmw6QnGtc3id_S4IOLqa6-Y9F4mOzpTfAOWo4$
>
> Headquarters: Freiburg im Breisgau
> Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
> Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó
>
>

RE: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Posted by "Monogyiou, Eugenia" <Eu...@nttdata.com>.
Very interested!
Thank you :)

Kind Regards,

Eugenia

-----Original Message-----
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: 19 May 2021 20:42
To: dev@ctakes.apache.org
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Hi (other) Peter,

Many thanks for jumping in on this!

I would definitely be interested in seeing some examples, even though I don't have any specific use case right now.

I will ask a few local people and see if they are interested in an informal video chat.  If anybody out there in the general community is interested, please reply on this thread and maybe we can coordinate a single presentation time.

Cheers,

Sean
________________________________________
From: Peter Klügl <pe...@averbis.com>
Sent: Wednesday, May 19, 2021 3:33 PM
To: dev@ctakes.apache.org
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

* External Email - Caution *


Hi all,


if you are interested in UIMA Ruta and want to know more about it, you can always ask on the UIMA user list or me directly (I am the creator of UIMA Ruta). I can also prepare some slides and we can have an informal video chat where I give an overview of Ruta.


I am of course not objective here (for several reasons) but I think UIMA Ruta could be really useful for cTAKES. It was originally developed for segmenting and processing discharge letters and similar clincial documents. Since then (>10 years), Ruta has always been applied to clincial documents and is being deployed in production by several companies. The language has some advantages and disadvantages compared to other rule languages. In the context of cTAKES, the direct/comprehensive support of UIMA and the IDE dev support are maybe the most relevant advantages.


I was thinking about creating some introductory examples for the combination and usage of UIMA Ruta and cTAKES. If you have a good use case, let me know.


Best,


(another) Peter


Am 19.05.2021 um 14:30 schrieb Finan, Sean:
> Hi all,
> Correct.
>
> Tim  is correct in the sense that he is using a custom dictionary (custom synonyms, cuis, etc.) which kind of changes the "rules" of what the standard dictionary lookup considers a valid term based upon available tokens in the text.  There are other simple settings that further qualify how the standard dictionary lookup accepts or discards synonyms.
>
> I think that what Greg is asking about is something with introduced "logic" that can alter or remove terms already discovered by the standard dictionary lookup.
>
> Peter and Kean both outline some custom annotators that they have created to use logic that can alter/add/remove terms discovered by the standard dictionary lookup.  I do the same thing for different projects and advise everybody that applies ctakes to specific domains do the same.
>
> ctakes is a general purpose tool and results can definitely be improved when catered to a more narrow purpose.
>
> Back to Greg, I got the feeling that he might be interested in a more versatile annotator.  Introducing an engine that can utilize something like ruta has several advantages:
> 1.  You  can "easily" add complex rules in one place.
> 2.  You can change rules external to code ...
>   2a. the same pipeline can be catered to different projects without changing code in an annotator or creating a new annotator.
>   2b.  An end user who knows nothing about ctakes can change a ruta script to fit their purposes.
> 3. Rules are supported and documented by uima ruta, so you don't have to worry about that extra headache.
> 4. Once Greg adds it to apache ctakes (right? ;^) everybody in the community can apply ruta rules to their project.
>
> When I looked at it a few years ago it was for reason 2b.  In the end we went for different annotators like Peter and Kean outlined and just use piper file changes to satisfy #2 as that is definitely much easier.  However, it doesn't benefit the community as a whole (#4).
>
> Cheers all, this is a great conversation!
>
> Sean
>
>
>
>
> ________________________________________
> From: Kean Kaufmann <ke...@recordsone.com>
> Sent: Wednesday, May 19, 2021 7:50 AM
> To: dev@ctakes.apache.org
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> [SUSPICIOUS]
>
> * External Email - Caution *
>
>
>> yes,  the line between "lookup" and rule execution is a little blurry
> sometimes.
>
> Sure is.  I blur it with a set of annotators that extend dictionary
> annotations based on words or annotations covered by the same Chunk, e.g.
>
> DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
> MedicationMention + /dependenc[ey]|addiction/i =
> DiseaseDisorderMention DiseaseDisorderMention + AnatomicalSiteMention
> in same Chunk = DiseaseDisorderMention ProcedureMention +
> AnatomicalSiteMention in same Chunk = ProcedureMention
>
> Higher recall than the regular UmlsLookupAnnotator; higher precision
> than the UmlsOverlapLookupAnnotator (which skips a specified number of
> tokens regardless of syntax).
>
> I've been wanting a more general framework to fit this into, and
> thinking it might be Ruta.
> Thanks for the pointer to TokensRegex; I'll look at that as well.
>
>
> On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch
> <pa...@gmail.com>
> wrote:
>
>> Hi All,  yes,  the line between "lookup" and rule execution is a little
>> blurry sometimes.   Here's some more blurriness.
>>
>> I've done something related, adapting a UIMA tokens regex engine for
>> Ctakes.  You create a new type in the TypeSystem.  In my case it uses
>> CONLLDEP Annotations as the tokens to reason over.   You can set up
>> expressions (rules) that look like this.
>> (Yes, this case is already covered in the dictionary, but it's an
>> example)
>>
>> Matcher A:   (lemma=="be");
>> Matcher B:   /partially|partly/;
>> Matcher C:   /vaccinated/;
>>
>> Rule  vaccinated|CUI1234|SNOMED5678:  A? B?  C;
>>
>> You get the Annotation you've delegated to this task, with the entity
>> value  "vaccinated|1234|5678"  and the range which spanned the tokens
>> that caused the annotation rule to fire
>>
>> (See Stanford's Tokens Regex)
>>
>> Peter
>>
>>
>> On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <
>> Timothy.Miller@childrens.harvard.edu> wrote:
>>
>>> But Sean, isn't what he's asking for essentially already implemented
>>> in cTAKES as the custom dictionary? I'm currently using that
>>> approach for my covid container:
>>>
>>>
>> https://urldefense.com/v3/__https://github.com/Machine-Learning-for-M
>> edical-Language/ctakes-covid-container__;!!NZvER7FxgEiBAiR_!7ZopTIhXK
>> alQFx0xET_yET0agN2ZT8JWoa0UyqGSrXa4w-h_9-tRCEeiS4pr6s2Y-T4elV3bYac$
>>> Tim
>>>
>>> ________________________________________
>>> From: Finan, Sean <Se...@childrens.harvard.edu>
>>> Sent: Tuesday, May 18, 2021 11:55 AM
>>> To: dev@ctakes.apache.org
>>> Cc: Himanshu Shekhar Sahoo
>>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
>>> [SUSPICIOUS]
>>>
>>> * External Email - Caution *
>>>
>>>
>>> Hi Greg,
>>>
>>> From 30,000 ft, I think that you would want to use the RutaEngine.
>>>
>>>
>>>
>> https://urldefense.com/v3/__https://uima.apache.org/d/ruta-current/to
>> ols.ruta.book.html*ugr.tools.ruta.ae.basic__;Iw!!NZvER7FxgEiBAiR_!6YH
>> 1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickz
>> tninUTU$
>>>
>> https://urldefense.com/v3/__https://javadoc.io/doc/org.apache.uima/ru
>> ta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html__;!!NZvER7
>> FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJ
>> xSvv8r5GjWickzI7QF5CI$
>>>
>> https://urldefense.com/v3/__http://svn.apache.org/repos/asf/uima/ruta
>> /trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine
>> .java__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-i
>> Oew4ufAIPpkFHnsJxSvv8r5GjWickzJJ96zT4$
>>> That seems to be the actual analysis engine that loads and uses
>>> rules to create annotations.
>>> While you could use an xml descriptor or use the piper "set" command
>>> and do things like mapping ruta to ctakes type systems, I would take
>>> the alternate approach of "copying" the initialize(..) and process
>>> (..)
>> methods
>>> and modify them to use ctakes types directly.
>>>
>>> Disclaimer:  I know very little about uima ruta.  At some point I
>>> did
>> look
>>> into it but it was for a specific (ctakes-derivative) project and I
>> didn't
>>> go further than basic doc perusal.
>>>
>>> If you move forward with this please let us all know what you find.
>>> I think that there will be great interest in the community.
>>>
>>> Sean
>>> ________________________________________
>>> From: Greg Silverman <gm...@umn.edu.INVALID>
>>> Sent: Tuesday, May 18, 2021 11:13 AM
>>> To: dev@ctakes.apache.org
>>> Cc: Himanshu Shekhar Sahoo
>>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
>>>
>>> * External Email - Caution *
>>>
>>>
>>> Hi Sean,
>>> I was wondering if there was a way to use rule-base lookup of a
>>> custom lexicon within cTAKES (say a locally curated list of covd-19 symptoms).
>>> When I Googled around, I stumbled on UIMA Ruta, but couldn't find
>> anything
>>> wrt to cTAKES specifics.
>>>
>>> Thanks!
>>>
>>>
>>> Greg--
>>>
>>> On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
>>> Sean.Finan@childrens.harvard.edu> wrote:
>>>
>>>>  To which ctakes component(s) are you referring?
>>>> ________________________________________
>>>> From: Greg Silverman <gm...@umn.edu.INVALID>
>>>> Sent: Sunday, May 16, 2021 6:02 PM
>>>> To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
>>>> Subject: rule-based lookup for custom lexicon [EXTERNAL]
>>>>
>>>> * External Email - Caution *
>>>>
>>>>
>>>> I looked all over and could not find any information on how to add
>>>> this pipeline component to cTAKES. I assume it uses UIMA Ruta?
>>>>
>>>> Thanks in advance!
>>>>
>>>> Greg--
>>>> --
>>>> Greg M. Silverman
>>>> Senior Systems Developer
>>>> NLP/IE <
>>>>
>> https://urldefense.com/v3/__https://healthinformatics.umn.edu/researc
>> h/nlpie-group__;!!NZvER7FxgEiBAiR_!6hN356eDesvWNYzsrDMaXgF6IkZw313QU2
>> QUQw5M8Jysvh1K1JxjEBeztZicX1DM2jC0o7_0qAA$
>>>> Department of Surgery
>>>> University of Minnesota
>>>> gms@umn.edu
>>>>
>>>
>>> --
>>> Greg M. Silverman
>>> Senior Systems Developer
>>> NLP/IE <
>>>
>> https://urldefense.com/v3/__https://healthinformatics.umn.edu/researc
>> h/nlpie-group__;!!NZvER7FxgEiBAiR_!8uKf_4SXyKdCmvlMHvRGddxlzofg64D4_z
>> sPdCThqeMAyn2akyMNI8wqM6yNUZA2N93F-aAsR7I$
>>> Department of Surgery
>>> University of Minnesota
>>> gms@umn.edu
>>>
--
Dr. Peter Klügl
Head of Text Mining/Machine Learning

Averbis GmbH
Salzstr. 15
79098 Freiburg
Germany

Fon: +49 761 708 394 0
Fax: +49 761 708 394 10
Email: peter.kluegl@averbis.com
Web: https://urldefense.com/v3/__https://averbis.com__;!!NZvER7FxgEiBAiR_!8k8JQUqNQYj-fQWELRFtxlACk1xSqLtVEnIHDmvmw6QnGtc3id_S4IOLqa6-Y9F4mOzpTfAOWo4$

Headquarters: Freiburg im Breisgau
Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080 Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó

Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding.

Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi (other) Peter,

Many thanks for jumping in on this!

I would definitely be interested in seeing some examples, even though I don't have any specific use case right now.

I will ask a few local people and see if they are interested in an informal video chat.  If anybody out there in the general community is interested, please reply on this thread and maybe we can coordinate a single presentation time.

Cheers,

Sean
________________________________________
From: Peter Klügl <pe...@averbis.com>
Sent: Wednesday, May 19, 2021 3:33 PM
To: dev@ctakes.apache.org
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

* External Email - Caution *


Hi all,


if you are interested in UIMA Ruta and want to know more about it, you
can always ask on the UIMA user list or me directly (I am the creator of
UIMA Ruta). I can also prepare some slides and we can have an informal
video chat where I give an overview of Ruta.


I am of course not objective here (for several reasons) but I think UIMA
Ruta could be really useful for cTAKES. It was originally developed for
segmenting and processing discharge letters and similar clincial
documents. Since then (>10 years), Ruta has always been applied to
clincial documents and is being deployed in production by several
companies. The language has some advantages and disadvantages compared
to other rule languages. In the context of cTAKES, the
direct/comprehensive support of UIMA and the IDE dev support are maybe
the most relevant advantages.


I was thinking about creating some introductory examples for the
combination and usage of UIMA Ruta and cTAKES. If you have a good use
case, let me know.


Best,


(another) Peter


Am 19.05.2021 um 14:30 schrieb Finan, Sean:
> Hi all,
> Correct.
>
> Tim  is correct in the sense that he is using a custom dictionary (custom synonyms, cuis, etc.) which kind of changes the "rules" of what the standard dictionary lookup considers a valid term based upon available tokens in the text.  There are other simple settings that further qualify how the standard dictionary lookup accepts or discards synonyms.
>
> I think that what Greg is asking about is something with introduced "logic" that can alter or remove terms already discovered by the standard dictionary lookup.
>
> Peter and Kean both outline some custom annotators that they have created to use logic that can alter/add/remove terms discovered by the standard dictionary lookup.  I do the same thing for different projects and advise everybody that applies ctakes to specific domains do the same.
>
> ctakes is a general purpose tool and results can definitely be improved when catered to a more narrow purpose.
>
> Back to Greg, I got the feeling that he might be interested in a more versatile annotator.  Introducing an engine that can utilize something like ruta has several advantages:
> 1.  You  can "easily" add complex rules in one place.
> 2.  You can change rules external to code ...
>   2a. the same pipeline can be catered to different projects without changing code in an annotator or creating a new annotator.
>   2b.  An end user who knows nothing about ctakes can change a ruta script to fit their purposes.
> 3. Rules are supported and documented by uima ruta, so you don't have to worry about that extra headache.
> 4. Once Greg adds it to apache ctakes (right? ;^) everybody in the community can apply ruta rules to their project.
>
> When I looked at it a few years ago it was for reason 2b.  In the end we went for different annotators like Peter and Kean outlined and just use piper file changes to satisfy #2 as that is definitely much easier.  However, it doesn't benefit the community as a whole (#4).
>
> Cheers all, this is a great conversation!
>
> Sean
>
>
>
>
> ________________________________________
> From: Kean Kaufmann <ke...@recordsone.com>
> Sent: Wednesday, May 19, 2021 7:50 AM
> To: dev@ctakes.apache.org
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>
> * External Email - Caution *
>
>
>> yes,  the line between "lookup" and rule execution is a little blurry
> sometimes.
>
> Sure is.  I blur it with a set of annotators that extend dictionary
> annotations based on words or annotations covered by the same Chunk, e.g.
>
> DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
> MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
> DiseaseDisorderMention + AnatomicalSiteMention in same Chunk =
> DiseaseDisorderMention
> ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention
>
> Higher recall than the regular UmlsLookupAnnotator;
> higher precision than the UmlsOverlapLookupAnnotator (which skips a
> specified number of tokens regardless of syntax).
>
> I've been wanting a more general framework to fit this into, and thinking
> it might be Ruta.
> Thanks for the pointer to TokensRegex; I'll look at that as well.
>
>
> On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch <pa...@gmail.com>
> wrote:
>
>> Hi All,  yes,  the line between "lookup" and rule execution is a little
>> blurry sometimes.   Here's some more blurriness.
>>
>> I've done something related, adapting a UIMA tokens regex engine for
>> Ctakes.  You create a new type in the TypeSystem.  In my case it uses
>> CONLLDEP Annotations as the tokens to reason over.   You can set up
>> expressions (rules) that look like this.
>> (Yes, this case is already covered in the dictionary, but it's an example)
>>
>> Matcher A:   (lemma=="be");
>> Matcher B:   /partially|partly/;
>> Matcher C:   /vaccinated/;
>>
>> Rule  vaccinated|CUI1234|SNOMED5678:  A? B?  C;
>>
>> You get the Annotation you've delegated to this task, with the entity
>> value  "vaccinated|1234|5678"  and the range which spanned the tokens that
>> caused the annotation rule to fire
>>
>> (See Stanford's Tokens Regex)
>>
>> Peter
>>
>>
>> On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <
>> Timothy.Miller@childrens.harvard.edu> wrote:
>>
>>> But Sean, isn't what he's asking for essentially already implemented in
>>> cTAKES as the custom dictionary? I'm currently using that approach for my
>>> covid container:
>>>
>>>
>> https://urldefense.com/v3/__https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container__;!!NZvER7FxgEiBAiR_!7ZopTIhXKalQFx0xET_yET0agN2ZT8JWoa0UyqGSrXa4w-h_9-tRCEeiS4pr6s2Y-T4elV3bYac$
>>> Tim
>>>
>>> ________________________________________
>>> From: Finan, Sean <Se...@childrens.harvard.edu>
>>> Sent: Tuesday, May 18, 2021 11:55 AM
>>> To: dev@ctakes.apache.org
>>> Cc: Himanshu Shekhar Sahoo
>>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>>>
>>> * External Email - Caution *
>>>
>>>
>>> Hi Greg,
>>>
>>> From 30,000 ft, I think that you would want to use the RutaEngine.
>>>
>>>
>>>
>> https://urldefense.com/v3/__https://uima.apache.org/d/ruta-current/tools.ruta.book.html*ugr.tools.ruta.ae.basic__;Iw!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickztninUTU$
>>>
>> https://urldefense.com/v3/__https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzI7QF5CI$
>>>
>> https://urldefense.com/v3/__http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzJJ96zT4$
>>> That seems to be the actual analysis engine that loads and uses rules to
>>> create annotations.
>>> While you could use an xml descriptor or use the piper "set" command and
>>> do things like mapping ruta to ctakes type systems, I would take the
>>> alternate approach of "copying" the initialize(..) and process (..)
>> methods
>>> and modify them to use ctakes types directly.
>>>
>>> Disclaimer:  I know very little about uima ruta.  At some point I did
>> look
>>> into it but it was for a specific (ctakes-derivative) project and I
>> didn't
>>> go further than basic doc perusal.
>>>
>>> If you move forward with this please let us all know what you find.  I
>>> think that there will be great interest in the community.
>>>
>>> Sean
>>> ________________________________________
>>> From: Greg Silverman <gm...@umn.edu.INVALID>
>>> Sent: Tuesday, May 18, 2021 11:13 AM
>>> To: dev@ctakes.apache.org
>>> Cc: Himanshu Shekhar Sahoo
>>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
>>>
>>> * External Email - Caution *
>>>
>>>
>>> Hi Sean,
>>> I was wondering if there was a way to use rule-base lookup of a custom
>>> lexicon within cTAKES (say a locally curated list of covd-19 symptoms).
>>> When I Googled around, I stumbled on UIMA Ruta, but couldn't find
>> anything
>>> wrt to cTAKES specifics.
>>>
>>> Thanks!
>>>
>>>
>>> Greg--
>>>
>>> On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
>>> Sean.Finan@childrens.harvard.edu> wrote:
>>>
>>>>  To which ctakes component(s) are you referring?
>>>> ________________________________________
>>>> From: Greg Silverman <gm...@umn.edu.INVALID>
>>>> Sent: Sunday, May 16, 2021 6:02 PM
>>>> To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
>>>> Subject: rule-based lookup for custom lexicon [EXTERNAL]
>>>>
>>>> * External Email - Caution *
>>>>
>>>>
>>>> I looked all over and could not find any information on how to add this
>>>> pipeline component to cTAKES. I assume it uses UIMA Ruta?
>>>>
>>>> Thanks in advance!
>>>>
>>>> Greg--
>>>> --
>>>> Greg M. Silverman
>>>> Senior Systems Developer
>>>> NLP/IE <
>>>>
>> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!6hN356eDesvWNYzsrDMaXgF6IkZw313QU2QUQw5M8Jysvh1K1JxjEBeztZicX1DM2jC0o7_0qAA$
>>>> Department of Surgery
>>>> University of Minnesota
>>>> gms@umn.edu
>>>>
>>>
>>> --
>>> Greg M. Silverman
>>> Senior Systems Developer
>>> NLP/IE <
>>>
>> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!8uKf_4SXyKdCmvlMHvRGddxlzofg64D4_zsPdCThqeMAyn2akyMNI8wqM6yNUZA2N93F-aAsR7I$
>>> Department of Surgery
>>> University of Minnesota
>>> gms@umn.edu
>>>
--
Dr. Peter Klügl
Head of Text Mining/Machine Learning

Averbis GmbH
Salzstr. 15
79098 Freiburg
Germany

Fon: +49 761 708 394 0
Fax: +49 761 708 394 10
Email: peter.kluegl@averbis.com
Web: https://urldefense.com/v3/__https://averbis.com__;!!NZvER7FxgEiBAiR_!8k8JQUqNQYj-fQWELRFtxlACk1xSqLtVEnIHDmvmw6QnGtc3id_S4IOLqa6-Y9F4mOzpTfAOWo4$

Headquarters: Freiburg im Breisgau
Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó


Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Posted by Peter Klügl <pe...@averbis.com>.
Hi all,


if you are interested in UIMA Ruta and want to know more about it, you
can always ask on the UIMA user list or me directly (I am the creator of
UIMA Ruta). I can also prepare some slides and we can have an informal
video chat where I give an overview of Ruta.


I am of course not objective here (for several reasons) but I think UIMA
Ruta could be really useful for cTAKES. It was originally developed for
segmenting and processing discharge letters and similar clincial
documents. Since then (>10 years), Ruta has always been applied to
clincial documents and is being deployed in production by several
companies. The language has some advantages and disadvantages compared
to other rule languages. In the context of cTAKES, the
direct/comprehensive support of UIMA and the IDE dev support are maybe
the most relevant advantages.


I was thinking about creating some introductory examples for the
combination and usage of UIMA Ruta and cTAKES. If you have a good use
case, let me know.


Best,


(another) Peter


Am 19.05.2021 um 14:30 schrieb Finan, Sean:
> Hi all,
> Correct.
>
> Tim  is correct in the sense that he is using a custom dictionary (custom synonyms, cuis, etc.) which kind of changes the "rules" of what the standard dictionary lookup considers a valid term based upon available tokens in the text.  There are other simple settings that further qualify how the standard dictionary lookup accepts or discards synonyms.
>
> I think that what Greg is asking about is something with introduced "logic" that can alter or remove terms already discovered by the standard dictionary lookup.
>
> Peter and Kean both outline some custom annotators that they have created to use logic that can alter/add/remove terms discovered by the standard dictionary lookup.  I do the same thing for different projects and advise everybody that applies ctakes to specific domains do the same.  
>
> ctakes is a general purpose tool and results can definitely be improved when catered to a more narrow purpose.
>
> Back to Greg, I got the feeling that he might be interested in a more versatile annotator.  Introducing an engine that can utilize something like ruta has several advantages:
> 1.  You  can "easily" add complex rules in one place.
> 2.  You can change rules external to code ...
>   2a. the same pipeline can be catered to different projects without changing code in an annotator or creating a new annotator.
>   2b.  An end user who knows nothing about ctakes can change a ruta script to fit their purposes.
> 3. Rules are supported and documented by uima ruta, so you don't have to worry about that extra headache.
> 4. Once Greg adds it to apache ctakes (right? ;^) everybody in the community can apply ruta rules to their project.
>
> When I looked at it a few years ago it was for reason 2b.  In the end we went for different annotators like Peter and Kean outlined and just use piper file changes to satisfy #2 as that is definitely much easier.  However, it doesn't benefit the community as a whole (#4).
>
> Cheers all, this is a great conversation!
>
> Sean
>
>
>
>
> ________________________________________
> From: Kean Kaufmann <ke...@recordsone.com>
> Sent: Wednesday, May 19, 2021 7:50 AM
> To: dev@ctakes.apache.org
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>
> * External Email - Caution *
>
>
>> yes,  the line between "lookup" and rule execution is a little blurry
> sometimes.
>
> Sure is.  I blur it with a set of annotators that extend dictionary
> annotations based on words or annotations covered by the same Chunk, e.g.
>
> DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
> MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
> DiseaseDisorderMention + AnatomicalSiteMention in same Chunk =
> DiseaseDisorderMention
> ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention
>
> Higher recall than the regular UmlsLookupAnnotator;
> higher precision than the UmlsOverlapLookupAnnotator (which skips a
> specified number of tokens regardless of syntax).
>
> I've been wanting a more general framework to fit this into, and thinking
> it might be Ruta.
> Thanks for the pointer to TokensRegex; I'll look at that as well.
>
>
> On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch <pa...@gmail.com>
> wrote:
>
>> Hi All,  yes,  the line between "lookup" and rule execution is a little
>> blurry sometimes.   Here's some more blurriness.
>>
>> I've done something related, adapting a UIMA tokens regex engine for
>> Ctakes.  You create a new type in the TypeSystem.  In my case it uses
>> CONLLDEP Annotations as the tokens to reason over.   You can set up
>> expressions (rules) that look like this.
>> (Yes, this case is already covered in the dictionary, but it's an example)
>>
>> Matcher A:   (lemma=="be");
>> Matcher B:   /partially|partly/;
>> Matcher C:   /vaccinated/;
>>
>> Rule  vaccinated|CUI1234|SNOMED5678:  A? B?  C;
>>
>> You get the Annotation you've delegated to this task, with the entity
>> value  "vaccinated|1234|5678"  and the range which spanned the tokens that
>> caused the annotation rule to fire
>>
>> (See Stanford's Tokens Regex)
>>
>> Peter
>>
>>
>> On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <
>> Timothy.Miller@childrens.harvard.edu> wrote:
>>
>>> But Sean, isn't what he's asking for essentially already implemented in
>>> cTAKES as the custom dictionary? I'm currently using that approach for my
>>> covid container:
>>>
>>>
>> https://urldefense.com/v3/__https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container__;!!NZvER7FxgEiBAiR_!7ZopTIhXKalQFx0xET_yET0agN2ZT8JWoa0UyqGSrXa4w-h_9-tRCEeiS4pr6s2Y-T4elV3bYac$
>>> Tim
>>>
>>> ________________________________________
>>> From: Finan, Sean <Se...@childrens.harvard.edu>
>>> Sent: Tuesday, May 18, 2021 11:55 AM
>>> To: dev@ctakes.apache.org
>>> Cc: Himanshu Shekhar Sahoo
>>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>>>
>>> * External Email - Caution *
>>>
>>>
>>> Hi Greg,
>>>
>>> From 30,000 ft, I think that you would want to use the RutaEngine.
>>>
>>>
>>>
>> https://urldefense.com/v3/__https://uima.apache.org/d/ruta-current/tools.ruta.book.html*ugr.tools.ruta.ae.basic__;Iw!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickztninUTU$
>>>
>> https://urldefense.com/v3/__https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzI7QF5CI$
>>>
>> https://urldefense.com/v3/__http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzJJ96zT4$
>>> That seems to be the actual analysis engine that loads and uses rules to
>>> create annotations.
>>> While you could use an xml descriptor or use the piper "set" command and
>>> do things like mapping ruta to ctakes type systems, I would take the
>>> alternate approach of "copying" the initialize(..) and process (..)
>> methods
>>> and modify them to use ctakes types directly.
>>>
>>> Disclaimer:  I know very little about uima ruta.  At some point I did
>> look
>>> into it but it was for a specific (ctakes-derivative) project and I
>> didn't
>>> go further than basic doc perusal.
>>>
>>> If you move forward with this please let us all know what you find.  I
>>> think that there will be great interest in the community.
>>>
>>> Sean
>>> ________________________________________
>>> From: Greg Silverman <gm...@umn.edu.INVALID>
>>> Sent: Tuesday, May 18, 2021 11:13 AM
>>> To: dev@ctakes.apache.org
>>> Cc: Himanshu Shekhar Sahoo
>>> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
>>>
>>> * External Email - Caution *
>>>
>>>
>>> Hi Sean,
>>> I was wondering if there was a way to use rule-base lookup of a custom
>>> lexicon within cTAKES (say a locally curated list of covd-19 symptoms).
>>> When I Googled around, I stumbled on UIMA Ruta, but couldn't find
>> anything
>>> wrt to cTAKES specifics.
>>>
>>> Thanks!
>>>
>>>
>>> Greg--
>>>
>>> On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
>>> Sean.Finan@childrens.harvard.edu> wrote:
>>>
>>>>  To which ctakes component(s) are you referring?
>>>> ________________________________________
>>>> From: Greg Silverman <gm...@umn.edu.INVALID>
>>>> Sent: Sunday, May 16, 2021 6:02 PM
>>>> To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
>>>> Subject: rule-based lookup for custom lexicon [EXTERNAL]
>>>>
>>>> * External Email - Caution *
>>>>
>>>>
>>>> I looked all over and could not find any information on how to add this
>>>> pipeline component to cTAKES. I assume it uses UIMA Ruta?
>>>>
>>>> Thanks in advance!
>>>>
>>>> Greg--
>>>> --
>>>> Greg M. Silverman
>>>> Senior Systems Developer
>>>> NLP/IE <
>>>>
>> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!6hN356eDesvWNYzsrDMaXgF6IkZw313QU2QUQw5M8Jysvh1K1JxjEBeztZicX1DM2jC0o7_0qAA$
>>>> Department of Surgery
>>>> University of Minnesota
>>>> gms@umn.edu
>>>>
>>>
>>> --
>>> Greg M. Silverman
>>> Senior Systems Developer
>>> NLP/IE <
>>>
>> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!8uKf_4SXyKdCmvlMHvRGddxlzofg64D4_zsPdCThqeMAyn2akyMNI8wqM6yNUZA2N93F-aAsR7I$
>>> Department of Surgery
>>> University of Minnesota
>>> gms@umn.edu
>>>
-- 
Dr. Peter Klügl
Head of Text Mining/Machine Learning

Averbis GmbH
Salzstr. 15
79098 Freiburg
Germany

Fon: +49 761 708 394 0
Fax: +49 761 708 394 10
Email: peter.kluegl@averbis.com
Web: https://averbis.com

Headquarters: Freiburg im Breisgau
Register Court: Amtsgericht Freiburg im Breisgau, HRB 701080
Managing Directors: Dr. med. Philipp Daumke, Dr. Kornél Markó


Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Posted by Peter Abramowitsch <pa...@gmail.com>.
Sean & everyone,  totally agree.   Ruta is an obvious candidate because it
is already so tightly coupled to UIMA.  It provides a very rich overlay to
the annotations and the type system.  Does anyone know if Ruta instances
are thread safe (assuming the JCAS is in thread-local storage)?   I saw one
conversation from a while ago asking the same question, but don't think I
saw an answer)

At times I've wondered whether a more generic rules engine that exposed
rules to the CAS could also be useful.  The logic wouldn't be restricted to
doing text interrogation.  Like  Ruta it would access the jCas via a Rules
Language but a predicate wiring API could provide support for a wide range
of operations involving external logic and data.   Also the ability to
invoke the rules stage at multiple times in the same pipeline with
different rule sets.   Perhaps all this could already be handled in Ruta's
extension mechanism.

Peter


On Wed, May 19, 2021 at 5:30 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi all,
> Correct.
>
> Tim  is correct in the sense that he is using a custom dictionary (custom
> synonyms, cuis, etc.) which kind of changes the "rules" of what the
> standard dictionary lookup considers a valid term based upon available
> tokens in the text.  There are other simple settings that further qualify
> how the standard dictionary lookup accepts or discards synonyms.
>
> I think that what Greg is asking about is something with introduced
> "logic" that can alter or remove terms already discovered by the standard
> dictionary lookup.
>
> Peter and Kean both outline some custom annotators that they have created
> to use logic that can alter/add/remove terms discovered by the standard
> dictionary lookup.  I do the same thing for different projects and advise
> everybody that applies ctakes to specific domains do the same.
>
> ctakes is a general purpose tool and results can definitely be improved
> when catered to a more narrow purpose.
>
> Back to Greg, I got the feeling that he might be interested in a more
> versatile annotator.  Introducing an engine that can utilize something like
> ruta has several advantages:
> 1.  You  can "easily" add complex rules in one place.
> 2.  You can change rules external to code ...
>   2a. the same pipeline can be catered to different projects without
> changing code in an annotator or creating a new annotator.
>   2b.  An end user who knows nothing about ctakes can change a ruta script
> to fit their purposes.
> 3. Rules are supported and documented by uima ruta, so you don't have to
> worry about that extra headache.
> 4. Once Greg adds it to apache ctakes (right? ;^) everybody in the
> community can apply ruta rules to their project.
>
> When I looked at it a few years ago it was for reason 2b.  In the end we
> went for different annotators like Peter and Kean outlined and just use
> piper file changes to satisfy #2 as that is definitely much easier.
> However, it doesn't benefit the community as a whole (#4).
>
> Cheers all, this is a great conversation!
>
> Sean
>
>
>
>
> ________________________________________
> From: Kean Kaufmann <ke...@recordsone.com>
> Sent: Wednesday, May 19, 2021 7:50 AM
> To: dev@ctakes.apache.org
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>
> * External Email - Caution *
>
>
> > yes,  the line between "lookup" and rule execution is a little blurry
> sometimes.
>
> Sure is.  I blur it with a set of annotators that extend dictionary
> annotations based on words or annotations covered by the same Chunk, e.g.
>
> DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
> MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
> DiseaseDisorderMention + AnatomicalSiteMention in same Chunk =
> DiseaseDisorderMention
> ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention
>
> Higher recall than the regular UmlsLookupAnnotator;
> higher precision than the UmlsOverlapLookupAnnotator (which skips a
> specified number of tokens regardless of syntax).
>
> I've been wanting a more general framework to fit this into, and thinking
> it might be Ruta.
> Thanks for the pointer to TokensRegex; I'll look at that as well.
>
>
> On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch <
> pabramowitsch@gmail.com>
> wrote:
>
> > Hi All,  yes,  the line between "lookup" and rule execution is a little
> > blurry sometimes.   Here's some more blurriness.
> >
> > I've done something related, adapting a UIMA tokens regex engine for
> > Ctakes.  You create a new type in the TypeSystem.  In my case it uses
> > CONLLDEP Annotations as the tokens to reason over.   You can set up
> > expressions (rules) that look like this.
> > (Yes, this case is already covered in the dictionary, but it's an
> example)
> >
> > Matcher A:   (lemma=="be");
> > Matcher B:   /partially|partly/;
> > Matcher C:   /vaccinated/;
> >
> > Rule  vaccinated|CUI1234|SNOMED5678:  A? B?  C;
> >
> > You get the Annotation you've delegated to this task, with the entity
> > value  "vaccinated|1234|5678"  and the range which spanned the tokens
> that
> > caused the annotation rule to fire
> >
> > (See Stanford's Tokens Regex)
> >
> > Peter
> >
> >
> > On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <
> > Timothy.Miller@childrens.harvard.edu> wrote:
> >
> > > But Sean, isn't what he's asking for essentially already implemented in
> > > cTAKES as the custom dictionary? I'm currently using that approach for
> my
> > > covid container:
> > >
> > >
> >
> https://urldefense.com/v3/__https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container__;!!NZvER7FxgEiBAiR_!7ZopTIhXKalQFx0xET_yET0agN2ZT8JWoa0UyqGSrXa4w-h_9-tRCEeiS4pr6s2Y-T4elV3bYac$
> > > Tim
> > >
> > > ________________________________________
> > > From: Finan, Sean <Se...@childrens.harvard.edu>
> > > Sent: Tuesday, May 18, 2021 11:55 AM
> > > To: dev@ctakes.apache.org
> > > Cc: Himanshu Shekhar Sahoo
> > > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> [SUSPICIOUS]
> > >
> > > * External Email - Caution *
> > >
> > >
> > > Hi Greg,
> > >
> > > From 30,000 ft, I think that you would want to use the RutaEngine.
> > >
> > >
> > >
> >
> https://urldefense.com/v3/__https://uima.apache.org/d/ruta-current/tools.ruta.book.html*ugr.tools.ruta.ae.basic__;Iw!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickztninUTU$
> > >
> > >
> >
> https://urldefense.com/v3/__https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzI7QF5CI$
> > >
> > >
> >
> https://urldefense.com/v3/__http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzJJ96zT4$
> > >
> > > That seems to be the actual analysis engine that loads and uses rules
> to
> > > create annotations.
> > > While you could use an xml descriptor or use the piper "set" command
> and
> > > do things like mapping ruta to ctakes type systems, I would take the
> > > alternate approach of "copying" the initialize(..) and process (..)
> > methods
> > > and modify them to use ctakes types directly.
> > >
> > > Disclaimer:  I know very little about uima ruta.  At some point I did
> > look
> > > into it but it was for a specific (ctakes-derivative) project and I
> > didn't
> > > go further than basic doc perusal.
> > >
> > > If you move forward with this please let us all know what you find.  I
> > > think that there will be great interest in the community.
> > >
> > > Sean
> > > ________________________________________
> > > From: Greg Silverman <gm...@umn.edu.INVALID>
> > > Sent: Tuesday, May 18, 2021 11:13 AM
> > > To: dev@ctakes.apache.org
> > > Cc: Himanshu Shekhar Sahoo
> > > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> > >
> > > * External Email - Caution *
> > >
> > >
> > > Hi Sean,
> > > I was wondering if there was a way to use rule-base lookup of a custom
> > > lexicon within cTAKES (say a locally curated list of covd-19 symptoms).
> > > When I Googled around, I stumbled on UIMA Ruta, but couldn't find
> > anything
> > > wrt to cTAKES specifics.
> > >
> > > Thanks!
> > >
> > >
> > > Greg--
> > >
> > > On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
> > > Sean.Finan@childrens.harvard.edu> wrote:
> > >
> > > >  To which ctakes component(s) are you referring?
> > > > ________________________________________
> > > > From: Greg Silverman <gm...@umn.edu.INVALID>
> > > > Sent: Sunday, May 16, 2021 6:02 PM
> > > > To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
> > > > Subject: rule-based lookup for custom lexicon [EXTERNAL]
> > > >
> > > > * External Email - Caution *
> > > >
> > > >
> > > > I looked all over and could not find any information on how to add
> this
> > > > pipeline component to cTAKES. I assume it uses UIMA Ruta?
> > > >
> > > > Thanks in advance!
> > > >
> > > > Greg--
> > > > --
> > > > Greg M. Silverman
> > > > Senior Systems Developer
> > > > NLP/IE <
> > > >
> > >
> >
> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!6hN356eDesvWNYzsrDMaXgF6IkZw313QU2QUQw5M8Jysvh1K1JxjEBeztZicX1DM2jC0o7_0qAA$
> > > > >
> > > > Department of Surgery
> > > > University of Minnesota
> > > > gms@umn.edu
> > > >
> > >
> > >
> > > --
> > > Greg M. Silverman
> > > Senior Systems Developer
> > > NLP/IE <
> > >
> >
> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!8uKf_4SXyKdCmvlMHvRGddxlzofg64D4_zsPdCThqeMAyn2akyMNI8wqM6yNUZA2N93F-aAsR7I$
> > > >
> > > Department of Surgery
> > > University of Minnesota
> > > gms@umn.edu
> > >
> >
>

Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi all,
Correct.

Tim  is correct in the sense that he is using a custom dictionary (custom synonyms, cuis, etc.) which kind of changes the "rules" of what the standard dictionary lookup considers a valid term based upon available tokens in the text.  There are other simple settings that further qualify how the standard dictionary lookup accepts or discards synonyms.

I think that what Greg is asking about is something with introduced "logic" that can alter or remove terms already discovered by the standard dictionary lookup.

Peter and Kean both outline some custom annotators that they have created to use logic that can alter/add/remove terms discovered by the standard dictionary lookup.  I do the same thing for different projects and advise everybody that applies ctakes to specific domains do the same.  

ctakes is a general purpose tool and results can definitely be improved when catered to a more narrow purpose.

Back to Greg, I got the feeling that he might be interested in a more versatile annotator.  Introducing an engine that can utilize something like ruta has several advantages:
1.  You  can "easily" add complex rules in one place.
2.  You can change rules external to code ...
  2a. the same pipeline can be catered to different projects without changing code in an annotator or creating a new annotator.
  2b.  An end user who knows nothing about ctakes can change a ruta script to fit their purposes.
3. Rules are supported and documented by uima ruta, so you don't have to worry about that extra headache.
4. Once Greg adds it to apache ctakes (right? ;^) everybody in the community can apply ruta rules to their project.

When I looked at it a few years ago it was for reason 2b.  In the end we went for different annotators like Peter and Kean outlined and just use piper file changes to satisfy #2 as that is definitely much easier.  However, it doesn't benefit the community as a whole (#4).

Cheers all, this is a great conversation!

Sean




________________________________________
From: Kean Kaufmann <ke...@recordsone.com>
Sent: Wednesday, May 19, 2021 7:50 AM
To: dev@ctakes.apache.org
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

* External Email - Caution *


> yes,  the line between "lookup" and rule execution is a little blurry
sometimes.

Sure is.  I blur it with a set of annotators that extend dictionary
annotations based on words or annotations covered by the same Chunk, e.g.

DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
DiseaseDisorderMention + AnatomicalSiteMention in same Chunk =
DiseaseDisorderMention
ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention

Higher recall than the regular UmlsLookupAnnotator;
higher precision than the UmlsOverlapLookupAnnotator (which skips a
specified number of tokens regardless of syntax).

I've been wanting a more general framework to fit this into, and thinking
it might be Ruta.
Thanks for the pointer to TokensRegex; I'll look at that as well.


On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch <pa...@gmail.com>
wrote:

> Hi All,  yes,  the line between "lookup" and rule execution is a little
> blurry sometimes.   Here's some more blurriness.
>
> I've done something related, adapting a UIMA tokens regex engine for
> Ctakes.  You create a new type in the TypeSystem.  In my case it uses
> CONLLDEP Annotations as the tokens to reason over.   You can set up
> expressions (rules) that look like this.
> (Yes, this case is already covered in the dictionary, but it's an example)
>
> Matcher A:   (lemma=="be");
> Matcher B:   /partially|partly/;
> Matcher C:   /vaccinated/;
>
> Rule  vaccinated|CUI1234|SNOMED5678:  A? B?  C;
>
> You get the Annotation you've delegated to this task, with the entity
> value  "vaccinated|1234|5678"  and the range which spanned the tokens that
> caused the annotation rule to fire
>
> (See Stanford's Tokens Regex)
>
> Peter
>
>
> On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <
> Timothy.Miller@childrens.harvard.edu> wrote:
>
> > But Sean, isn't what he's asking for essentially already implemented in
> > cTAKES as the custom dictionary? I'm currently using that approach for my
> > covid container:
> >
> >
> https://urldefense.com/v3/__https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container__;!!NZvER7FxgEiBAiR_!7ZopTIhXKalQFx0xET_yET0agN2ZT8JWoa0UyqGSrXa4w-h_9-tRCEeiS4pr6s2Y-T4elV3bYac$
> > Tim
> >
> > ________________________________________
> > From: Finan, Sean <Se...@childrens.harvard.edu>
> > Sent: Tuesday, May 18, 2021 11:55 AM
> > To: dev@ctakes.apache.org
> > Cc: Himanshu Shekhar Sahoo
> > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
> >
> > * External Email - Caution *
> >
> >
> > Hi Greg,
> >
> > From 30,000 ft, I think that you would want to use the RutaEngine.
> >
> >
> >
> https://urldefense.com/v3/__https://uima.apache.org/d/ruta-current/tools.ruta.book.html*ugr.tools.ruta.ae.basic__;Iw!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickztninUTU$
> >
> >
> https://urldefense.com/v3/__https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzI7QF5CI$
> >
> >
> https://urldefense.com/v3/__http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzJJ96zT4$
> >
> > That seems to be the actual analysis engine that loads and uses rules to
> > create annotations.
> > While you could use an xml descriptor or use the piper "set" command and
> > do things like mapping ruta to ctakes type systems, I would take the
> > alternate approach of "copying" the initialize(..) and process (..)
> methods
> > and modify them to use ctakes types directly.
> >
> > Disclaimer:  I know very little about uima ruta.  At some point I did
> look
> > into it but it was for a specific (ctakes-derivative) project and I
> didn't
> > go further than basic doc perusal.
> >
> > If you move forward with this please let us all know what you find.  I
> > think that there will be great interest in the community.
> >
> > Sean
> > ________________________________________
> > From: Greg Silverman <gm...@umn.edu.INVALID>
> > Sent: Tuesday, May 18, 2021 11:13 AM
> > To: dev@ctakes.apache.org
> > Cc: Himanshu Shekhar Sahoo
> > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> >
> > * External Email - Caution *
> >
> >
> > Hi Sean,
> > I was wondering if there was a way to use rule-base lookup of a custom
> > lexicon within cTAKES (say a locally curated list of covd-19 symptoms).
> > When I Googled around, I stumbled on UIMA Ruta, but couldn't find
> anything
> > wrt to cTAKES specifics.
> >
> > Thanks!
> >
> >
> > Greg--
> >
> > On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
> > Sean.Finan@childrens.harvard.edu> wrote:
> >
> > >  To which ctakes component(s) are you referring?
> > > ________________________________________
> > > From: Greg Silverman <gm...@umn.edu.INVALID>
> > > Sent: Sunday, May 16, 2021 6:02 PM
> > > To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
> > > Subject: rule-based lookup for custom lexicon [EXTERNAL]
> > >
> > > * External Email - Caution *
> > >
> > >
> > > I looked all over and could not find any information on how to add this
> > > pipeline component to cTAKES. I assume it uses UIMA Ruta?
> > >
> > > Thanks in advance!
> > >
> > > Greg--
> > > --
> > > Greg M. Silverman
> > > Senior Systems Developer
> > > NLP/IE <
> > >
> >
> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!6hN356eDesvWNYzsrDMaXgF6IkZw313QU2QUQw5M8Jysvh1K1JxjEBeztZicX1DM2jC0o7_0qAA$
> > > >
> > > Department of Surgery
> > > University of Minnesota
> > > gms@umn.edu
> > >
> >
> >
> > --
> > Greg M. Silverman
> > Senior Systems Developer
> > NLP/IE <
> >
> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!8uKf_4SXyKdCmvlMHvRGddxlzofg64D4_zsPdCThqeMAyn2akyMNI8wqM6yNUZA2N93F-aAsR7I$
> > >
> > Department of Surgery
> > University of Minnesota
> > gms@umn.edu
> >
>

Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Posted by Kean Kaufmann <ke...@recordsone.com>.
> yes,  the line between "lookup" and rule execution is a little blurry
sometimes.

Sure is.  I blur it with a set of annotators that extend dictionary
annotations based on words or annotations covered by the same Chunk, e.g.

DiseaseDisorderMention + /screen(ing)?/i = ProcedureMention
MedicationMention + /dependenc[ey]|addiction/i = DiseaseDisorderMention
DiseaseDisorderMention + AnatomicalSiteMention in same Chunk =
DiseaseDisorderMention
ProcedureMention + AnatomicalSiteMention in same Chunk = ProcedureMention

Higher recall than the regular UmlsLookupAnnotator;
higher precision than the UmlsOverlapLookupAnnotator (which skips a
specified number of tokens regardless of syntax).

I've been wanting a more general framework to fit this into, and thinking
it might be Ruta.
Thanks for the pointer to TokensRegex; I'll look at that as well.


On Tue, May 18, 2021 at 5:39 PM Peter Abramowitsch <pa...@gmail.com>
wrote:

> Hi All,  yes,  the line between "lookup" and rule execution is a little
> blurry sometimes.   Here's some more blurriness.
>
> I've done something related, adapting a UIMA tokens regex engine for
> Ctakes.  You create a new type in the TypeSystem.  In my case it uses
> CONLLDEP Annotations as the tokens to reason over.   You can set up
> expressions (rules) that look like this.
> (Yes, this case is already covered in the dictionary, but it's an example)
>
> Matcher A:   (lemma=="be");
> Matcher B:   /partially|partly/;
> Matcher C:   /vaccinated/;
>
> Rule  vaccinated|CUI1234|SNOMED5678:  A? B?  C;
>
> You get the Annotation you've delegated to this task, with the entity
> value  "vaccinated|1234|5678"  and the range which spanned the tokens that
> caused the annotation rule to fire
>
> (See Stanford's Tokens Regex)
>
> Peter
>
>
> On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <
> Timothy.Miller@childrens.harvard.edu> wrote:
>
> > But Sean, isn't what he's asking for essentially already implemented in
> > cTAKES as the custom dictionary? I'm currently using that approach for my
> > covid container:
> >
> >
> https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container
> > Tim
> >
> > ________________________________________
> > From: Finan, Sean <Se...@childrens.harvard.edu>
> > Sent: Tuesday, May 18, 2021 11:55 AM
> > To: dev@ctakes.apache.org
> > Cc: Himanshu Shekhar Sahoo
> > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
> >
> > * External Email - Caution *
> >
> >
> > Hi Greg,
> >
> > From 30,000 ft, I think that you would want to use the RutaEngine.
> >
> >
> >
> https://urldefense.com/v3/__https://uima.apache.org/d/ruta-current/tools.ruta.book.html*ugr.tools.ruta.ae.basic__;Iw!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickztninUTU$
> >
> >
> https://urldefense.com/v3/__https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzI7QF5CI$
> >
> >
> https://urldefense.com/v3/__http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzJJ96zT4$
> >
> > That seems to be the actual analysis engine that loads and uses rules to
> > create annotations.
> > While you could use an xml descriptor or use the piper "set" command and
> > do things like mapping ruta to ctakes type systems, I would take the
> > alternate approach of "copying" the initialize(..) and process (..)
> methods
> > and modify them to use ctakes types directly.
> >
> > Disclaimer:  I know very little about uima ruta.  At some point I did
> look
> > into it but it was for a specific (ctakes-derivative) project and I
> didn't
> > go further than basic doc perusal.
> >
> > If you move forward with this please let us all know what you find.  I
> > think that there will be great interest in the community.
> >
> > Sean
> > ________________________________________
> > From: Greg Silverman <gm...@umn.edu.INVALID>
> > Sent: Tuesday, May 18, 2021 11:13 AM
> > To: dev@ctakes.apache.org
> > Cc: Himanshu Shekhar Sahoo
> > Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
> >
> > * External Email - Caution *
> >
> >
> > Hi Sean,
> > I was wondering if there was a way to use rule-base lookup of a custom
> > lexicon within cTAKES (say a locally curated list of covd-19 symptoms).
> > When I Googled around, I stumbled on UIMA Ruta, but couldn't find
> anything
> > wrt to cTAKES specifics.
> >
> > Thanks!
> >
> >
> > Greg--
> >
> > On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
> > Sean.Finan@childrens.harvard.edu> wrote:
> >
> > >  To which ctakes component(s) are you referring?
> > > ________________________________________
> > > From: Greg Silverman <gm...@umn.edu.INVALID>
> > > Sent: Sunday, May 16, 2021 6:02 PM
> > > To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
> > > Subject: rule-based lookup for custom lexicon [EXTERNAL]
> > >
> > > * External Email - Caution *
> > >
> > >
> > > I looked all over and could not find any information on how to add this
> > > pipeline component to cTAKES. I assume it uses UIMA Ruta?
> > >
> > > Thanks in advance!
> > >
> > > Greg--
> > > --
> > > Greg M. Silverman
> > > Senior Systems Developer
> > > NLP/IE <
> > >
> >
> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!6hN356eDesvWNYzsrDMaXgF6IkZw313QU2QUQw5M8Jysvh1K1JxjEBeztZicX1DM2jC0o7_0qAA$
> > > >
> > > Department of Surgery
> > > University of Minnesota
> > > gms@umn.edu
> > >
> >
> >
> > --
> > Greg M. Silverman
> > Senior Systems Developer
> > NLP/IE <
> >
> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!8uKf_4SXyKdCmvlMHvRGddxlzofg64D4_zsPdCThqeMAyn2akyMNI8wqM6yNUZA2N93F-aAsR7I$
> > >
> > Department of Surgery
> > University of Minnesota
> > gms@umn.edu
> >
>

Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Posted by Peter Abramowitsch <pa...@gmail.com>.
Hi All,  yes,  the line between "lookup" and rule execution is a little
blurry sometimes.   Here's some more blurriness.

I've done something related, adapting a UIMA tokens regex engine for
Ctakes.  You create a new type in the TypeSystem.  In my case it uses
CONLLDEP Annotations as the tokens to reason over.   You can set up
expressions (rules) that look like this.
(Yes, this case is already covered in the dictionary, but it's an example)

Matcher A:   (lemma=="be");
Matcher B:   /partially|partly/;
Matcher C:   /vaccinated/;

Rule  vaccinated|CUI1234|SNOMED5678:  A? B?  C;

You get the Annotation you've delegated to this task, with the entity
value  "vaccinated|1234|5678"  and the range which spanned the tokens that
caused the annotation rule to fire

(See Stanford's Tokens Regex)

Peter


On Tue, May 18, 2021 at 1:29 PM Miller, Timothy <
Timothy.Miller@childrens.harvard.edu> wrote:

> But Sean, isn't what he's asking for essentially already implemented in
> cTAKES as the custom dictionary? I'm currently using that approach for my
> covid container:
>
> https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container
> Tim
>
> ________________________________________
> From: Finan, Sean <Se...@childrens.harvard.edu>
> Sent: Tuesday, May 18, 2021 11:55 AM
> To: dev@ctakes.apache.org
> Cc: Himanshu Shekhar Sahoo
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]
>
> * External Email - Caution *
>
>
> Hi Greg,
>
> From 30,000 ft, I think that you would want to use the RutaEngine.
>
>
> https://urldefense.com/v3/__https://uima.apache.org/d/ruta-current/tools.ruta.book.html*ugr.tools.ruta.ae.basic__;Iw!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickztninUTU$
>
> https://urldefense.com/v3/__https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzI7QF5CI$
>
> https://urldefense.com/v3/__http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzJJ96zT4$
>
> That seems to be the actual analysis engine that loads and uses rules to
> create annotations.
> While you could use an xml descriptor or use the piper "set" command and
> do things like mapping ruta to ctakes type systems, I would take the
> alternate approach of "copying" the initialize(..) and process (..) methods
> and modify them to use ctakes types directly.
>
> Disclaimer:  I know very little about uima ruta.  At some point I did look
> into it but it was for a specific (ctakes-derivative) project and I didn't
> go further than basic doc perusal.
>
> If you move forward with this please let us all know what you find.  I
> think that there will be great interest in the community.
>
> Sean
> ________________________________________
> From: Greg Silverman <gm...@umn.edu.INVALID>
> Sent: Tuesday, May 18, 2021 11:13 AM
> To: dev@ctakes.apache.org
> Cc: Himanshu Shekhar Sahoo
> Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]
>
> * External Email - Caution *
>
>
> Hi Sean,
> I was wondering if there was a way to use rule-base lookup of a custom
> lexicon within cTAKES (say a locally curated list of covd-19 symptoms).
> When I Googled around, I stumbled on UIMA Ruta, but couldn't find anything
> wrt to cTAKES specifics.
>
> Thanks!
>
>
> Greg--
>
> On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> >  To which ctakes component(s) are you referring?
> > ________________________________________
> > From: Greg Silverman <gm...@umn.edu.INVALID>
> > Sent: Sunday, May 16, 2021 6:02 PM
> > To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
> > Subject: rule-based lookup for custom lexicon [EXTERNAL]
> >
> > * External Email - Caution *
> >
> >
> > I looked all over and could not find any information on how to add this
> > pipeline component to cTAKES. I assume it uses UIMA Ruta?
> >
> > Thanks in advance!
> >
> > Greg--
> > --
> > Greg M. Silverman
> > Senior Systems Developer
> > NLP/IE <
> >
> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!6hN356eDesvWNYzsrDMaXgF6IkZw313QU2QUQw5M8Jysvh1K1JxjEBeztZicX1DM2jC0o7_0qAA$
> > >
> > Department of Surgery
> > University of Minnesota
> > gms@umn.edu
> >
>
>
> --
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE <
> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!8uKf_4SXyKdCmvlMHvRGddxlzofg64D4_zsPdCThqeMAyn2akyMNI8wqM6yNUZA2N93F-aAsR7I$
> >
> Department of Surgery
> University of Minnesota
> gms@umn.edu
>

Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

Posted by "Miller, Timothy" <Ti...@childrens.harvard.edu>.
But Sean, isn't what he's asking for essentially already implemented in cTAKES as the custom dictionary? I'm currently using that approach for my covid container:
https://github.com/Machine-Learning-for-Medical-Language/ctakes-covid-container
Tim

________________________________________
From: Finan, Sean <Se...@childrens.harvard.edu>
Sent: Tuesday, May 18, 2021 11:55 AM
To: dev@ctakes.apache.org
Cc: Himanshu Shekhar Sahoo
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL] [SUSPICIOUS]

* External Email - Caution *


Hi Greg,

From 30,000 ft, I think that you would want to use the RutaEngine.

https://urldefense.com/v3/__https://uima.apache.org/d/ruta-current/tools.ruta.book.html*ugr.tools.ruta.ae.basic__;Iw!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickztninUTU$
https://urldefense.com/v3/__https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzI7QF5CI$
https://urldefense.com/v3/__http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java__;!!NZvER7FxgEiBAiR_!6YH1mXOYKMXiRAvLt8yPYWLMMklVu7YuK7KW1hde-iOew4ufAIPpkFHnsJxSvv8r5GjWickzJJ96zT4$

That seems to be the actual analysis engine that loads and uses rules to create annotations.
While you could use an xml descriptor or use the piper "set" command and do things like mapping ruta to ctakes type systems, I would take the alternate approach of "copying" the initialize(..) and process (..) methods and modify them to use ctakes types directly.

Disclaimer:  I know very little about uima ruta.  At some point I did look into it but it was for a specific (ctakes-derivative) project and I didn't go further than basic doc perusal.

If you move forward with this please let us all know what you find.  I think that there will be great interest in the community.

Sean
________________________________________
From: Greg Silverman <gm...@umn.edu.INVALID>
Sent: Tuesday, May 18, 2021 11:13 AM
To: dev@ctakes.apache.org
Cc: Himanshu Shekhar Sahoo
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]

* External Email - Caution *


Hi Sean,
I was wondering if there was a way to use rule-base lookup of a custom
lexicon within cTAKES (say a locally curated list of covd-19 symptoms).
When I Googled around, I stumbled on UIMA Ruta, but couldn't find anything
wrt to cTAKES specifics.

Thanks!


Greg--

On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

>  To which ctakes component(s) are you referring?
> ________________________________________
> From: Greg Silverman <gm...@umn.edu.INVALID>
> Sent: Sunday, May 16, 2021 6:02 PM
> To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
> Subject: rule-based lookup for custom lexicon [EXTERNAL]
>
> * External Email - Caution *
>
>
> I looked all over and could not find any information on how to add this
> pipeline component to cTAKES. I assume it uses UIMA Ruta?
>
> Thanks in advance!
>
> Greg--
> --
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE <
> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!6hN356eDesvWNYzsrDMaXgF6IkZw313QU2QUQw5M8Jysvh1K1JxjEBeztZicX1DM2jC0o7_0qAA$
> >
> Department of Surgery
> University of Minnesota
> gms@umn.edu
>


--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!8uKf_4SXyKdCmvlMHvRGddxlzofg64D4_zsPdCThqeMAyn2akyMNI8wqM6yNUZA2N93F-aAsR7I$ >
Department of Surgery
University of Minnesota
gms@umn.edu

Re: rule-based lookup for custom lexicon [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Greg,

From 30,000 ft, I think that you would want to use the RutaEngine.

https://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.ae.basic
https://javadoc.io/doc/org.apache.uima/ruta-core/latest/org/apache/uima/ruta/engine/RutaEngine.html
http://svn.apache.org/repos/asf/uima/ruta/trunk/ruta-core/src/main/java/org/apache/uima/ruta/engine/RutaEngine.java

That seems to be the actual analysis engine that loads and uses rules to create annotations.
While you could use an xml descriptor or use the piper "set" command and do things like mapping ruta to ctakes type systems, I would take the alternate approach of "copying" the initialize(..) and process (..) methods and modify them to use ctakes types directly.

Disclaimer:  I know very little about uima ruta.  At some point I did look into it but it was for a specific (ctakes-derivative) project and I didn't go further than basic doc perusal.  

If you move forward with this please let us all know what you find.  I think that there will be great interest in the community.

Sean
________________________________________
From: Greg Silverman <gm...@umn.edu.INVALID>
Sent: Tuesday, May 18, 2021 11:13 AM
To: dev@ctakes.apache.org
Cc: Himanshu Shekhar Sahoo
Subject: Re: rule-based lookup for custom lexicon [EXTERNAL]

* External Email - Caution *


Hi Sean,
I was wondering if there was a way to use rule-base lookup of a custom
lexicon within cTAKES (say a locally curated list of covd-19 symptoms).
When I Googled around, I stumbled on UIMA Ruta, but couldn't find anything
wrt to cTAKES specifics.

Thanks!


Greg--

On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

>  To which ctakes component(s) are you referring?
> ________________________________________
> From: Greg Silverman <gm...@umn.edu.INVALID>
> Sent: Sunday, May 16, 2021 6:02 PM
> To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
> Subject: rule-based lookup for custom lexicon [EXTERNAL]
>
> * External Email - Caution *
>
>
> I looked all over and could not find any information on how to add this
> pipeline component to cTAKES. I assume it uses UIMA Ruta?
>
> Thanks in advance!
>
> Greg--
> --
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE <
> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!6hN356eDesvWNYzsrDMaXgF6IkZw313QU2QUQw5M8Jysvh1K1JxjEBeztZicX1DM2jC0o7_0qAA$
> >
> Department of Surgery
> University of Minnesota
> gms@umn.edu
>


--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!8uKf_4SXyKdCmvlMHvRGddxlzofg64D4_zsPdCThqeMAyn2akyMNI8wqM6yNUZA2N93F-aAsR7I$ >
Department of Surgery
University of Minnesota
gms@umn.edu

Re: rule-based lookup for custom lexicon [EXTERNAL]

Posted by Greg Silverman <gm...@umn.edu.INVALID>.
Hi Sean,
I was wondering if there was a way to use rule-base lookup of a custom
lexicon within cTAKES (say a locally curated list of covd-19 symptoms).
When I Googled around, I stumbled on UIMA Ruta, but couldn't find anything
wrt to cTAKES specifics.

Thanks!


Greg--

On Tue, May 18, 2021 at 10:04 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

>  To which ctakes component(s) are you referring?
> ________________________________________
> From: Greg Silverman <gm...@umn.edu.INVALID>
> Sent: Sunday, May 16, 2021 6:02 PM
> To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
> Subject: rule-based lookup for custom lexicon [EXTERNAL]
>
> * External Email - Caution *
>
>
> I looked all over and could not find any information on how to add this
> pipeline component to cTAKES. I assume it uses UIMA Ruta?
>
> Thanks in advance!
>
> Greg--
> --
> Greg M. Silverman
> Senior Systems Developer
> NLP/IE <
> https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!6hN356eDesvWNYzsrDMaXgF6IkZw313QU2QUQw5M8Jysvh1K1JxjEBeztZicX1DM2jC0o7_0qAA$
> >
> Department of Surgery
> University of Minnesota
> gms@umn.edu
>


-- 
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://healthinformatics.umn.edu/research/nlpie-group>
Department of Surgery
University of Minnesota
gms@umn.edu

Re: rule-based lookup for custom lexicon [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
 To which ctakes component(s) are you referring? 
________________________________________
From: Greg Silverman <gm...@umn.edu.INVALID>
Sent: Sunday, May 16, 2021 6:02 PM
To: dev@ctakes.apache.org; Himanshu Shekhar Sahoo
Subject: rule-based lookup for custom lexicon [EXTERNAL]

* External Email - Caution *


I looked all over and could not find any information on how to add this
pipeline component to cTAKES. I assume it uses UIMA Ruta?

Thanks in advance!

Greg--
--
Greg M. Silverman
Senior Systems Developer
NLP/IE <https://urldefense.com/v3/__https://healthinformatics.umn.edu/research/nlpie-group__;!!NZvER7FxgEiBAiR_!6hN356eDesvWNYzsrDMaXgF6IkZw313QU2QUQw5M8Jysvh1K1JxjEBeztZicX1DM2jC0o7_0qAA$ >
Department of Surgery
University of Minnesota
gms@umn.edu