Posted to user@ctakes.apache.org by Vlad Valtchinov <vl...@gmail.com> on 2014/01/04 22:03:47 UTC

RadLex as dictionary in cTakes

Hello all,

Best wishes for a good 2014.

I have a question about using a radiology-related ontology, RadLex, in
cTakes. If anybody out there has imported it and used it in processing
radiology reports, we'd like to hear about their experience.

A couple of specific questions:

1. What are the pros and cons of implementing RadLex as a custom
   dictionary versus as part of a UMLS-sanctioned ontology? RadLex is not
   yet part of UMLS, but it is part of NCI's Metathesaurus.

2. What would be the preferred way to import it into cTakes: via an RRF
   upload, or some other custom format?

3. Could one take an NCI Metathesaurus (NCIm) download from NCI and use it
   to import RadLex?

4. From a recent discussion on the difference between custom dictionaries
   and UMLS-imported ontologies ("How to augment/modify UMLS resources?"),
   it looks like cTakes would process the note once for the UMLS-supplied
   resources and once for the (each?) custom dictionary, if I understand
   this correctly. Wouldn't it then be very inefficient to have multiple
   custom dictionaries, as opposed to trying to maximize the UMLS ones? Is
   there any difference in the YTEX behavior? (VJ, please chime in.)

More generally, what are the pros and cons of using all of, e.g., the UMLS
Metathesaurus (or the NCI Metathesaurus) as ontology dictionaries in
cTakes, apart from licensing issues?

 

Thanks, 

vlad

Brigham rad

Re: RadLex as dictionary in cTakes

Posted by Karthik Sarma <ks...@ksarma.com>.

Some time ago I did exactly what Vijay suggested in Re 2). It works, with
exactly the limitations described. For my purposes at the time, it was fine
to simply always prefer the RadLex annotations during analysis.
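
A minimal sketch of what "always prefer RadLex annotations" could look like
in post-processing. This is not cTakes code: the Mention class, the source
labels, and the placeholder codes are hypothetical, and it only resolves
duplicates that cover exactly the same character span.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """Hypothetical stand-in for an annotation produced by a dictionary lookup."""
    begin: int
    end: int
    source: str  # assumed labels: "RADLEX" or "UMLS"
    code: str


def prefer_source(mentions, preferred="RADLEX"):
    """Keep preferred-source mentions; drop others that share an identical span."""
    preferred_spans = {(m.begin, m.end) for m in mentions if m.source == preferred}
    return [
        m for m in mentions
        if m.source == preferred or (m.begin, m.end) not in preferred_spans
    ]


if __name__ == "__main__":
    found = [
        Mention(0, 4, "UMLS", "C0018563"),
        Mention(0, 4, "RADLEX", "RID_PLACEHOLDER"),  # placeholder, not a real RadLex ID
        Mention(10, 15, "UMLS", "CUI_PLACEHOLDER"),  # placeholder, not a real CUI
    ]
    for m in prefer_source(found):
        print(m)
```

Partially overlapping spans would need an extra rule (for example, longest
match wins), which this sketch leaves out.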

On Monday, January 6, 2014, vijay garla wrote:

> Re 1)
> I have augmented umls-derived dictionaries with custom dictionaries
> (single dictionary + single dictionary lookup component).  One disadvantage
> of using radlex in addition to ctakes is that overlapping concepts will be
> mapped to both ctakes & radlex (duplicates).  Another disadvantage is that
> Word Sense Disambiguation is not possible across UMLS & RADLEX (need the
> concept relations).
>
> Re 2)
> I can't speak to the 'preferred' way of importing this, but what I would
> do is import radlex into a DB, and put my own dictionary lookup table
> together from the UMLS and RADLEX.  The dictionary lookup table - be it in
> lucene, db, or csv - has at a minimum the following columns:
> * concept id (e.g. cui)
> * tokenized string (string run through ctakes tokenizer, each token
> delimited by a space char)
> * first word of tokenized string
> And optional
> * semantic type (tui) of the concept
>
> Re 3)
> You will have to create a dictionary out of this as discussed above
>
> Re 4)
> I prefer to have a single dictionary and a single dictionary lookup
> component for efficiency and to avoid duplicate annotation.
>
> One issue is that the ctakes dictionary lookup component is hardwired to
> output a specific annotation type (EntityMention, DrugMentionAnnotation,
> etc.)  If you don't need the extra annotations added by Drug NER and the
> relation extractor (which decorates the AnatomicalSiteMention I believe),
> then go with a single dictionary/single dictionary lookup component that
> outputs an EntityMention for each annotation (this is the YTEX default
> config).
>
> With ytex 0.8/ctakes 2.5 we created a dictionary lookup component that
> figured out which entity type to output (EntityMention vs DrugMention)
> based on the CUIs identified.  With ctakes 3.1 there are more types of
> entities (AnatomicalSiteMention, .. and more?).  I have yet to create a
> dictionary lookup component that dynamically determines which type of
> entity to create based on the CUIs/TUIs contained.  The way I would imagine
> doing this is as follows: In the dictionary lookup component, we could have
> a map of semantic types to subclass of EntityMention.  If any of the TUIs
> of the matched concepts are in this map, create the subclass of
> EntityMention.  E.g. We have a single dictionary, and a single dictionary
> lookup component, which when coming across 'hand' will find the CUI
> C0018563 which has the TUI T023 'Body Part'.  T023 is mapped to the
> AnatomicalSiteMention class and therefore the DictionaryLookup would create
> an AnatomicalSiteMention annotation.
>
> If people think this is a good idea, I'll add a JIRA ticket for 'Smarter
> DictionaryLookup'
>
> vj

--
Karthik Sarma
UCLA Medical Scientist Training Program Class of 20??
Member, UCLA Medical Imaging & Informatics Lab
Member, CA Delegation to the House of Delegates of the American Medical
Association
ksarma@ksarma.com
gchat: ksarma@gmail.com
linkedin: www.linkedin.com/in/ksarma

Re: RadLex as dictionary in cTakes

Posted by vijay garla <vn...@gmail.com>.

Hi Vlad,

Assuming you have UMLS installed in your DB with RXNORM and RADLEX (I
didn't know radlex was included in the umls), and have run the YTEX
install, you can very easily create new dictionary lookup tables.

To do so:
* Delete the contents of v_snomed_fword_lookup:
  delete from v_snomed_fword_lookup
* Modify this script to include whatever SABs (source vocabularies) you
  like:
  https://svn.apache.org/repos/asf/ctakes/branches/ytex/ctakes-ytex/scripts/data/mssql/umls/insert_view.sql
  You will see the following line:
  and mrc.sab in ( 'SNOMEDCT','RXNORM' )
  Change this to include whatever source vocabularies you want, and re-run
  the script.
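
To illustrate the effect of widening that SAB filter, here is a small
self-contained sketch using Python's sqlite3 module with an in-memory table.
It is not the YTEX script: the table name mrconso, its three columns, the
sample rows, and the 'RADLEX' SAB value are assumptions for illustration
only. Check the SAB value in your own UMLS/NCIm load.

```python
import sqlite3


def select_terms(conn, sabs):
    """Return (cui, sab, str) rows whose source vocabulary is in `sabs`."""
    sabs = list(sabs)
    if not sabs:
        return []
    # One "?" placeholder per SAB keeps the values out of the SQL text itself.
    placeholders = ", ".join("?" for _ in sabs)
    sql = (
        "SELECT cui, sab, str FROM mrconso "
        f"WHERE sab IN ({placeholders}) ORDER BY cui, sab"
    )
    return conn.execute(sql, sabs).fetchall()


def demo_connection():
    """Build a toy MRCONSO-like table; CUIs other than C0018563 are made up."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mrconso (cui TEXT, sab TEXT, str TEXT)")
    conn.executemany(
        "INSERT INTO mrconso VALUES (?, ?, ?)",
        [
            ("C0018563", "SNOMEDCT", "Hand"),
            ("CX000001", "RADLEX", "example radlex term"),
            ("CX000002", "MSH", "example other term"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(select_terms(conn, ["SNOMEDCT", "RXNORM"]))
    print(select_terms(conn, ["SNOMEDCT", "RXNORM", "RADLEX"]))
```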

-vj


On Wed, Feb 5, 2014 at 5:05 PM, <vl...@gmail.com> wrote:

> A follow-up question regarding custom UMLS dictionaries in cTakes.
>
> Thanks to all the people who took the time to post, below.
>
> Additionally, I wanted to request opinion on 2-3 things.
>
> 1. How does one prepare the current SNOMED + RxNorm UMLS ontologies for
> import into cTakes as a database?
> 2. In these steps, does one need only the RRF files (in the META dir), or
> also the files from the Semantic Network for the given UMLS subset, stored
> in the NET directory by MetamorphoSys?
> 3. What steps does one need to follow to make cTakes 3.1.1 use a subset of
> UMLS pre-stored in the db?
>
> Any and all leads are highly appreciated.
>
> Best,
> vlad

Re: RadLex as dictionary in cTakes

Posted by vl...@gmail.com.

A follow-up question regarding custom UMLS dictionaries in cTakes.

Thanks to all the people who took the time to post, below.

Additionally, I wanted to request opinion on 2-3 things.

1. How does one prepare the current SNOMED + RxNorm UMLS ontologies for
import into cTakes as a database?
2. In these steps, does one need only the RRF files (in the META dir), or
also the files from the Semantic Network for the given UMLS subset, stored
in the NET directory by MetamorphoSys?
3. What steps does one need to follow to make cTakes 3.1.1 use a subset of
UMLS pre-stored in the db?

Any and all leads are highly appreciated.

Best,
vlad



Re: RadLex as dictionary in cTakes

Posted by vijay garla <vn...@gmail.com>.

Re 1)
I have augmented umls-derived dictionaries with custom dictionaries (single
dictionary + single dictionary lookup component).  One disadvantage of
using radlex in addition to ctakes is that overlapping concepts will be
mapped to both ctakes & radlex (duplicates).  Another disadvantage is that
Word Sense Disambiguation is not possible across UMLS & RADLEX (need the
concept relations).

Re 2)
I can't speak to the 'preferred' way of importing this, but what I would do
is import radlex into a DB, and put my own dictionary lookup table together
from the UMLS and RADLEX.  The dictionary lookup table - be it in lucene,
db, or csv - has at a minimum the following columns:
* concept id (e.g. cui)
* tokenized string (string run through ctakes tokenizer, each token
delimited by a space char)
* first word of tokenized string
And optional
* semantic type (tui) of the concept
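
The columns listed above can be sketched as follows. This is only an
illustration: tokenize() is a simple regex stand-in, not the cTakes
tokenizer, and the lowercasing and the key names (cui, tok_str, fword, tui)
are assumptions rather than the actual cTakes/YTEX schema.

```python
import re


def tokenize(text):
    """Stand-in tokenizer: lowercase, then split into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


def lookup_row(cui, term, tui=None):
    """Build one dictionary-lookup row for a concept term."""
    tokens = tokenize(term)
    if not tokens:
        raise ValueError("term has no tokens")
    return {
        "cui": cui,                   # concept id
        "tok_str": " ".join(tokens),  # tokens delimited by a single space
        "fword": tokens[0],           # first word of the tokenized string
        "tui": tui,                   # optional semantic type
    }


if __name__ == "__main__":
    print(lookup_row("C0018563", "Hand", "T023"))
```

In a real build, the same function would be applied to every term string
from UMLS and RadLex, and the rows written to the db, CSV, or Lucene index.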

Re 3)
You will have to create a dictionary out of this as discussed above

Re 4)
I prefer to have a single dictionary and a single dictionary lookup
component for efficiency and to avoid duplicate annotation.

One issue is that the ctakes dictionary lookup component is hardwired to
output a specific annotation type (EntityMention, DrugMentionAnnotation,
etc.)  If you don't need the extra annotations added by Drug NER and the
relation extractor (which decorates the AnatomicalSiteMention I believe),
then go with a single dictionary/single dictionary lookup component that
outputs an EntityMention for each annotation (this is the YTEX default
config).

With ytex 0.8/ctakes 2.5 we created a dictionary lookup component that
figured out which entity type to output (EntityMention vs DrugMention)
based on the CUIs identified.  With ctakes 3.1 there are more types of
entities (AnatomicalSiteMention, .. and more?).  I have yet to create a
dictionary lookup component that dynamically determines which type of
entity to create based on the CUIs/TUIs contained.  The way I would imagine
doing this is as follows: In the dictionary lookup component, we could have
a map of semantic types to subclass of EntityMention.  If any of the TUIs
of the matched concepts are in this map, create the subclass of
EntityMention.  E.g. We have a single dictionary, and a single dictionary
lookup component, which when coming across 'hand' will find the CUI
C0018563 which has the TUI T023 'Body Part'.  T023 is mapped to the
AnatomicalSiteMention class and therefore the DictionaryLookup would create
an AnatomicalSiteMention annotation.
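
A rough sketch of that mapping idea, written in Python for brevity rather
than as a cTakes (Java/UIMA) component. The only mapping taken from this
thread is T023 to AnatomicalSiteMention; the function name, the string class
names, and the fallback behaviour are assumptions.

```python
# Map of semantic types (TUIs) to the mention subclass to create.
# Only the T023 entry comes from the discussion above; extend as needed.
TUI_TO_MENTION_TYPE = {
    "T023": "AnatomicalSiteMention",
}

DEFAULT_MENTION_TYPE = "EntityMention"


def mention_type_for(tuis):
    """Pick the mention type for a matched concept given its TUIs.

    TUIs are checked in sorted order so the result is deterministic when a
    concept carries several mapped semantic types.
    """
    for tui in sorted(tuis):
        if tui in TUI_TO_MENTION_TYPE:
            return TUI_TO_MENTION_TYPE[tui]
    return DEFAULT_MENTION_TYPE


if __name__ == "__main__":
    # 'hand' -> CUI C0018563 -> TUI T023
    print(mention_type_for({"T023"}))
```

A concept whose TUIs map to more than one subclass would need an explicit
priority order; sorting the TUIs here is just a placeholder for that policy.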

If people think this is a good idea, I'll add a JIRA ticket for 'Smarter
DictionaryLookup'

vj
