Posted to dev@ctakes.apache.org by Peter Abramowitsch <pa...@gmail.com> on 2020/08/05 22:51:57 UTC

The 2020 UMLS dictionary and our default SNO_RX

Hi All

I've been setting up a custom dictionary from UMLS. The goal is simply to add
a comprehensive genetic vocabulary, HGNC, to the latest UMLS SNOMED and RXNORM
vocabularies, in the hope of getting somewhere close to the cTAKES default
dictionary again.

However, there are changes to the concept vocabularies in UMLS 2020AA that
affect the ability of cTAKES to work well with older notes, and possibly with
the note-writing practices of older physicians and labs. Some of the
tried-and-true acronyms, such as WBC for leukocytes, RBC, and EOS (eosinophil
count), are no longer part of SNOMED. This is probably because the components
of these parameters are now broken out into more granular types. Another
possible reason is that a few of these acronyms now overlap with the names of
genes; EOS is one of them. This is just speculation.

To have these common parameters re-included via their common lab acronyms, it
is necessary to add another common US vocabulary such as HL7-V3.0 or
NCI_CDISC. Of course, one can remap back into SNOMED by adding insert
statements to the dictionary script, but that approach may not scale.
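
For anyone who wants to try the insert-statement route, here is a minimal sketch of generating such rows. It assumes the CUI_TERMS column order seen in the rows quoted elsewhere in this thread: numeric CUI, index of the lookup word within the term, token count, lowercased term text, lookup word. The function name and the choice of lookup word are hypothetical, and whitespace splitting only approximates the real tokenizer.

```python
def cui_terms_insert(cui: str, term: str, index_word: str) -> str:
    """Build one INSERT row for a synonym that should be re-added.

    Assumed column order (inferred from rows quoted in this thread):
    numeric CUI, index of the lookup word, token count, term text, lookup word.
    """
    tokens = term.lower().split()  # whitespace split; an approximation
    word = index_word.lower()
    if word not in tokens:
        raise ValueError(f"{index_word!r} is not a token of {term!r}")
    cui_number = int(cui.lstrip("Cc"))  # "C0750426" -> 750426
    text = " ".join(tokens).replace("'", "''")  # escape quotes for SQL
    return "INSERT INTO CUI_TERMS VALUES(%d,%d,%d,'%s','%s')" % (
        cui_number, tokens.index(word), len(tokens), text, word.replace("'", "''")
    )


if __name__ == "__main__":
    # Rebuilds a row of the same shape as the 'elevated wbc count' row in this thread.
    print(cui_terms_insert("C0750426", "elevated WBC count", "wbc"))
```

Each re-added synonym still has to be chosen by hand, so this only removes the typing, not the scaling concern.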

So my point here is that if, one day, we plan to create a new cTAKES
release, and with it a new UMLS lookup, we may need to consider adding a
third basic vocabulary to our current set of two.

Thoughts?
Peter

Re: The 2020 UMLS dictionary and our default SNO_RX

Posted by Peter Abramowitsch <pa...@gmail.com>.
Hi Jeff

Many thanks for all your suggestions.

Things have settled down now. The blacklist feature has been very useful for
suppressing "false" acronym detection, and I will add back to the dict script
a few synonyms that have gone away. I also added some post-processing code
(which might be useful for others?): when a range maps to two or more
concepts in different semantic domains, I set the confidence level of each
to 0.5. The gene CAD and the acronym CAD are one example.
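
A rough sketch of that post-processing idea, in Python rather than as a cTAKES annotator. The Mention class, its field names, and the identifiers in the example are hypothetical stand-ins, not the actual cTAKES types or real CUIs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Mention:
    """Hypothetical stand-in for one concept annotation on a text range."""
    begin: int
    end: int
    concept_id: str        # placeholder identifier, not a real CUI
    semantic_domain: str   # e.g. "gene" or "disorder"
    confidence: float = 1.0


def flag_cross_domain_overlaps(mentions, ambiguous_confidence=0.5):
    """Lower the confidence of mentions that share a range across domains."""
    by_range = defaultdict(list)
    for mention in mentions:
        by_range[(mention.begin, mention.end)].append(mention)
    for group in by_range.values():
        # Only ranges claimed by more than one semantic domain are ambiguous.
        if len({m.semantic_domain for m in group}) > 1:
            for m in group:
                m.confidence = ambiguous_confidence
    return mentions


if __name__ == "__main__":
    found = [
        Mention(10, 13, "gene-CAD", "gene"),
        Mention(10, 13, "acronym-CAD", "disorder"),
        Mention(20, 25, "other", "disorder"),
    ]
    for m in flag_cross_domain_overlaps(found):
        print(m.concept_id, m.confidence)
```

Only exactly matching ranges are grouped here; partially overlapping ranges would need a different grouping rule.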

Peter

In reply to the message from Jeffrey Miller <je...@gmail.com> of Fri, Aug 7, 2020 at 6:29 AM.

Re: The 2020 UMLS dictionary and our default SNO_RX

Posted by Jeffrey Miller <je...@gmail.com>.
Hi Peter,

Yes, I chose active subsets. Then I think I actually chose the "select
sources to exclude" option, but I don't believe that should matter. I leave
the precedence defaults alone.

Jeff

In reply to the message from Peter Abramowitsch <pa...@gmail.com> of Thu, Aug 6, 2020, 2:13 PM.

Re: The 2020 UMLS dictionary and our default SNO_RX

Posted by Peter Abramowitsch <pa...@gmail.com>.
Hi Jeff

You are absolutely right: when I use sno_rx with the term WBC in a simple
context, it is not showing up as a T059. I was surprised by that.

I was wrong about the term I was looking at. Here's the scenario that did
change:

Text context:
afebrile, but has elevated WBC count;

*Using sno_rx*
canonical text: White blood cell count increased (lab result)
CUI: C0750426
location: Leukocytes
location_snomed: 52501007
range_text: elevated WBC count
vocab_term: 414478003
vocab_type: SNOMEDCT_US
...other params.

*Using new dict based on 2020AA*
Missing:

Reason:
*grep elevated newdict_750426*
    INSERT INTO CUI_TERMS VALUES(750426,0,4,'elevated white blood count','elevated')
    INSERT INTO CUI_TERMS VALUES(750426,0,5,'elevated white blood cell count','elevated')
*grep elevated olddict_750426*
    INSERT INTO CUI_TERMS VALUES(750426,0,4,'elevated white blood count','elevated')
    INSERT INTO CUI_TERMS VALUES(750426,1,3,'elevated wbc count','wbc')    <---- missing from the new dict
    INSERT INTO CUI_TERMS VALUES(750426,0,5,'elevated white blood cell count','elevated')
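
The same comparison can be run over whole dictionary scripts instead of one CUI at a time. This is only a sketch: it assumes each CUI_TERMS row sits on a single line in the form shown above, and the function names are made up for illustration.

```python
import re

# One CUI_TERMS row as it appears in the greps above; '' is an escaped quote.
ROW = re.compile(
    r"INSERT INTO CUI_TERMS VALUES\((\d+),(\d+),(\d+),"
    r"'((?:[^']|'')*)','((?:[^']|'')*)'\)"
)


def terms_by_cui(script_text):
    """Map each numeric CUI to the set of term texts found in a script."""
    terms = {}
    for match in ROW.finditer(script_text):
        terms.setdefault(int(match.group(1)), set()).add(match.group(4))
    return terms


def missing_terms(old_script, new_script):
    """Return, per CUI, the terms present in the old script but not the new."""
    old, new = terms_by_cui(old_script), terms_by_cui(new_script)
    missing = {}
    for cui, texts in old.items():
        gone = texts - new.get(cui, set())
        if gone:
            missing[cui] = sorted(gone)
    return missing


if __name__ == "__main__":
    old_rows = """
    INSERT INTO CUI_TERMS VALUES(750426,0,4,'elevated white blood count','elevated')
    INSERT INTO CUI_TERMS VALUES(750426,1,3,'elevated wbc count','wbc')
    INSERT INTO CUI_TERMS VALUES(750426,0,5,'elevated white blood cell count','elevated')
    """
    new_rows = """
    INSERT INTO CUI_TERMS VALUES(750426,0,4,'elevated white blood count','elevated')
    INSERT INTO CUI_TERMS VALUES(750426,0,5,'elevated white blood cell count','elevated')
    """
    print(missing_terms(old_rows, new_rows))
```

With real files, the two arguments would be the full text of the old and new scripts read from disk.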

So, back to your recommendation on using MMSYS:

You chose the ACTIVE_SUBSETS option, right?
And on the Sources to Exclude/Include page, do you deselect all sources to
exclude?
Have you tweaked the precedence of subsets, or do you leave the default
order alone?

Many thanks,
Peter

In reply to the message from Jeffrey Miller <je...@gmail.com> of Thu, Aug 6, 2020 at 8:11 AM.

Re: The 2020 UMLS dictionary and our default SNO_RX

Posted by Jeffrey Miller <je...@gmail.com>.
Peter,

I have also experienced similar issues with how text spans translate to
different CUIs depending on the included vocabularies. I had a similar
conversation with Sean on the dev forum last year, I believe.

I do not believe the behavior of 'wbc' has changed: if I run the clinical
pipeline with the sno_rx_16ab dictionary, it is tagged as an
AnatomicalSiteMention. Are you seeing something different?

Jeff

In reply to the message from Peter Abramowitsch <pa...@gmail.com> of Wed, Aug 5, 2020 at 11:24 PM.

Re: The 2020 UMLS dictionary and our default SNO_RX

Posted by Peter Abramowitsch <pa...@gmail.com>.
Hi Jeff

I thought I did load them all, but I'll go back and check.

Looking at my gene issue, the lookup flips arbitrarily (seemingly, anyway)
between vocabularies when they overlap. That is, I see that Vocab A and
Vocab B both contain geneX and geneY. Neither of these is in SNOMED. So in my
output, I get one of the genes associated with Vocab A and the other with
Vocab B. When I remove Vocab B, then obviously both are associated with
Vocab A, which is what I wanted.

If, for you, WBC is showing up as an anatomical location rather than a
T059, then it's probably not getting the correct SNOMED code.
Wouldn't that be a problem for your researchers?

Peter

In reply to the message from Jeffrey Miller <je...@gmail.com> of Wed, Aug 5, 2020 at 5:37 PM.

Re: The 2020 UMLS dictionary and our default SNO_RX

Posted by Jeffrey Miller <je...@gmail.com>.
Hi Peter,

If I create a dictionary using UMLS 2020AA with just SNOMED and RXNORM, my
cTAKES dictionary still seems to have a CUI associated with the string
'wbc' that links to the SNOMED term for Leukocyte (Cell). It does not map
to a lab result TUI but rather to an anatomical site, but it seems to be the
same CUI that 'wbc' resolves to in sno_rx_16ab. Maybe HGNC is conflicting
with that too?

Just to double-check: when you installed UMLS through MetamorphoSys, did
you install all of the available vocabularies?

Jeff

In reply to the message from Peter Abramowitsch <pa...@gmail.com> of Wed, Aug 5, 2020 at 6:52 PM.