Posted to dev@ctakes.apache.org by Peter Abramowitsch <pa...@gmail.com> on 2020/08/14 03:57:57 UTC

Need a little more help on dictionaries

Hi All

I'm able to create a subset with the UMLS mmsys tool, use the dictionary
creator on the full UMLS release, create, install and tweak the scripts
adding or removing aliases etc.  My goal is simply to add HUGO gene terms
to SNOMED and RXNORM.

However I must be missing some bit of information on the use of mmsys or
the dictionary creator, because some very common terms are missing from my
dictionary but present in the released sno_rx.

As an example, take the acronym SOB:

In mmsys, the term SOB is present in my subset, and it is mapped into
SNOMED with the expected CUI 13404 and the same SNOMED IDs as in sno_rx.
I see the cui_tui mapping it to the correct TUI for a finding:
INSERT INTO TUI VALUES(13404,184)
I see the CUI and the preferred term "dyspnea" in my *script file, and I
can resolve it in a note using the default consumer, obtaining the
correct SNOMED ID.
I see lots of cui_terms entries for the same CUI, and I can resolve them
too.  But SOB is not present in my cui_terms.
So how did it get into the released sno_rx?

So either - I am not using one of the tools correctly, or in creating
SNO_RX, someone has added SOB by hand rather than using the creator.  And
if they have, they have probably also done other tweaks.

Sean, Gandhi or Jeff,
can you explain this?

Peter

Re: Need a little more help on dictionaries

Posted by Peter Abramowitsch <pa...@gmail.com>.
Hi Gandhi,

Yes, I added SNOMED, RXNORM and HUGO (HGNC), so I have thousands of SNOMED
associations in my script file.  But some, like my example, aren't there.
The missing ones tend to be short acronyms rather than the template
phrases, yet they are present in mmsys with the SNOMED mapping.

I tried to follow Jeff's suggestion of adding more likely vocabularies to
these three, but then I got inconsistent results: some terms that had shown
up with SNOMED codes started being reported in other vocabularies.

It may or may not be connected, but could you explain the function/behavior
of the source/destination checkboxes on the dictionary creator?

Peter

On Fri, Aug 14, 2020, 6:45 AM gandhi rajan <ga...@gmail.com> wrote:

> Hi Peter,
>
> > However I must be missing some bit of information on the use of mmsys or
> > the dictionary creator, because some very common terms are missing from my
> > dictionary but present in the released sno_rx
>
> Do you mean the entries are missing in your database? When I tried the
> latest UMLS installation I didn't see SNOMED dictionary terms getting added
> by default. Did you select the SNOMED dictionary in the dictionary GUI, or
> did it just show up in the GUI?
>
> --
> Regards,
> Gandhi
>
> "The best way to find urself is to lose urself in the service of others
> !!!"
>

Re: Need a little more help on dictionaries

Posted by gandhi rajan <ga...@gmail.com>.
Hi Peter,

> However I must be missing some bit of information on the use of mmsys or
> the dictionary creator, because some very common terms are missing from my
> dictionary but present in the released sno_rx

Do you mean the entries are missing in your database? When I tried the
latest UMLS installation I didn't see SNOMED dictionary terms getting added
by default. Did you select the SNOMED dictionary in the dictionary GUI, or
did it just show up in the GUI?


-- 
Regards,
Gandhi

"The best way to find urself is to lose urself in the service of others !!!"

Re: Need a little more help on dictionaries [EXTERNAL]

Posted by Peter Abramowitsch <pa...@gmail.com>.
Thanks Sean  ... now I'm going to jog your memory:

I quickly went through the dictionary code.  You were right.  There was a
class AutoTermExtractor in org.apache.ctakes.gui.dictionary.umls which
looks like it did what you said.  But all of it is commented out.
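
For readers following along, here is a rough sketch of the kind of
expansion being discussed: a term such as "SOB - Shortness of breath" or
"SOB (Shortness of breath)" yields three synonyms (full, left, right) when
the left side is all caps and a fitting acronym for the right side. This is
not the commented-out AutoTermExtractor code. The class name, the regular
expressions and the "initials of every word" acronym test are all
assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of cTAKES.
final class AcronymSynonyms {
    // "LEFT - right" and "LEFT (right)", where LEFT is two or more capitals.
    private static final Pattern DASH = Pattern.compile("^([A-Z]{2,})\\s*-\\s*(.+)$");
    private static final Pattern PAREN = Pattern.compile("^([A-Z]{2,})\\s*\\((.+)\\)$");

    // Assumed rule: the acronym must equal the initials of every word on the right.
    static boolean fits(String acronym, String expansion) {
        String[] words = expansion.trim().split("\\s+");
        if (words.length != acronym.length()) {
            return false;
        }
        for (int i = 0; i < words.length; i++) {
            if (words[i].isEmpty()
                    || Character.toUpperCase(words[i].charAt(0)) != acronym.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // Returns the term itself, plus the left and right parts when the rule holds.
    static List<String> synonyms(String term) {
        List<String> out = new ArrayList<>();
        out.add(term);
        for (Pattern pattern : new Pattern[] { DASH, PAREN }) {
            Matcher m = pattern.matcher(term.trim());
            if (m.matches() && fits(m.group(1), m.group(2))) {
                out.add(m.group(1));          // left: the acronym
                out.add(m.group(2).trim());   // right: the expansion
                break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(synonyms("SOB - Shortness of breath"));
        System.out.println(synonyms("SOB (Shortness of breath)"));
    }
}
```

A real implementation would probably need a looser acronym test (skipping
stop words, for instance); the strict version above is only meant to show
the shape of the idea.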

Then there's another bit of code with a function extractAbbreviations() in
UmlsTermUtil, and this one relies on externalized files including this
one:  default/RightAbbreviations.txt.  And this file contains (SOB), one of
the abbreviations I was looking for.

Now this file seems to exist in multiple versions:

cogitext:trunk-java8 peterabramowitsch$ find . -name "RightAbbreviations.txt" -exec wc -l {} \;
    1178 ./ctakes-gui-res/target/classes/org/apache/ctakes/gui/dictionary/data/default/RightAbbreviations.txt
       0 ./ctakes-gui-res/target/classes/org/apache/ctakes/gui/dictionary/data/small/RightAbbreviations.txt
       8 ./ctakes-gui-res/target/classes/org/apache/ctakes/gui/dictionary/data/tim/RightAbbreviations.txt
       0 ./ctakes-gui-res/target/classes/org/apache/ctakes/gui/dictionary/data/tiny/RightAbbreviations.txt

Does this jog your memory enough to fill in the history and tell me what I
need to do?

Peter


On Fri, Aug 14, 2020 at 4:53 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Peter,
>
> I don't have an answer but I do have a question:
>
> In your mrconso.rrf, do you see a snomed line item for "SOB" or only "SOB
> -Shortness of breath" ?
>
> I think that the simple "SOB" and "sob" entries might be from other
> vocabularies.
>
> There is (was?) logic in the dictionary creator to multiply things like
> "SOB - Shortness of breath", "SOB (Shortness of breath)"  etc. and create 3
> synonym entries: full, left and right.  There is a requirement that the
> left side be all caps and a fitting acronym for the right side.  However, I
> vacillated on the correctness of this behavior as almost all terms already
> had the 3 entries.  I am not sure what the current version of the creator
> does.
>
> Dictionary creation is indeed a touchy operation.
>
> Sean

Re: Need a little more help on dictionaries [EXTERNAL]

Posted by Peter Abramowitsch <pa...@gmail.com>.
Thanks Sean.  Interesting.  I will have a look at my mrconso.rrf, then
have a look in the creator code.
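
One way to run the check Sean suggests is sketched below. It assumes the
standard pipe-delimited MRCONSO.RRF layout (CUI in column 0, SAB in column
11, STR in column 14, counting from zero), that the cTAKES CUI 13404
corresponds to UMLS C0013404, and that SNOMED rows carry the source
abbreviation SNOMEDCT_US. The class name MrconsoFilter is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of cTAKES or the UMLS tooling.
final class MrconsoFilter {
    // Zero-based column positions in MRCONSO.RRF.
    private static final int CUI = 0;
    private static final int SAB = 11;
    private static final int STR = 14;

    // Collects the STR value of every row matching the given CUI and source.
    static List<String> termsFor(BufferedReader reader, String cui, String sab)
            throws IOException {
        List<String> terms = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] fields = line.split("\\|", -1);   // -1 keeps empty columns
            if (fields.length > STR && fields[CUI].equals(cui) && fields[SAB].equals(sab)) {
                terms.add(fields[STR]);
            }
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to MRCONSO.RRF in the mmsys subset.
        try (BufferedReader reader =
                Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            for (String term : termsFor(reader, "C0013404", "SNOMEDCT_US")) {
                System.out.println(term);
            }
        }
    }
}
```

If the output lists only combined forms such as "SOB - Shortness of breath"
and never a bare "SOB", that would be consistent with the bare acronym
coming from another vocabulary or from a post-processing step.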


I should work through emails from older to newer; I just responded to
Gandhi.


Re: Need a little more help on dictionaries [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Peter - no worries, I understood your intent and was attempting to add some humor.
________________________________________
From: Peter Abramowitsch <pa...@gmail.com>
Sent: Friday, August 14, 2020 2:09 PM
To: dev@ctakes.apache.org
Subject: Re: Need a little more help on dictionaries [EXTERNAL]

Thanks Sean.

In no way was the comment  "explanation that makes sense" about you!    I
apologize if it sounded like that.

It is so funny, because in a former company where I was architect, many
years ago,  Oacis Healthcare (which implemented one of the first HL7
databases and gateways) there was another Sean, and this one too, held the
accumulated memory and wisdom about a vital chunk of historical software.
Everyone bombarded him with questions all day long because he was the one
true source.  At the end of the day, his exhaustion was total.

My statement was rhetorical: I was racking my brain for an explanation I
had possibly missed.

Peter

On Fri, Aug 14, 2020 at 10:27 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

>
> >Finally an explanation that makes sense.
> -- It frequently takes a while to get one of those out of me ...
>
> > I don't have check-in privileges so will keep it private for now.
> -- We shall have to do something about that.
>
> Cheers,
> Sean
>
> ________________________________________
> From: Peter Abramowitsch <pa...@gmail.com>
> Sent: Friday, August 14, 2020 1:17 PM
> To: dev@ctakes.apache.org
> Subject: Re: Need a little more help on dictionaries [EXTERNAL]
>
>
> Hurray!
> Finally an explanation that makes sense.  I just couldn't figure out how
> you could have made sno_rx with that dictionary creator.   Clearly, those
> helper files represent a LOT of work.
>
> I have locally modified the dictionary creator code to look for the system
> property ctakes.dictgui_helperdata as a way to point it to another of those
> directories.  I don't have check-in privileges so will keep it private for
> now.
>
> Many thanks for your help.
>
> Peter
>
> On Fri, Aug 14, 2020 at 9:51 AM Finan, Sean <
> Sean.Finan@childrens.harvard.edu> wrote:
>
> > Hi Peter,
> >
> > shining a flashlight back into the dark ages ...
> >
> > You have found the advanced configuration directories!
> >
> > Those actually predate the gui dictionary creator and were a big part of
> > formatting with the previous cli dictionary creator.  The cli was
> > versatile but not simple.  The default collection of configuration files
> > for the cli had a lot more going on.
> >
> > I think that I made the "tiny/" directory the default for the gui because
> > it didn't do as much manipulation and I wanted things to be closer to a
> > 1:1 match with the source.
> >
> > I obviously used something other than the simple "tiny/" configuration
> > when I made sno_rx_16ab.   I remember running repeated tests on some
> > corpora as well as manually inspecting the produced databases.
> >
> > I can't believe that I had forgotten all of this.
> >
> > You should be able to mix and match files from the different
> > configuration directories and just throw them into your own directory
> > (or tiny/) then point DEFAULT_.. to your directory and recompile.
> >
> >
> > Sean
> >
> > ________________________________________
> > From: Peter Abramowitsch <pa...@gmail.com>
> > Sent: Friday, August 14, 2020 12:22 PM
> > To: dev@ctakes.apache.org
> > Subject: Re: Need a little more help on dictionaries [EXTERNAL]
> >
> >
> > Hi Sean
> >
> > I think I found the answer, and I have one question.
> >
> > In the dictionary creator, the hardwired directory is "tiny", which in
> > fact has an empty file for those abbreviations.
> >
> > In DictionaryBuilder.java:
> >
> > static private final String DEFAULT_DATA_DIR =
> >       "org/apache/ctakes/gui/dictionary/data/tiny";
> > ...
> > final UmlsTermUtil umlsTermUtil = new UmlsTermUtil( DEFAULT_DATA_DIR );
> >
> > The command-line args are not used in this application, and neither are
> > system properties or environment variables, so there's no way to change
> > it short of recompiling.
> >
> > So the question is:  do you know why the empty version is the default?
> >
> > Peter

Re: Need a little more help on dictionaries [EXTERNAL]

Posted by Peter Abramowitsch <pa...@gmail.com>.
Thanks Sean.

In no way was the comment  "explanation that makes sense" about you!    I
apologize if it sounded like that.

It is so funny, because in a former company where I was architect, many
years ago,  Oacis Healthcare (which implemented one of the first HL7
databases and gateways) there was another Sean, and this one too, held the
accumulated memory and wisdom about a vital chunk of historical software.
Everyone bombarded him with questions all day long because he was the one
true source.  At the end of the day, his exhaustion was total.

My statement was rhetorical: I was racking my brain for an explanation I
had possibly missed.

Peter


Re: Need a little more help on dictionaries [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
>Finally an explanation that makes sense.
-- It frequently takes a while to get one of those out of me ...

> I don't have check-in privileges so will keep it private for now.
-- We shall have to do something about that.

Cheers,
Sean


Re: Need a little more help on dictionaries [EXTERNAL]

Posted by Peter Abramowitsch <pa...@gmail.com>.
Hurray!
Finally an explanation that makes sense.  I just couldn't figure out how
you could have made sno_rx with that dictionary creator.   Clearly, those
helper files represent a LOT of work.

I have locally modified the dictionary creator code to look for the system
property ctakes.dictgui_helperdata as a way to point it to another of those
directories.  I don't have check-in privileges so will keep it private for
now.
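
A minimal sketch of that kind of override, for anyone who wants to try the
same thing.  It is not the actual local patch: only the property name
ctakes.dictgui_helperdata and the default path come from this thread, while
the class DataDirResolver and the method resolveDataDir are hypothetical
names used for illustration.

```java
final class DataDirResolver {
    // Default path as quoted from DictionaryBuilder.java in this thread.
    static final String DEFAULT_DATA_DIR = "org/apache/ctakes/gui/dictionary/data/tiny";

    // Property name mentioned above; how the real patch reads it is an assumption.
    static final String DATA_DIR_PROPERTY = "ctakes.dictgui_helperdata";

    // Returns the directory named by the system property, or the default
    // when the property is unset or blank.
    static String resolveDataDir() {
        final String override = System.getProperty(DATA_DIR_PROPERTY);
        if (override == null || override.trim().isEmpty()) {
            return DEFAULT_DATA_DIR;
        }
        return override.trim();
    }

    public static void main(String[] args) {
        System.out.println(resolveDataDir());
    }
}
```

Starting the JVM with -Dctakes.dictgui_helperdata=<your directory> would then
select that directory, assuming the resolved value is what gets passed to
UmlsTermUtil in place of DEFAULT_DATA_DIR.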

Many thanks for your help.

Peter

On Fri, Aug 14, 2020 at 9:51 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Peter,
>
> shining a flashlight back into the dark ages ...
>
> You have found the advanced configuration directories!
>
> Those actually precede the gui dictionary creator and were a big part of
> formatting with the previous cli dictionary creator.  The cli was versatile
> but not simple.  The default collection of configuration files for the cli
> had a lot more going on.
>
> I think that I made "tiny/" directory the default for the gui because it
> didn't do as much manipulation and I wanted things to be a greater 1:1
> match with the source.
>
> I obviously used something other than the simple "tiny/" configuration
> when I made sno_rx_16ab.   I remember running repeated tests on some
> corpora as well as manually inspecting the produced databases.
>
> I can't believe that I had forgotten all of this.
>
> You should be able to mix and match files from the different configuration
> directories and just throw them into your own directory (or tiny/) then
> point DEFAULT_.. to your directory and recompile.
>
>
> Sean
>

Re: Need a little more help on dictionaries [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Peter,

shining a flashlight back into the dark ages ...

You have found the advanced configuration directories!

Those actually precede the gui dictionary creator and were a big part of formatting with the previous cli dictionary creator.  The cli was versatile but not simple.  The default collection of configuration files for the cli had a lot more going on.  

I think that I made the "tiny/" directory the default for the gui because it didn't do as much manipulation and I wanted things to be a closer 1:1 match with the source.

I obviously used something other than the simple "tiny/" configuration when I made sno_rx_16ab.   I remember running repeated tests on some corpora as well as manually inspecting the produced databases.

I can't believe that I had forgotten all of this.

You should be able to mix and match files from the different configuration directories and just throw them into your own directory (or tiny/) then point DEFAULT_.. to your directory and recompile.


Sean

________________________________________
From: Peter Abramowitsch <pa...@gmail.com>
Sent: Friday, August 14, 2020 12:22 PM
To: dev@ctakes.apache.org
Subject: Re: Need a little more help on dictionaries [EXTERNAL]

Hi Sean

I think I found the answer, and I have one question.

In dictionary creator, the hardwired dir is "tiny" that in fact has an
empty file for those abbreviations

In DictionaryBuilder.java:

    static private final String DEFAULT_DATA_DIR =
        "org/apache/ctakes/gui/dictionary/data/tiny";
    ...
    final UmlsTermUtil umlsTermUtil = new UmlsTermUtil( DEFAULT_DATA_DIR );

The command line args are not used in this application, neither are
sysprops or environment vars so there's no way to change it short of
recompiling.

So the question is:  do you know why the empty version is the default?

Peter




Re: Need a little more help on dictionaries [EXTERNAL]

Posted by Peter Abramowitsch <pa...@gmail.com>.
Hi Sean

I think I found the answer, and I have one question.

In the dictionary creator, the hardwired data directory is "tiny", which in
fact has an empty file for those abbreviations.

In DictionaryBuilder.java:

    static private final String DEFAULT_DATA_DIR =
        "org/apache/ctakes/gui/dictionary/data/tiny";
    ...
    final UmlsTermUtil umlsTermUtil = new UmlsTermUtil( DEFAULT_DATA_DIR );

The application does not read command line args, sysprops or environment
vars, so there's no way to change the directory short of recompiling.

So the question is:  do you know why the empty version is the default?

Peter



On Fri, Aug 14, 2020 at 4:53 AM Finan, Sean <
Sean.Finan@childrens.harvard.edu> wrote:

> Hi Peter,
>
> I don't have an answer but I do have a question:
>
> In your mrconso.rrf, do you see a snomed line item for "SOB" or only "SOB
> -Shortness of breath" ?
>
> I think that the simple "SOB" and "sob" entries might be from other
> vocabularies.
>
> There is (was?) logic in the dictionary creator to multiply things like
> "SOB - Shortness of breath", "SOB (Shortness of breath)"  etc. and create 3
> synonym entries: full, left and right.  There is a requirement that the
> left side be all caps and a fitting acronym for the right side.  However, I
> vacillated on the correctness of this behavior as almost all terms already
> had the 3 entries.  I am not sure what the current version of the creator
> does.
>
> Dictionary creation is indeed a touchy operation.
>
> Sean

Re: Need a little more help on dictionaries [EXTERNAL]

Posted by "Finan, Sean" <Se...@childrens.harvard.edu>.
Hi Peter,

I don't have an answer but I do have a question:

In your mrconso.rrf, do you see a snomed line item for "SOB" or only "SOB -Shortness of breath" ?
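
One way to run that check is sketched below.  It assumes the standard
pipe-delimited MRCONSO.RRF layout, in which the source vocabulary (SAB) is
the 12th field and the term string (STR) is the 15th.  The class
MrconsoFilter and the method findExact are hypothetical names and are not
part of cTAKES or the UMLS tools.

```java
import java.util.ArrayList;
import java.util.List;

final class MrconsoFilter {
    // Zero-based column positions in a MRCONSO.RRF row.
    static final int SAB = 11;
    static final int STR = 14;

    // Returns the rows whose vocabulary starts with sabPrefix and whose
    // term string equals the given term, ignoring case.
    static List<String> findExact(Iterable<String> lines, String sabPrefix, String term) {
        final List<String> hits = new ArrayList<>();
        for (String line : lines) {
            // The -1 limit keeps trailing empty fields instead of dropping them.
            final String[] cols = line.split("\\|", -1);
            if (cols.length > STR
                    && cols[SAB].startsWith(sabPrefix)
                    && cols[STR].equalsIgnoreCase(term)) {
                hits.add(line);
            }
        }
        return hits;
    }
}
```

Calling findExact with the prefix "SNOMEDCT" and the term "SOB" on the lines
of the subset's MRCONSO.RRF would list any SNOMED rows whose string is
exactly "SOB".  An empty result would suggest the bare acronym comes from
another vocabulary or from the creator's own term expansion.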

I think that the simple "SOB" and "sob" entries might be from other vocabularies.

There is (was?) logic in the dictionary creator to multiply things like "SOB - Shortness of breath", "SOB (Shortness of breath)"  etc. and create 3 synonym entries: full, left and right.  There is a requirement that the left side be all caps and a fitting acronym for the right side.  However, I vacillated on the correctness of this behavior as almost all terms already had the 3 entries.  I am not sure what the current version of the creator does.
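
The multiplication described above could look roughly like the following
sketch.  It is not the creator's code.  In particular, "fitting acronym" is
interpreted here as "the letters of the left side match the first letters
of the words on the right side", which is an assumption, and the class
AcronymSplitter with its methods is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AcronymSplitter {
    // "LEFT - right" and "LEFT (right)"; LEFT must be all capitals.
    private static final Pattern DASH = Pattern.compile("^([A-Z]{2,})\\s*-\\s*(.+)$");
    private static final Pattern PAREN = Pattern.compile("^([A-Z]{2,})\\s*\\((.+)\\)$");

    // Assumed rule: one word per acronym letter, initials matching in order.
    static boolean isFittingAcronym(String left, String right) {
        final String[] words = right.trim().split("\\s+");
        if (words.length != left.length()) {
            return false;
        }
        for (int i = 0; i < words.length; i++) {
            if (Character.toUpperCase(words[i].charAt(0)) != left.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // Returns the full term, plus the left and right parts when the term
    // has one of the two shapes above and the acronym fits.
    static List<String> expand(String term) {
        final List<String> synonyms = new ArrayList<>();
        synonyms.add(term);
        for (Pattern pattern : new Pattern[] { DASH, PAREN }) {
            final Matcher m = pattern.matcher(term.trim());
            if (m.matches() && isFittingAcronym(m.group(1), m.group(2))) {
                synonyms.add(m.group(1));
                synonyms.add(m.group(2).trim());
                break;
            }
        }
        return synonyms;
    }
}
```

Under these assumptions, "SOB - Shortness of breath" yields three entries
(the full string, "SOB" and "Shortness of breath"), while a plain term such
as "Dyspnea" yields only itself.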

Dictionary creation is indeed a touchy operation.

Sean
________________________________________
From: Peter Abramowitsch <pa...@gmail.com>
Sent: Thursday, August 13, 2020 11:57 PM
To: dev@ctakes.apache.org
Subject: Need a little more help on dictionaries [EXTERNAL]

Hi All

I'm able to create a subset with the UMLS mmsys tool, use the dictionary
creator on the full UMLS release, create, install and tweak the scripts
adding or removing aliases etc.  My goal is simply to add HUGO gene terms
to SNOMED and RXNORM.

However I must be missing some bit of information on the use of mmsys or
the dictionary creator, because some very common terms are missing from my
dictionary but present in the released sno_rx

As an example, the acronym SOB
in mmsys, the term SOB is present in my subset, and it is mapped into
SNOMED with the expected CUI 13404 and SNOMEDIDs same as sno_rx
I see the cui_tui mapping it into the correct TUI for a finding  INSERT
INTO TUI VALUES(13404,184)
I see the cui and the preferred term "dyspnea" in my *script file, and I
can resolve it in a note using the default consumer and obtaining the
correct SNOMED ID
I see lots of cui_term entries for the same CUI, and I can resolve them
too.  but  SOB is not present in my cui terms.
How did it get there?

So either - I am not using one of the tools correctly, or in creating
SNO_RX, someone has added SOB by hand rather than using the creator.  And
if they have, they have probably also done other tweaks.

Sean, Ghandi or Jeff
Can you explain this?

Peter